Re: Parsing Large Text Files

Subject: Re: Parsing Large Text Files
From: "Mark J. Reed" <email@hidden>
Date: Thu, 1 May 2008 23:44:50 -0400

On Thu, May 1, 2008 at 11:38 PM, Bruce Robertson <email@hidden> wrote:
>  The header line should not be changed or broken at 50. Only the following
>  lines should be changed.
>
>  Given:
>
>
>  >2007006285 Uroporphyrinogen-III decarboxylase [LWCv2]
>  VHYQQSHARIATLMAAANEPPTWQTIGHGLNFVHGDGKGYSALVSIVEKE
>  IAEPTSLLIAPDLNGQLAVKDGVRKRASGIDVTWDLGLADSGIEAQAELW
>  LGGGKTFVISPVKRGDNTKILGNVIKQMYNLSFETYANHA
>
>  You should get:
>
>
>  >2007006285 Uroporphyrinogen-III decarboxylase [LWCv2]
>  AHNAYTEFSLNYMQKIVNGLIKTNDGRKVPSIVFTKGGGLWLEAQAEIGS
>  SDALGLDWTVDIGSARKRVGDKVALQGNLDPAILLSTPEAIEKEVISVLA
>  ASYGKGDGHVFNLGHGITQWTPPENAAAMLTAIRAHSQQYHV

OK.  That's what my script as posted outputs, except for adding the
text "Reversed" to the first line.


>  But yours gives:
>
>  VVEGQAQDWTTLCRTI
>  DARAAAFRQQGLAAGDCVALRGRNSVELVLAY
>
>  LAALQLGARVLPLNPQLP
>  DAQLQPLLPALDIDWGWSEAGDHWPGPVRPL
>
>  TSDVAVATPVPTNPAVTWQ
>  PGAPATLTLTSGSSGLPKGVLHCAANHLAS
>
>  AAGLLAALPFTAGDGWLLSL
>  PLFHVSGQGIVWRWLLRGARLLLVAEGDL

Not even close to what I get.  How are you running it?
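
One thing worth checking when the two runs disagree this badly is which line endings the input file actually uses. The script quoted below sets $/ to "\r\n>" and strips "\r\n", so it assumes CRLF throughout. Whether that explains the output above is an assumption, not a confirmed diagnosis. A minimal sketch in Python (not the Perl used in this thread; the helper name `count_line_endings` is made up for illustration) that counts each kind of line ending in a file:

```python
import sys


def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains one LF and one CR, so subtract those out.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lf, "cr": cr}


# Only read a file when a path is actually given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        print(count_line_endings(fh.read()))
```

If the "crlf" count comes back as zero, the record separator in the script would never match and the whole file would be read as a single record.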

To avoid any copy/paste issues, I've attached the script file to this
email message, modified to remove the "reversed" tag, along with
sample input and output (just your original snippet).  The attachment
probably won't make it to the list, just to you.




>
>  AQALAGCSHASLVPTQLQRLL
>  AQNASLPALQHVLLGGAAIPVALTQRAE
>
>  QAGIHCWCGYRLTEMASTVTAK
>  ]2vCWL[ II se sagil dica-PM
>
>  A/)gnimrof-PMA( sesatehtnys AoC-lycA 6826007002>
>  A
>
>  HNAYTEFSLNYMQKIVNGLIKTNDGRKVPSIVFTKGGGL
>  WLEAQAEIGS
>
>  DALGLDWTVDIGSARKRVGDKVALQGNLDPAILLSTPEAI
>  EKEVISVLA
>
>  SYGKGDGHVFNLGHGITQWTPPENAAAMLTAIRAHSQQYHV
>  ]2vCWL[
>
>
>
>  > Ok, so my small correction broke the output by leaving '>'s in.
>  > Here's a no-really-this-time corrected version, with commentary added
>  > so you can follow the logic.
>  >
>  > #!/usr/bin/perl
>  >
>  > # Read one whole protein at a time: instead of reading one line,
>  > # keep reading until there's a CRLF followed by a '>'
>  > $/ = "\r\n>";
>  >
>  > # Repeat while there's input remaining
>  > while (<>)
>  > {
>  >   # chop off initial > if any (only happens on first line)
>  >   s/^>//o;
>  >
>  >   # chop off final > if any (all but last line)
>  >   s/>$//o;
>  >
>  >   # strip off the first line (name of protein) so it doesn't get
>  >   # included in the reversal
>  >   s/^(.*?)\r\n(.*)$/$2/os;
>  >
>  >   # but remember that name for later
>  >   my $name = $1;
>  >
>  >   # get rid of all CRLF's
>  >   s/\r\n//og;
>  >
>  >   # reverse what's left
>  >   $_ = reverse($_);
>  >
>  >   # put CRLF's back in every 50 characters
>  >   s/.{50}/$&\r\n/og;
>  >
>  >   # and output, with name
>  >   print ">$name reversed:\r\n$_\r\n";
>  > }
>  >
>
>
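
For comparison, here is a sketch of the same record-by-record logic in Python rather than Perl. It is not the attached script. The function name `reverse_records` and the `width` parameter are illustrative, and it differs from the quoted script in two ways that are assumptions on my part: it normalises CRLF and bare CR to LF before splitting, and it writes LF on output.

```python
import re


def reverse_records(text: str, width: int = 50) -> str:
    """Reverse each record's sequence, keep its header line, rewrap at width."""
    # Normalise CRLF and bare CR to LF so the split works on any input.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    out = []
    # A record starts with '>' at the beginning of a line.
    for record in re.split(r"^>", text, flags=re.MULTILINE):
        if not record.strip():
            continue  # skip the empty chunk before the first '>'
        # The first line is the header; it is not reversed or rewrapped.
        header, _, body = record.partition("\n")
        # Join the sequence lines, dropping all whitespace, then reverse.
        seq = "".join(body.split())[::-1]
        out.append(">" + header.rstrip())
        # Break the reversed sequence into lines of at most `width` chars.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    sample = ">p1 name\r\nABCDE\r\nFGH\r\n>p2\r\nXYZ\r\n"
    print(reverse_records(sample, width=4), end="")
```

Slicing the reversed string also avoids the trailing blank line that the regex-based rewrap produces when a sequence length is an exact multiple of 50.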



--
Mark J. Reed <email@hidden>

Attachment: proteins.zip
Description: Zip archive
