Re: Parsing Large Text Files


  • Subject: Re: Parsing Large Text Files
  • From: Bruce Robertson <email@hidden>
  • Date: Thu, 01 May 2008 20:38:16 -0700

Well, thanks, but that's not even close.

The header line should not be changed, and it should not be wrapped at 50
characters. Only the sequence lines that follow it should be changed.

Given:

>2007006285 Uroporphyrinogen-III decarboxylase [LWCv2]
VHYQQSHARIATLMAAANEPPTWQTIGHGLNFVHGDGKGYSALVSIVEKE
IAEPTSLLIAPDLNGQLAVKDGVRKRASGIDVTWDLGLADSGIEAQAELW
LGGGKTFVISPVKRGDNTKILGNVIKQMYNLSFETYANHA

You should get:

>2007006285 Uroporphyrinogen-III decarboxylase [LWCv2]
AHNAYTEFSLNYMQKIVNGLIKTNDGRKVPSIVFTKGGGLWLEAQAEIGS
DALGLDWTVDIGSARKRVGDKVALQGNLDPAILLSTPEAIEKEVISVLAS
YGKGDGHVFNLGHGITQWTPPENAAAMLTAIRAHSQQYHV

But yours gives:

VVEGQAQDWTTLCRTI
DARAAAFRQQGLAAGDCVALRGRNSVELVLAY

LAALQLGARVLPLNPQLP
DAQLQPLLPALDIDWGWSEAGDHWPGPVRPL

TSDVAVATPVPTNPAVTWQ
PGAPATLTLTSGSSGLPKGVLHCAANHLAS

AAGLLAALPFTAGDGWLLSL
PLFHVSGQGIVWRWLLRGARLLLVAEGDL

AQALAGCSHASLVPTQLQRLL
AQNASLPALQHVLLGGAAIPVALTQRAE

QAGIHCWCGYRLTEMASTVTAK
]2vCWL[ II se sagil dica-PM

A/)gnimrof-PMA( sesatehtnys AoC-lycA 6826007002>
A

HNAYTEFSLNYMQKIVNGLIKTNDGRKVPSIVFTKGGGL
WLEAQAEIGS

DALGLDWTVDIGSARKRVGDKVALQGNLDPAILLSTPEAI
EKEVISVLA

SYGKGDGHVFNLGHGITQWTPPENAAAMLTAIRAHSQQYHV
]2vCWL[

> Ok, so my small correction broke the output by leaving '>'s in.
> Here's a no-really-this-time corrected version, with commentary added
> so you can follow the logic.
>
> #!/usr/bin/perl
>
> # Read one whole protein at a time: instead of reading one line,
> # keep reading until there's a CRLF followed by a '>'
> $/ = "\r\n>";
>
> # Repeat while there's input remaining
> while (<>)
> {
>   # chop off initial > if any (only happens on first line)
>   s/^>//o;
>
>   # chop off final > if any (all but last line)
>   s/>$//o;
>
>   # strip off the first line (name of protein) so it doesn't get
>   # included in the reversal
>   s/^(.*?)\r\n(.*)$/$2/os;
>
>   # but remember that name for later
>   my $name = $1;
>
>   # get rid of all CRLF's
>   s/\r\n//og;
>
>   # reverse what's left
>   $_ = reverse($_);
>
>   # put CRLF's back in every 50 characters
>   s/.{50}/$&\r\n/og;
>
>   # and output, with name
>   print ">$name reversed:\r\n$_\r\n";
> }
>
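
For comparison, here is a minimal Perl sketch of the behaviour asked for above: each `>` header line is copied through unchanged, and only the sequence lines under it are joined, reversed as a whole, and rewrapped at 50 characters. It is not the quoted script patched in place. It assumes the input may use CR (classic Mac), LF, or CRLF line endings, normalizes them before splitting, and writes CRLF output like the quoted script. The sub names `reverse_and_wrap` and `process` are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Width of each rewrapped sequence line.
my $WIDTH = 50;

# Reverse a whole sequence and split it into lines of at most $WIDTH chars.
sub reverse_and_wrap {
    my ($seq) = @_;
    my $rev = reverse $seq;              # scalar context: reverses the string
    return $rev =~ /(.{1,$WIDTH})/g;     # list of chunks; the last may be short
}

# Keep every '>' header line verbatim; collect the sequence lines that
# follow it, then emit them reversed as one string and rewrapped.
sub process {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;               # treat CRLF, bare CR, and LF alike
    my (@out, $header, $seq);
    my $flush = sub {
        push @out, $header, reverse_and_wrap($seq) if defined $header;
    };
    for my $line (split /\n/, $text) {
        if ($line =~ /^>/) {
            $flush->();                  # finish the previous record, if any
            ($header, $seq) = ($line, '');
        } elsif (defined $header) {
            (my $residues = $line) =~ s/\s+//g;   # drop stray whitespace
            $seq .= $residues;
        }
    }
    $flush->();                          # finish the last record
    # CRLF output, like the quoted script (assumption; use "\n" if preferred)
    return @out ? join("\r\n", @out) . "\r\n" : '';
}

# Read all input (files named on the command line, or STDIN) in one go.
unless (caller) {
    local $/;                            # slurp mode: one read per file
    print process(join '', <>);
}
```

Run it as, for example, `perl reverse_seqs.pl proteins.txt > reversed.txt` (both file names hypothetical). Splitting on normalized line endings, rather than setting `$/` to `"\r\n>"`, means the header is never mistaken for sequence data even if the file's line endings are not CRLF.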

AppleScript-Users mailing list      (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users


  • Follow-Ups:
    • Re: Parsing Large Text Files
      • From: "Mark J. Reed" <email@hidden>
  • References:
    • Re: Parsing Large Text Files
      • From: "Mark J. Reed" <email@hidden>
