Re: Parsing Large Text Files
- Subject: Re: Parsing Large Text Files
- From: Bruce Robertson <email@hidden>
- Date: Thu, 01 May 2008 20:38:16 -0700
Well, thanks, but that's not even close.
The header line should not be changed or wrapped at 50 characters. Only the
sequence lines that follow it should be reversed and re-wrapped.
Given:
>2007006285 Uroporphyrinogen-III decarboxylase [LWCv2]
VHYQQSHARIATLMAAANEPPTWQTIGHGLNFVHGDGKGYSALVSIVEKE
IAEPTSLLIAPDLNGQLAVKDGVRKRASGIDVTWDLGLADSGIEAQAELW
LGGGKTFVISPVKRGDNTKILGNVIKQMYNLSFETYANHA
You should get:
>2007006285 Uroporphyrinogen-III decarboxylase [LWCv2]
AHNAYTEFSLNYMQKIVNGLIKTNDGRKVPSIVFTKGGGLWLEAQAEIGS
DALGLDWTVDIGSARKRVGDKVALQGNLDPAILLSTPEAIEKEVISVLAS
YGKGDGHVFNLGHGITQWTPPENAAAMLTAIRAHSQQYHV
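To pin down what is being asked for, here is a minimal sketch of the transformation for a single record. It is written in Python rather than Perl or AppleScript, the names `reverse_record` and `width` are hypothetical, and it assumes a record is one header line starting with '>' followed by its sequence lines.

```python
def reverse_record(lines, width=50):
    """Reverse one record's sequence while leaving its header untouched.

    `lines` is the header line followed by the sequence lines, without
    line terminators. Returns the new list of lines.
    """
    header, sequence_lines = lines[0], lines[1:]
    # Join the wrapped lines into one string, then reverse it.
    reversed_sequence = "".join(sequence_lines)[::-1]
    # Re-wrap the reversed sequence at `width` characters per line.
    wrapped = [reversed_sequence[i:i + width]
               for i in range(0, len(reversed_sequence), width)]
    return [header] + wrapped


record = [
    ">2007006285 Uroporphyrinogen-III decarboxylase [LWCv2]",
    "VHYQQSHARIATLMAAANEPPTWQTIGHGLNFVHGDGKGYSALVSIVEKE",
    "IAEPTSLLIAPDLNGQLAVKDGVRKRASGIDVTWDLGLADSGIEAQAELW",
    "LGGGKTFVISPVKRGDNTKILGNVIKQMYNLSFETYANHA",
]
print("\n".join(reverse_record(record)))
```

The header is passed through as the first line, and the sequence is joined before reversing so that the 50-character wrap is applied to the whole reversed sequence rather than to each original line.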
But yours gives:
VVEGQAQDWTTLCRTI
DARAAAFRQQGLAAGDCVALRGRNSVELVLAY
LAALQLGARVLPLNPQLP
DAQLQPLLPALDIDWGWSEAGDHWPGPVRPL
TSDVAVATPVPTNPAVTWQ
PGAPATLTLTSGSSGLPKGVLHCAANHLAS
AAGLLAALPFTAGDGWLLSL
PLFHVSGQGIVWRWLLRGARLLLVAEGDL
AQALAGCSHASLVPTQLQRLL
AQNASLPALQHVLLGGAAIPVALTQRAE
QAGIHCWCGYRLTEMASTVTAK
]2vCWL[ II se sagil dica-PM
A/)gnimrof-PMA( sesatehtnys AoC-lycA 6826007002>
A
HNAYTEFSLNYMQKIVNGLIKTNDGRKVPSIVFTKGGGL
WLEAQAEIGS
DALGLDWTVDIGSARKRVGDKVALQGNLDPAILLSTPEAI
EKEVISVLA
SYGKGDGHVFNLGHGITQWTPPENAAAMLTAIRAHSQQYHV
]2vCWL[
> Ok, so my small correction broke the output by leaving '>'s in.
> Here's a no-really-this-time corrected version, with commentary added
> so you can follow the logic.
>
> #!/usr/bin/perl
>
> # Read one whole protein at a time: instead of reading one line,
> # keep reading until there's a CRLF followed by a '>'
> $/ = "\r\n>";
>
> # Repeat while there's input remaining
> while (<>)
> {
> # chop off initial > if any (only happens on first line)
> s/^>//o;
>
> # chop off final > if any (all but last line)
> s/>$//o;
>
> # strip off the first line (name of protein) so it doesn't get
> # included in the reversal
> s/^(.*?)\r\n(.*)$/$2/os;
>
> # but remember that name for later
> my $name = $1;
>
> # get rid of all CRLF's
> s/\r\n//og;
>
> # reverse what's left
> $_ = reverse($_);
>
> # put CRLF's back in every 50 characters
> s/.{50}/$&\r\n/og;
>
> # and output, with name
> print ">$name reversed:\r\n$_\r\n";
> }
>
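The reversed header lines in the output above suggest that the script never found a CRLF in my file, so the whole file was treated as one record and reversed. That is only a guess from the output; I have not checked the file's line endings. A minimal sketch of the same one-protein-at-a-time logic that does not depend on the line-ending style follows. It is in Python rather than Perl, `reverse_fasta` and `width` are hypothetical names, and it writes LF line endings as an assumption.

```python
def reverse_fasta(text, width=50):
    """Reverse every sequence in `text`, keeping each '>' header line as is.

    str.splitlines() splits on CRLF, LF and bare CR alike, so the result
    does not depend on which line-ending style the file uses.
    """
    out = []
    sequence = []  # sequence lines collected for the current record

    def flush():
        # Emit the collected sequence reversed and re-wrapped at `width`.
        joined = "".join(sequence)[::-1]
        out.extend(joined[i:i + width] for i in range(0, len(joined), width))
        sequence.clear()

    for line in text.splitlines():
        if line.startswith(">"):
            flush()           # finish the previous record, if any
            out.append(line)  # header passes through unchanged
        elif line:
            sequence.append(line)  # blank lines are skipped
    flush()  # finish the last record
    return "\n".join(out) + "\n"


# Two tiny made-up records with bare CR line endings, wrapped at 5.
sample = ">a first\rABCDE\rFG\r>b second\rHIJ\r"
print(reverse_fasta(sample, width=5), end="")
```

Reading line by line and treating any line that starts with '>' as the start of a new record avoids relying on a fixed record separator such as "\r\n>".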
_______________________________________________
AppleScript-Users mailing list (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users