Re: Parsing Large Text Files
- Subject: Re: Parsing Large Text Files
- From: "Mark J. Reed" <email@hidden>
- Date: Thu, 1 May 2008 22:31:03 -0400
If my assumptions are correct, this Perl script should do the trick:
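#!/usr/bin/perl
use strict;
use warnings;

$/ = "\r\n>";                   # each read ends where the next ">" header starts
while (<>)
{
    chomp;                      # drop the trailing "\r\n>" (absent on the last record)
    s/^>?([^\r\n]*)\r\n//;      # strip the header line, capturing the name
    my $name = $1;
    s/\r\n//g;                  # join the sequence lines
    $_ = reverse($_);           # scalar context: reverse the string
    s/(.{50})(?=.)/$1\r\n/g;    # rewrap at 50 characters per line
    print ">$name reversed:\r\n$_\r\n";
}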
Save it as e.g. proteins.pl and run it thus:
$ perl proteins.pl inputfile >outputfile
On my machine it takes about 12 seconds to process a 98,172,928-byte
file I created by repeating your snippet. Sample output:
>2007006285 Uroporphyrinogen-III decarboxylase [LWCv2] reversed:
AHNAYTEFSLNYMQKIVNGLIKTNDGRKVPSIVFTKGGGLWLEAQAEIGS
DALGLDWTVDIGSARKRVGDKVALQGNLDPAILLSTPEAIEKEVISVLAS
YGKGDGHVFNLGHGITQWTPPENAAAMLTAIRAHSQQYHV
>2007006286 Acyl-CoA synthetases (AMP-forming)/AMP-acid ligas es II [LWCv2] reversed:
VVEGQAQDWTTLCRTIDARAAAFRQQGLAAGDCVALRGRNSVELVLAYLA
ALQLGARVLPLNPQLPDAQLQPLLPALDIDWGWSEAGDHWPGPVRPLTSD
VAVATPVPTNPAVTWQPGAPATLTLTSGSSGLPKGVLHCAANHLASAAGL
LAALPFTAGDGWLLSLPLFHVSGQGIVWRWLLRGARLLLVAEGDLAQALA
GCSHASLVPTQLQRLLAQNASLPALQHVLLGGAAIPVALTQRAEQAGIHC
WCGYRLTEMASTVTAK
>2007006287 [LWCv2] reversed:
DPAQAHALQRLEVRAWGCLLEELLACCPPDADTALAPLAALARACQQEEV
GARPLFDEIEQRLRTLAGDL
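
The script depends on two assumptions about the input: lines end in CRLF, and every record starts with a ">" header line. A minimal sketch for checking the same transformation on a tiny in-memory sample before running it on a large file might look like this. The helper name reverse_record, the made-up sample data, and the width of 5 are illustrative only and are not part of the original script; the lookahead in the rewrap step is one way to avoid an empty line when a sequence length is an exact multiple of the width.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): reverse one record's
# sequence and rewrap it at $width characters per line.
sub reverse_record {
    my ($record, $width) = @_;
    $record =~ s/\r\n>\z//;                      # drop the trailing separator, if present
    $record =~ s/^>?([^\r\n]*)\r\n// or return;  # split off the header line
    my $name = $1;
    $record =~ s/\r\n//g;                        # join the wrapped sequence lines
    my $seq = reverse $record;                   # scalar context: reverses the string
    $seq =~ s/(.{$width})(?=.)/$1\r\n/g;         # rewrap; no break after the final chunk
    return ">$name reversed:\r\n$seq\r\n";
}

# Made-up two-record sample with CRLF line endings, read from memory.
my $sample = ">id1 first\r\nABCDE\r\nFGH\r\n>id2 second\r\nIJKLM\r\n";
open my $in, '<', \$sample or die "cannot open in-memory sample: $!";
binmode $in;
local $/ = "\r\n>";                              # same record separator as the script
while (my $record = <$in>) {
    print reverse_record($record, 5);
}
```

If the headers or sequences come out mangled on a real excerpt, the most likely cause is that the file uses bare LF line endings, in which case every "\r\n" above would need to become "\n".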