Parsing Large Text Files
- Subject: Parsing Large Text Files
- From: Bruce Robertson <email@hidden>
- Date: Thu, 01 May 2008 15:54:07 -0700
Posting for a friend, who needs to parse 80MB protein data files.
Here is his description of the process.
Below is a snippet of a typical 80 MB text file that serves as my starting
point.
This snippet shows 3 proteins (out of 321,000). The first line of each
protein contains the protein name and a short description. The following
lines (each of which ends with a CRLF) represent the sequence of amino acids
that make up the protein.
To reverse a sequence, I must first concatenate the variable number of lines
of the sequence, stripping out the CRLFs, then reverse the order of the
characters in the sequence, and insert CRLFs every 50 characters.
>2007006285 Uroporphyrinogen-III decarboxylase [LWCv2]
VHYQQSHARIATLMAAANEPPTWQTIGHGLNFVHGDGKGYSALVSIVEKE
IAEPTSLLIAPDLNGQLAVKDGVRKRASGIDVTWDLGLADSGIEAQAELW
LGGGKTFVISPVKRGDNTKILGNVIKQMYNLSFETYANHA
>2007006286 Acyl-CoA synthetases (AMP-forming)/AMP-acid ligases II [LWCv2]
KATVTSAMETLRYGCWCHIGAQEARQTLAVPIAAGGLLVHQLAPLSANQA
LLRQLQTPVLSAHSCGALAQALDGEAVLLLRAGRLLWRWVIGQGSVHFLP
LSLLWGDGATFPLAALLGAASALHNAACHLVGKPLGSSGSTLTLTAPAGP
QWTVAPNTPVPTAVAVDSTLPRVPGPWHDGAESWGWDIDLAPLLPQLQAD
PLQPNLPLVRAGLQLAALYALVLEVSNRGRLAVCDGAALGQQRFAAARAD
ITRCLTTWDQAQGEVV
>2007006287 [LWCv2]
LDGALTRLRQEIEDFLPRAGVEEQQCARALAALPALATDADPPCCALLEE
LLCGWARVELRQLAHAQAPD
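
The reversal steps described above (join each protein's sequence lines, reverse the characters, re-wrap at 50 with CRLF) can be sketched as a single streaming pass, so the 80 MB file never has to sit in memory all at once. This is only an illustration under assumptions: the post does not name a language, so Python is used here, and the function names and the file names in the usage comment are hypothetical. It also assumes every record starts with a ">" line and that header lines are copied through unchanged.

```python
def wrap_reversed(parts, width=50):
    """Join the sequence lines, reverse the characters, re-split at `width`."""
    seq = "".join(parts)[::-1]
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def write_record(dst, header, parts, width, eol):
    """Write one header line followed by its reversed, re-wrapped sequence."""
    dst.write(header + eol)
    for chunk in wrap_reversed(parts, width):
        dst.write(chunk + eol)


def reverse_records(src, dst, width=50, eol="\r\n"):
    """Read records line by line from src and write reversed records to dst.

    Only one protein's sequence is held in memory at a time.
    """
    header, parts = None, []
    for raw in src:
        line = raw.rstrip("\r\n")          # strip the CRLF (or a bare LF)
        if line.startswith(">"):           # a new protein begins
            if header is not None:
                write_record(dst, header, parts, width, eol)
            header, parts = line, []
        elif line and header is not None:  # sequence line of the current protein
            parts.append(line)
    if header is not None:                 # flush the final record
        write_record(dst, header, parts, width, eol)


# Hypothetical usage; newline="" keeps Python from translating the CRLFs:
# with open("proteins.txt", newline="") as src, \
#      open("proteins_reversed.txt", "w", newline="") as dst:
#     reverse_records(src, dst)
```

Joining the lines before reversing matters because the last line of a sequence is usually shorter than 50 characters; reversing line by line would leave the short line in the wrong place.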