Re: Parsing Large Text Files
- Subject: Re: Parsing Large Text Files
- From: "Mark J. Reed" <email@hidden>
- Date: Fri, 2 May 2008 00:36:08 -0400
On Fri, May 2, 2008 at 12:26 AM, Bruce Robertson <email@hidden> wrote:
> > The lines in your input file do not end in CRLF's. Just bare CR's.
>
> Yup, that did it, thanks.
If the data files really do use bare CRs rather than CRLFs, here's a
version of the Perl script that handles them as they are, with no need
to change the files. It's not quite as simple as removing all the
\n's, because the "strip off the first line" regex relies on the fact
that '.' never matches '\n'. Since '.' *does* match '\r', a greedy
match would swallow the whole protein entry as the name, so the
pattern uses a non-greedy '.*?' instead.
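To see the greediness point in isolation, here is a small standalone
check. It is only a sketch: the sample record and the variable names are
made up for illustration and are not taken from the real data files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical record: a name line plus two sequence lines, bare CR endings.
my $record = "sp|P12345|EXAMPLE\rMKTAYIAKQR\rQISFVKSHFS\r";

# '.' matches "\r", so a greedy .* backs off only as far as the LAST CR.
my ($greedy) = $record =~ /^(.*)\r/;

# A non-greedy .*? stops at the FIRST CR, i.e. the end of the name line.
my ($lazy) = $record =~ /^(.*?)\r/;

# With LF endings the greedy form is safe, because '.' never matches "\n".
(my $lf_record = $record) =~ tr/\r/\n/;
my ($lf_greedy) = $lf_record =~ /^(.*)\n/;

# Print lengths rather than the raw greedy capture, since it contains CRs.
printf "greedy on CR captured %d chars (CR inside: %s)\n",
    length $greedy, ($greedy =~ /\r/ ? "yes" : "no");
printf "lazy on CR captured   '%s'\n", $lazy;
printf "greedy on LF captured '%s'\n", $lf_greedy;
```

The full script follows.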
#!/usr/bin/perl
use strict;
use warnings;

# Read one whole protein at a time: instead of reading one line,
# keep reading until there's a CR followed by a '>'
$/ = "\r>";

# Repeat while there's input remaining
while (<>)
{
    # chop off initial > if any (only happens on the first record)
    s/^>//;

    # chop off final > if any (all but the last record)
    s/>$//;

    # strip off the first line (name of protein) so it doesn't get
    # included in the reversal; non-greedy because '.' matches \r
    s/^(.*?)\r//;

    # but remember the name for later
    my $name = $1;

    # get rid of all CR's
    s/\r//g;

    # reverse it
    $_ = reverse($_);

    # put CR's back in every 50 characters, but not after the last
    # chunk, since the print below already adds the final CR
    s/(.{50})(?=.)/$1\r/g;

    # and output, with name
    print ">$name\r$_\r";
}
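
For anyone who wants to try the record-splitting idea without a data
file, here is a minimal sketch that runs the same steps over an
in-memory string. The sample data and the @records array are
hypothetical, and it skips the 50-column rewrapping to stay short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: two short records with bare CR line endings.
my $data = ">alpha\rABCDE\rFGH\r>beta\rIJKLM\r";

# Open the string as a file so no file on disk is needed.
open my $in, '<', \$data or die "cannot open in-memory file: $!";

my @records;
{
    # Localize the separator so $/ reverts when this block ends.
    local $/ = "\r>";

    while (my $rec = <$in>) {
        $rec =~ s/^>//;                    # leading '>' on the first record
        $rec =~ s/>$//;                    # trailing '>' from the separator
        next unless $rec =~ s/^(.*?)\r//;  # peel off the name line
        my $name = $1;
        $rec =~ s/\r//g;                   # join the sequence lines
        push @records, [ $name, scalar reverse $rec ];
    }
}
close $in;

print "$_->[0]: $_->[1]\n" for @records;
# prints:
# alpha: HGFEDCBA
# beta: MLKJI
```

Using local on $/ is a design choice for the sketch only; in a short
one-off script, setting it globally as above works just as well.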
--
Mark J. Reed <email@hidden>
AppleScript-Users mailing list (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users