Re: Parsing Large Text Files

Subject: Re: Parsing Large Text Files
From: "Mark J. Reed" <email@hidden>
Date: Fri, 2 May 2008 00:36:08 -0400

On Fri, May 2, 2008 at 12:26 AM, Bruce Robertson <email@hidden> wrote:
> > The lines in your input file do not end in CRLF's. Just bare CR's.
>
>  Yup, that did it, thanks.

If the data files actually have bare CR's instead of CRLF's, here's a
version of the Perl script that works on them as they are, with no
need to change the files.  It's not quite as simple as removing all
the \n's from the script, because the "strip off the first line" regex
took advantage of the fact that '.' never matches '\n'.  Since '.'
*does* match '\r', a greedy .* would swallow the whole protein entry
as the name, so the regex below uses the non-greedy .*? instead.

#!/usr/bin/perl

# Read one whole protein at a time: instead of reading one line,
# keep reading until there's a CR followed by a '>'
$/ = "\r>";

# Repeat while there's input remaining
while (<>)
{
    # chop off initial > if any (only happens on first record)
    s/^>//o;

    # chop off final > if any (all but last record)
    s/>$//o;

    # strip off the first line (name of protein) so it doesn't get
    # included in the reversal
    s/^(.*?)\r//o;

    # but remember the name for later
    my $name = $1;

    # get rid of all CR's
    s/\r//og;

    # reverse it
    $_ = reverse($_);

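    # put CR's back in after every 50 characters, except at the very
    # end (the print below adds the final CR, so a sequence whose
    # length is a multiple of 50 would otherwise get a blank line)
    s/(.{50})(?=.)/$1\r/og;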

    # and output, with name
    print ">$name\r$_\r";
}
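
To see what setting $/ to "\r>" does to the records, here's a small
sketch that reads from an in-memory string instead of a file.  The
two-protein input is made up for illustration, and the CR's are
printed as a visible \r so they don't just move the cursor around.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up two-record input with bare-CR line endings (an assumption,
# not data from the original thread).
my $data = ">p1\rABC\r>p2\rDEF\r";

# Open a read-only filehandle on the string instead of a real file.
open(my $fh, '<', \$data) or die "can't open in-memory file: $!";

my @records;
{
    # Localize the change so the rest of the program keeps the
    # default "\n" separator.
    local $/ = "\r>";
    @records = <$fh>;
}
close($fh);

# Print each record with its CR's made visible.
for my $r (@records) {
    (my $shown = $r) =~ s/\r/\\r/g;
    print "[$shown]\n";
}
```

The first record comes back with its leading '>' still attached and
the separator's '>' on the end; the last one has neither.  That's why
the script strips a '>' from both ends of each record.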


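The greediness point can be checked the same way.  This is only a
sketch with a made-up record; it compares a greedy .* against the
non-greedy .*? used in the script, on text whose lines end in bare
CR's.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical record with bare-CR line endings (made-up data).
my $record = "protein1\rMKVL\rAGHT\r";

# Greedy: '.' matches "\r", so .* runs all the way to the LAST CR.
my ($greedy) = $record =~ /^(.*)\r/;

# Non-greedy: .*? stops at the FIRST CR, i.e. just the name line.
my ($lazy) = $record =~ /^(.*?)\r/;

# Make the CR's visible before printing.
(my $shown = $greedy) =~ s/\r/\\r/g;
print "greedy:     $shown\n";
print "non-greedy: $lazy\n";
```

With "\n" line endings the two patterns would capture the same thing,
since '.' stops at "\n" either way; with CR's only the non-greedy one
captures just the name.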


--
Mark J. Reed <email@hidden>