Re: Parsing Large Text Files
- Subject: Re: Parsing Large Text Files
- From: Bruce Robertson <email@hidden>
- Date: Fri, 02 May 2008 10:55:26 -0700
Thanks.
Your version isn't quite correct - it leaves out the title line; and it is
quite slow, processing at about 2MB/minute.
My recent applescript version is about 10X as fast, about 22MB/minute.
Also, for some reason the font size of your post is enormous.
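
On the title line, here is a rough sketch of how FixProtein might be reworked so that it keeps the title and wraps the sequence at 50 characters per line. It assumes each record handed to the handler is a title line followed by one or more sequence lines, and that the wanted output is the title (with its leading ">" restored) followed by the wrapped sequence; both are guesses about the intended format. The reversal step in the quoted script is left out because it is not clear that it is wanted. The variable names are mine, and I have not timed it.

```applescript
-- Sketch only: assumes thisInfo is "title line" followed by sequence lines,
-- and that the record is not empty.
on FixProtein(thisInfo)
	set theLines to paragraphs of thisInfo
	set titleLine to item 1 of theLines

	-- Join the remaining lines into one unbroken sequence string.
	set AppleScript's text item delimiters to ""
	set seq to (rest of theLines) as text

	-- Restore the ">" that was consumed when the file was split on it.
	set wrapped to {">" & titleLine}

	-- Cut the sequence into pieces of at most 50 characters.
	set seqLength to count of seq
	set segStart to 1
	repeat while segStart <= seqLength
		set segEnd to segStart + 49
		if segEnd > seqLength then set segEnd to seqLength
		set end of wrapped to text segStart thru segEnd of seq
		set segStart to segStart + 50
	end repeat

	-- Rejoin with returns, then reset the delimiters for the caller.
	set AppleScript's text item delimiters to return
	set fixed to wrapped as text
	set AppleScript's text item delimiters to ""
	return fixed
end FixProtein
```

This should drop into the quoted script in place of its FixProtein handler, since the calling loop already resets the delimiters to return before joining the records.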
>
>> Yes, that's nice, gets down to about 22MB/minute.
>>
>> The perl script processes at about 800MB/minute by my rough test.
>>
>
> Here's another pure applescript version, probably not as fast as either of
> those, but what the heck, is speed the only consideration?
>
> Will you ever need to repurpose or tweak the script? If so do you have time to
> master another language or do you want to depend on the kindness of strangers?
>
> ES
>
> set StartTics to the ticks
> set AppleScript's text item delimiters to ">"
> set proteinFile to alias "Macintosh HD:Users:edstockly:Desktop:Archive:parseme2.txt"
> set readInfo to read proteinFile
>
> set allProteinInfo to text items of readInfo
> set newData to {}
> set AppleScript's text item delimiters to ""
> repeat with thisProteinInfo in the rest of allProteinInfo
>     set newInfo to FixProtein(thisProteinInfo)
>     set the end of newData to the newInfo
> end repeat
> set AppleScript's text item delimiters to return
> set newData to newData as text
> set resultFile to ((path to desktop as Unicode text) & "esPro.txt")
> try
>     set finalFile to (open for access file resultFile with write permission)
> on error
>     close access resultFile
>     set finalFile to (open for access file resultFile with write permission)
> end try
> set eof of finalFile to 0
> write newData to finalFile
> close access finalFile
> set endTicks to the ticks
> tell application "TextEdit"
>     activate (open file resultFile)
> end tell
> return the endTicks - StartTics
>
> on FixProtein(thisInfo)
>     set thisProteinInfo to paragraphs of thisInfo
>     set thisProteinInfo to the rest of thisProteinInfo as text
>     set thisProteinInfo to the reverse of every item of thisProteinInfo
>     set stringSize to count of thisProteinInfo
>     set segEnd to stringSize
>     set segStart to 1
>     set newString to {}
>     repeat
>         if stringSize < 50 then
>             set the end of newString to items segStart thru segEnd of thisProteinInfo
>             exit repeat
>         else
>             set the end of newString to items segStart thru (segStart + 54) of thisProteinInfo
>             set the end of newString to return
>             set segStart to segStart + 50
>             set stringSize to stringSize - 50
>         end if
>     end repeat
>     set AppleScript's text item delimiters to ""
>     return thisProteinInfo as string
> end FixProtein
>