Re: How to match data in two different text files as fast as possible?
- Subject: Re: How to match data in two different text files as fast as possible?
- From: Emmanuel <email@hidden>
- Date: Sat, 12 Aug 2006 18:08:40 +0200
At 8:38 AM +0200 8/12/06, Richard Rönnbäck wrote (well, I edited a bit):
FilePaths someData someData2 someData3 moreColumns
FilePaths UniqueID
When all lines are processed the result should be written to a
file, so that I end up with:
FilePaths someData someData2 someData3 moreColumns UniqueID
For lines that cannot be matched, the UniqueID field should either be empty
(just a tab stop) or better yet, say something like N/A.
I think that what costs time is searching for FilePaths in the second
file, particularly when it's not there.
So, what I would do is turn that second file into a p-list
once and for all: then the lookups are ultra-fast, even for large sizes.
So, you read that second file line by line, and for each line you do:
set {FilePaths, UniqueID} to find text "^([^\\t]+)\\t(.*)$" in theLine with regexp and string result using {"\\1", "\\2"}
PlistSet thePlist key FilePaths to UniqueID
where you have first initialized thePlist to PlistNew.
Then, when you parse the first file line by line, you can search
the p-list rapidly with:
try
    set UniqueID to PlistGet thePlist key FilePaths
on error
    set UniqueID to "N/A"
end try
where you have set FilePaths with a similar (simpler) regular expression:
set FilePaths to find text "^[^\\t]+" in theLine with regexp and string result
This uses Satimage.osax and XMLLib.osax. If you expect Unicode, then
instead of "find text" you have to use "ufind text", which is a
command in Smile.
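
For readers without those two scripting additions, the same idea (build the lookup table once from the second file, then do one fast lookup per line of the first file) can be sketched in Python. This is only an illustration of the technique, not the AppleScript above: the function names build_lookup and append_ids are hypothetical, a Python dict stands in for the p-list, and the sketch assumes tab-separated lines whose first field is the file path.

```python
def build_lookup(id_lines):
    """Map each file path to its UniqueID (first tab-separated field -> rest of line)."""
    lookup = {}
    for line in id_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Same split as the regular expression "^([^\t]+)\t(.*)$"
        path, _, unique_id = line.partition("\t")
        lookup[path] = unique_id
    return lookup


def append_ids(data_lines, lookup, missing="N/A"):
    """Return each data line with its UniqueID (or the `missing` marker) appended."""
    result = []
    for line in data_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        path = line.split("\t", 1)[0]  # first column is the file path
        # dict.get plays the role of the try / on error block around PlistGet
        result.append(line + "\t" + lookup.get(path, missing))
    return result


# Hypothetical sample data standing in for the two files
ids = ["/a/b.txt\t001\n", "/c/d.txt\t002\n"]
data = ["/a/b.txt\tx\ty\n", "/z.txt\tq\n"]
merged = append_ids(data, build_lookup(ids))
```

With real files, the two lists would simply be replaced by open file objects, and the merged lines written out joined with newlines. The second file is read only once, so a path that is missing costs no more than a path that is found.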
Emmanuel