Re: How to match data in two different text files as fast as possible?


  • Subject: Re: How to match data in two different text files as fast as possible?
  • From: Emmanuel <email@hidden>
  • Date: Sat, 12 Aug 2006 18:08:40 +0200

At 8:38 AM +0200 8/12/06, Richard Rönnbäck wrote (well, I edited a bit):
 FilePaths  someData  someData2     someData3   moreColumns

FilePaths   UniqueID

When all lines are processed the result should be written to a
file, so that I end up with:

FilePaths  someData  someData2     someData3   moreColumns  UniqueID

For lines that cannot be matched, the UniqueID field should either be empty
(just a tab stop) or better yet, say something like N/A.

I think what takes the time is searching the second file for each FilePaths value, particularly when the path isn't there at all.
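To make that cost concrete, here is a rough Python sketch (not from the original post; naive_lookup is a hypothetical name) of the straightforward approach, where every line of the first file triggers a scan of the second file. With n lines in the first file and m in the second, that is up to n*m comparisons, and an unmatched path always pays for a full scan.

```python
def naive_lookup(file_path, second_lines, missing="N/A"):
    """Hypothetical naive matcher: linear scan of the second file per lookup."""
    for line in second_lines:
        # Split at the first tab: path before it, UniqueID after it.
        path, _, unique_id = line.rstrip("\r\n").partition("\t")
        if path == file_path:
            return unique_id
    # Only reached after every line has been compared: the expensive case.
    return missing
```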


So what I would do is turn that second file into a p-list once and for all: lookups in it are then very fast, even for large files.

You read the second file line by line, and for each line you do:

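-- split the line at the first tab: path before it, UniqueID after it
set {FilePaths, UniqueID} to find text "^([^\\t]+)\\t(.*)$" in theLine with regexp and string result using {"\\1", "\\2"}
-- store the pair in the p-list, keyed by path
PlistSet thePlist key FilePaths to UniqueID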


where you have first initialized thePlist with PlistNew.
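For readers outside AppleScript, the same indexing step can be sketched in Python, with a plain dictionary standing in for the XMLLib p-list. This is an illustrative translation assuming tab-separated lines as above; build_index and LINE_RE are hypothetical names, and the regular expression is the one from the "find text" call.

```python
import re

# Same pattern as the AppleScript: everything up to the first tab is the
# path, the rest of the line is the UniqueID.
LINE_RE = re.compile(r"^([^\t]+)\t(.*)$")

def build_index(lines):
    """Build a path -> UniqueID dictionary (the role the p-list plays)."""
    index = {}
    for line in lines:
        m = LINE_RE.match(line.rstrip("\r\n"))
        if m:  # skip blank or malformed lines instead of failing
            index[m.group(1)] = m.group(2)
    return index
```

Building the dictionary costs one pass over the second file; after that, each lookup is an average constant-time hash probe rather than a scan.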

Then, when you parse the first file line by line, you can quickly look up each path in the p-list with:

try
	-- a missing key makes PlistGet fail, so fall back to N/A
	set UniqueID to PlistGet thePlist key FilePaths
on error
	set UniqueID to "N/A"
end try

where FilePaths has been extracted with a similar, simpler regular expression:

set FilePaths to find text "^[^\\t]+" in theLine with regexp and string result

This uses Satimage.osax and XMLLib.osax. If you expect Unicode text, use "ufind text" instead of "find text"; "ufind text" is a command in Smile.
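Putting both passes together, here is a hedged end-to-end sketch in Python: one pass over the second file builds the index, and one pass over the first file appends the UniqueID, or "N/A" (pass missing="" if you prefer just a tab stop). The function names, the file names in the usage line, and the UTF-8 encoding are assumptions, not part of the original scripts. Python 3 strings are Unicode, so there is no counterpart to the "find text" / "ufind text" distinction.

```python
import re

PATH_RE = re.compile(r"^[^\t]+")            # first column only
ENTRY_RE = re.compile(r"^([^\t]+)\t(.*)$")  # path <TAB> UniqueID

def merge_lines(first_lines, second_lines, missing="N/A"):
    """Append the matching UniqueID (or `missing`) to each line of the first file."""
    # Pass 1: index the second file once (path -> UniqueID).
    index = {}
    for line in second_lines:
        m = ENTRY_RE.match(line.rstrip("\r\n"))
        if m:
            index[m.group(1)] = m.group(2)
    # Pass 2: stream the first file; each lookup is a dict probe, not a scan.
    out = []
    for line in first_lines:
        line = line.rstrip("\r\n")
        m = PATH_RE.match(line)
        unique_id = index.get(m.group(0), missing) if m else missing
        out.append(f"{line}\t{unique_id}")
    return out

def merge_files(first_path, second_path, out_path, missing="N/A"):
    """File wrapper around merge_lines; UTF-8 is an assumed encoding."""
    with open(first_path, encoding="utf-8") as first, \
         open(second_path, encoding="utf-8") as second:
        merged = merge_lines(first, second, missing)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(line + "\n" for line in merged)
```

Usage, with hypothetical file names: merge_files("first.txt", "second.txt", "merged.txt").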

Emmanuel
_______________________________________________
Applescript-users mailing list      (email@hidden)


References: 
 >How to match data in two different text files as fast as possible? (From: Richard Rönnbäck <email@hidden>)
