Re: How to match data in two different text files as fast as possible?
- Subject: Re: How to match data in two different text files as fast as possible?
- From: has <email@hidden>
- Date: Sun, 13 Aug 2006 01:23:52 +0100
Richard Rönnbäck wrote:
> I need to match data from two different text files, but none of the
> techniques I know are fast enough,
AppleScript's not the best language for text crunching, and if speed
is an issue you'd be better off using Perl, Python, etc. Here's a
Python version, which'll hopefully be fast enough. (Perl'd be quicker
still, but my Perl's rubbish. :p)
#!/usr/bin/python
import re, sys

maintablepath, idtablepath, outtablepath = sys.argv[1:]

# make a lookup table of unique ids, keyed by file path
# (each line of the id table is: path<TAB>unique id)
idtablepattern = re.compile(r'^(.+?)\t(.+?)$', re.MULTILINE)
with open(idtablepath) as f:
    idtable = dict(idtablepattern.findall(f.read()))

# write each line in the main table to the out table, appending
# either its unique id or 'N/A' if the path isn't in the id table
with open(maintablepath) as infile:
    with open(outtablepath, 'w') as outfile:
        for line in infile:
            path = line.split('\t', 1)[0]
            outfile.write('%s\t%s\n' % (line.rstrip('\n\r'),
                                        idtable.get(path, 'N/A')))
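Why this should be quick: the id table is read once into a dict, so each main-table line costs a single hash lookup rather than a scan of the whole id table, and the total work grows roughly linearly with the size of the two files. Below is a minimal sketch of the same join pulled out into a function so it can be tried on in-memory data. `append_ids` is a hypothetical name, and the sketch assumes the same layout as the script above (tab-separated, path in the first column of both tables).

```python
def append_ids(main_lines, id_lines, missing='N/A'):
    """Return main_lines with each line's unique id (or `missing`) appended.

    Assumes tab-separated text: id table lines are `path<TAB>id`, and the
    main table has the path in its first column.
    """
    # Build the lookup table once: path -> unique id.
    idtable = {}
    for line in id_lines:
        line = line.rstrip('\r\n')
        if '\t' in line:
            path, uid = line.split('\t', 1)
            idtable[path] = uid
    # One dict lookup per main-table line, no nested loop.
    result = []
    for line in main_lines:
        line = line.rstrip('\r\n')
        path = line.split('\t', 1)[0]
        result.append('%s\t%s' % (line, idtable.get(path, missing)))
    return result


main = ['/a/b.txt\tfoo\n', '/c/d.txt\tbar\n']
ids = ['/a/b.txt\t1001\n']
for out in append_ids(main, ids):
    print(out)
```

Here the first line gets `1001` appended and the second, whose path isn't in the id table, gets `N/A`.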
Save the above script as 'append.py', then run it from the command
line using:

python /path/to/append.py /path/to/maintable.txt /path/to/idtable.txt /path/to/outtable.txt
The above script assumes that both files use the same text encoding
and, if the paths contain non-ASCII characters, that those characters
are Unicode-composed or decomposed the same way in both files. Path
comparisons are also case-sensitive. If these assumptions are unsafe,
it's not hard to modify the script to suit, but you'll need to specify
your requirements in more detail.
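For example, if the same path might be spelled with differently composed Unicode characters or different capitalisation in the two files, one option is to turn every path into a normalised key before storing it in, and looking it up in, the lookup table. The sketch below assumes that "equal after Unicode normalisation and case folding" is the matching rule you want (an assumption, not something stated in the thread) and requires Python 3 for `str.casefold()`. `path_key` is a hypothetical helper name.

```python
import unicodedata


def path_key(path):
    """Hypothetical matching key: paths that differ only in Unicode
    composition (precomposed vs. decomposed) or letter case get the same key."""
    # Normalise, case-fold, then normalise again so the result is in NFD form.
    return unicodedata.normalize('NFD', unicodedata.normalize('NFD', path).casefold())


# Build the lookup table with normalised keys (here with a precomposed e-acute)...
idtable = {path_key('/Users/me/Caf\u00e9.txt'): '1001'}

# ...and normalise the main-table path the same way when looking it up
# (here 'e' followed by a combining acute accent, with different case).
decomposed = '/users/ME/Cafe\u0301.txt'
print(idtable.get(path_key(decomposed), 'N/A'))
```

In the script, that means using `path_key(path)` both when building `idtable` and in the `idtable.get(...)` call.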
HTH
has
--
http://freespace.virgin.net/hamish.sanderson/
_______________________________________________
Applescript-users mailing list (email@hidden)