Re: How to match data in two different text files as fast as possible?
- Subject: Re: How to match data in two different text files as fast as possible?
- From: Richard Rönnbäck <email@hidden>
- Date: Tue, 15 Aug 2006 19:08:37 +0200
- Thread-topic: How to match data in two different text files as fast as possible?
Thanks guys!
You have some very nice suggestions (as usual one is tempted to say).
Emmanuel had already shown me the incredible usefulness of
Satimage. I had no idea that plists are that fast, though.
Has, I am very much looking forward to trying out your Python script (which
I unfortunately haven't been able to do yet).
I have for quite some time been tempted to learn more about Python. Do you
know of any good book on Python? (I prefer that to the web when learning
something from scratch.)
Thanks again!
/ Richard
> From: has <email@hidden>
> Date: Sun, 13 Aug 2006 01:23:52 +0100
> To: <email@hidden>
> Subject: Re: How to match data in two different text files as fast as possible?
>
> Richard Rönnbäck wrote:
>
>> I need to match data from two different text files, but none of the
>> techniques I know are fast enough,
>
> AppleScript's not the best language for text crunching, and if speed
> is an issue you'd be better off using Perl, Python, etc. Below's
> a Python version, which'll hopefully be fast enough. (Perl'd be
> quicker still, but my Perl's rubbish.:p)
>
> #!/usr/bin/python
>
> import re, sys
>
> maintablepath, idtablepath, outtablepath = sys.argv[1:]
>
> # make a lookup table for unique ids by file path
> idtablepattern = re.compile('^(.+?)\t(.+?)$', re.MULTILINE)
>
> idtable = dict(idtablepattern.findall(open(idtablepath).read()))
>
> # write each line in the main table to the out table, appending
> # either unique id or 'N/A'
> infile = open(maintablepath)
> outfile = open(outtablepath, 'w')
>
> line = infile.readline()
> while line:
>     path = line.split('\t', 1)[0]
>     outfile.write('%s\t%s\n' % (line.rstrip('\n\r'), idtable.get(path, 'N/A')))
>     line = infile.readline()
>
> outfile.close()
> infile.close()
>
>
> Save the above script as 'append.py', then run from the command line
> using:
>
> python /path/to/append.py /path/to/maintable.txt /path/to/idtable.txt /path/to/outtable.txt
>
> The above script assumes that both files use the same text encoding,
> and if the paths are Unicode that they're decomposed in the same way.
> Also, path comparisons are case-sensitive. If these assumptions
> are unsafe, it's not hard to modify the script to suit, but you'll
> need to specify your requirements in more detail.
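
If those assumptions do not hold, one possible adjustment is to normalize every path before using it as a lookup key. The following is only a sketch and not part of the quoted script: it assumes Python 3, and the helper names (normalize_key, build_idtable, append_ids) are made up for illustration. It composes Unicode to NFC and case-folds, so differently decomposed or differently cased paths compare equal.

```python
import unicodedata


def normalize_key(path):
    """Return a comparison key: NFC-composed, then case-folded."""
    return unicodedata.normalize('NFC', path).casefold()


def build_idtable(lines):
    """Map normalized path -> unique id from tab-separated lines."""
    table = {}
    for line in lines:
        path, sep, uid = line.rstrip('\r\n').partition('\t')
        if sep:  # ignore lines that contain no tab
            table[normalize_key(path)] = uid
    return table


def append_ids(main_lines, idtable, missing='N/A'):
    """Yield each main-table line with its id (or `missing`) appended."""
    for line in main_lines:
        line = line.rstrip('\r\n')
        path = line.split('\t', 1)[0]
        yield '%s\t%s' % (line, idtable.get(normalize_key(path), missing))
```

When reading real files, the encoding assumption can be made explicit by opening each one with something like open(path, encoding='utf-8'), provided that is the encoding the files actually use.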
>
> HTH
>
> has
> --
> http://freespace.virgin.net/hamish.sanderson/
>
>
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Applescript-users mailing list (email@hidden)
> Help/Unsubscribe/Update your Subscription:
> edband.net
>
> This email sent to email@hidden