Re: Working with big lists

Subject: Re: Working with big lists
From: has <email@hidden>
Date: Sun, 22 May 2005 21:40:15 +0100

Rob Stott wrote:

>I have a list in a text (.txt) file. I want to find out how many
>times each line occurs in the text file.
[...]
>I was wondering whether anyone had any clever ideas for doing
>this a bit more quickly.

Use a more efficient algorithm. You should only need to scan each line in the file once, whereas your current routine scans the whole file once for every line. With N lines, that works out to roughly N × N comparisons in total, which is horribly inefficient on a big file.
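To make the difference concrete, here's a rough Python sketch contrasting the two strategies on an in-memory list of lines. The function names are made up for illustration, and the rescanning version is only a guess at the general shape of a quadratic routine, not your actual script (which I haven't seen).

```python
def count_by_rescanning(lines):
    """Quadratic: for every line, walk the whole list again to count it."""
    counts = {}
    for line in lines:              # N outer iterations...
        n = 0
        for other in lines:         # ...each doing a full N-line scan
            if other == line:
                n += 1
        counts[line] = n
    return counts


def count_in_one_pass(lines):
    """Linear: visit each line once, bumping its tally in a dictionary."""
    counts = {}
    for line in lines:
        counts[line] = counts.get(line, 0) + 1
    return counts


sample = ['apple', 'pear', 'apple', 'fig', 'apple', 'pear']
print(count_by_rescanning(sample) == count_in_one_pass(sample))  # True
print(count_in_one_pass(sample))  # {'apple': 3, 'pear': 2, 'fig': 1}
```

Both give the same answer, but with 10,000 lines the first does 100,000,000 string comparisons while the second does 10,000 dictionary updates.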

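The trick is to read the file one line at a time, using a dictionary object (aka 'hash' or 'associative array') to keep count of how many times each distinct line has been seen. Because looking up a key in a dictionary takes roughly constant time on average, the whole job becomes a single pass over the file. Example: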

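#!/usr/bin/env python3

d = {}
# Read the file one line at a time; each line is looked at exactly once.
# (Iterating over the file stops cleanly at EOF, so no bogus empty-string
# entry gets counted at the end.)
with open('/Users/has/test.txt') as f:  # [your path here]
    for line in f:
        s = line.rstrip('\r\n')  # drop the line ending so 'foo\n' and 'foo' match
        d[s] = d.get(s, 0) + 1   # bump this line's tally

# Sort by count, most frequent first, and print "line<TAB>count" rows.
lst = sorted(d.items(), key=lambda item: item[1], reverse=True)
print('\n'.join('%s\t%s' % (a, b) for a, b in lst))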
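For what it's worth, Python's standard library also has collections.Counter, a dict subclass meant for exactly this kind of tally, and its most_common() method returns (item, count) pairs ordered from most to least frequent. Here's a sketch of the same job using it; the helper names count_lines and report are just illustrative, and the demo reads from an in-memory io.StringIO rather than a real file.

```python
import io
from collections import Counter


def count_lines(lines):
    """Tally an iterable of lines (e.g. an open text file), ignoring line endings."""
    # Counter consumes the generator lazily, so a file is still read one line at a time.
    return Counter(line.rstrip('\r\n') for line in lines)


def report(counts):
    """Format 'line<TAB>count' rows, most frequent first."""
    return '\n'.join('%s\t%s' % (text, n) for text, n in counts.most_common())


# With a real file:
#     with open('/Users/has/test.txt') as f:  # [your path here]
#         print(report(count_lines(f)))
demo = io.StringIO('apple\npear\napple\nfig\napple\npear\n')
print(report(count_lines(demo)))
```

Taking any iterable of lines (rather than a path) keeps the counting logic separate from the file handling, which also makes it easy to try out on a string first.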

HTH

has
--
http://freespace.virgin.net/hamish.sanderson/
 _______________________________________________
Applescript-users mailing list      (email@hidden)
