Re: Working with big lists
  • Subject: Re: Working with big lists
  • From: has <email@hidden>
  • Date: Sun, 22 May 2005 21:40:15 +0100

Rob Stott wrote:

>I have a list in a text (.txt) file. I want to find out how many
>times each line occurs in the text file.
[...]
>I was wondering whether anyone had any clever ideas for doing
>this a bit more quickly.

Use a more efficient algorithm. You should only need to scan each line in the file once, whereas your current routine scans each line N times (where N is the number of lines in the file), so the total work grows with N squared. That's horribly inefficient for a big list.

The trick is to read the file one line at a time, using a dictionary object (aka 'hash' or 'associative array') to keep count of the number of times a given string is found. Example:

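#!/usr/bin/python

f = open('/Users/has/test.txt') # [your path here]
d = {} # maps each distinct line to the number of times it was seen
# Iterating over the file yields one line at a time and stops cleanly
# at end of file, so no bogus empty-string entry gets counted.
for line in f:
	s = line.rstrip('\r\n') # drop the trailing line ending
	d[s] = d.get(s, 0) + 1
f.close()

# Sort by count, most frequent first, then print "line<TAB>count".
lst = d.items()
lst.sort(lambda a, b: cmp(b[1], a[1]))
print '\n'.join(['%s\t%s' % (a, b) for a, b in lst])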
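For anyone on a newer Python, here is a rough sketch of the same one-pass idea using collections.Counter from the standard library. It assumes Python 3, and the function names count_lines and format_report are made up for illustration, not part of the script above:

```python
from collections import Counter

def count_lines(path):
    """Return a Counter mapping each distinct line to how often it occurs."""
    counts = Counter()
    with open(path) as f:
        # One pass over the file: each line is read and counted exactly once.
        for line in f:
            counts[line.rstrip('\r\n')] += 1
    return counts

def format_report(counts):
    """Build one 'line<TAB>count' row per distinct line, most frequent first."""
    return '\n'.join('%s\t%d' % (text, n) for text, n in counts.most_common())
```

Calling print(format_report(count_lines('/Users/has/test.txt'))) should give the same sort of output as the script above. Counter is just a dictionary subclass, so the underlying trick is unchanged.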

HTH

has
--
http://freespace.virgin.net/hamish.sanderson/