Re: Unique Items in a text file

  • Subject: Re: Unique Items in a text file
  • From: has <email@hidden>
  • Date: Mon, 8 Apr 2002 23:45:08 +0100

Steve Thompson wrote:

>On a daily basis I receive a text file that contains a large number of tab
>delimited records. In each record, field 8 contains a product code.
>
>I have written a script that loops through each line of data, looks at field
>8, compares it with a list of product codes and, if the product code isn't
>in the list, it adds it.


Purely as an exercise in seeing how well vanilla code can perform, I found
the following ran through a 10,000 line, 4.4MB test string in 15 secs on my
G3/300.

======================================================================

on _extract(theString, columnIndex, resultList) --private stuff
    --append each value found in column columnIndex that isn't already in resultList
    repeat with y from 1 to count theString's paragraphs
        theString's paragraph y's text item columnIndex
        if result is not in resultList then
            set resultList's end to result
        end if
    end repeat
end _extract

--

on extractFromColumn(theString, columnIndex) --public call
    set resultList to {}
    set oldTID to AppleScript's text item delimiters
    set AppleScript's text item delimiters to tab
    set x to -99 --leaves x + 100 = 1 for inputs under 100 paragraphs, where the loop never runs
    --process the complete 100-paragraph chunks
    repeat with x from 1 to ((count theString's paragraphs) div 100) * 100 by 100
        _extract(theString's text (paragraph x) thru (paragraph (x + 99)), columnIndex, resultList)
    end repeat
    --process any leftover paragraphs after the last complete chunk
    if (count theString's paragraphs) mod 100 is not 0 then
        _extract(theString's text (paragraph (x + 100)) thru -1, columnIndex, resultList)
    end if
    set AppleScript's text item delimiters to oldTID
    resultList
end extractFromColumn

======================================================================
[formatted using ScriptToEmail - gentle relief for mailing list pains]
[http://files.macscripter.net/ScriptBuilders/ScriptTools/ScriptToEmail.hqx]


Behaviour seems to be linear, at least as far as I tested. The trick seems
to be working on smaller (100-line) chunks rather than on the entire string
at once; otherwise the _extract() routine bogs down badly as string
size/paragraph count (?) increases.
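
To make the chunk arithmetic concrete, here is a minimal sketch that returns only the paragraph ranges each pass would cover, without touching any text. The handler name chunkRanges and its chunkSize parameter are hypothetical and not part of the script above; the loop mirrors the one in extractFromColumn, and it relies on the loop variable keeping its last value once the repeat finishes.

```applescript
-- chunkRanges is a hypothetical helper for illustration only.
-- It returns the {first, last} paragraph numbers of each chunk that a
-- chunked pass over lineCount paragraphs would visit.
on chunkRanges(lineCount, chunkSize)
    set rangeList to {}
    -- Same idea as "set x to -99" above: assumes x is left untouched
    -- if the loop runs zero times, so that x + chunkSize is 1.
    set x to 1 - chunkSize
    -- complete chunks
    repeat with x from 1 to (lineCount div chunkSize) * chunkSize by chunkSize
        set rangeList's end to {x, x + chunkSize - 1}
    end repeat
    -- leftover paragraphs after the last complete chunk
    if lineCount mod chunkSize is not 0 then
        set rangeList's end to {x + chunkSize, lineCount}
    end if
    rangeList
end chunkRanges

chunkRanges(250, 100)
-- {{1, 100}, {101, 200}, {201, 250}}
```

Each range corresponds to one call to _extract(), so no single call ever has to index into more than chunkSize paragraphs.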


I'd be interested to hear how osaxen/application-based alternatives do by
comparison.

has

--
http://www.barple.connectfree.co.uk/ -- The Little Page of Beta AppleScripts