Re: A faster offset...
- Subject: Re: A faster offset...
- From: guy parkinson <email@hidden>
- Date: Fri, 07 Sep 2001 22:51:50 -0400
This is resurrecting a rather old thread but I just got around to it.
I was very interested to see the results of your comparison of finding
offsets with OSAX and pure AS: I had naturally assumed the OSAX would be
faster and so was surprised at your result.
Before pursuing some tests of my own, I modified your searchString handler
to return a list of offsets, a value I find more interesting than the first
offset by itself, then compared it to ACME offsets and Tanaka's Search
Position.
As you can see, the pure AS version stands up very well, coming in faster
than either of the others in a test with a small string. (Tanaka's routine is
so far behind in all tests that I will disregard it.) The value of the OSAX
shows itself in large ranges of text when searching for a token with many
instances: on a 32k string, searching for a single common letter, ACME offsets
far outstrips pure AS even on a single iteration. Clearly, it is the number
of offsets returned that affects speed/efficiency the most.
The following results are in "Jon's Ticks", measured over 500 iterations on
my admittedly aging machine.
offsets of "x" in "tex-mex taxes"
    ACME        57
    AS          48

offsets of "as" in first 5k of Boswell's Life of Johnson
    ACME       141
    AS         425

offsets of "those" in first 32k of Life of Johnson
    ACME       254
    AS         263

offsets of "c" in first 32k of Life of Johnson
    ACME      3584
    AS      220732
======================
//////////////////////
======================
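--Modified offsets handler
-- Timing harness. "the ticks" is the tick counter behind the "Jon's Ticks"
-- figures above (an OSAX command, not core AppleScript); lifeOfJohnson is
-- assumed to already hold the text under test.
set t to the ticks
repeat 500 times
    set l to offsetInString("c", lifeOfJohnson)
end repeat
return {l, ((the ticks) - t)}

-- Returns a list of the offsets of every occurrence of theItem in theString,
-- or 0 if theItem is empty or does not occur.
on offsetInString(theItem, theString)
    if not (theItem = "") then
        -- Split the string on the search text; the pieces are the text between matches.
        set AppleScript's text item delimiters to theItem
        set theItems to (text items of theString)
        set AppleScript's text item delimiters to ""
        if (count of theItems) = 1 then
            return 0
        else
            -- The first match starts right after the first piece.
            set theOffsets to {((length of item 1 of theItems) + 1)}
            repeat with i from 2 to ((count of theItems) - 1)
                -- Next offset = previous offset + length of the search text + length of the piece in between.
                set theOffsets to theOffsets & ((length of item i of theItems) + (item (i - 1) of theOffsets) + (length of theItem))
            end repeat
            return theOffsets
        end if
    else
        return 0
    end if
end offsetInString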
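
As a quick check of the handler above, offsetInString("x", "tex-mex taxes") should return {3, 7, 11}. What follows is a minimal sketch of a variant, not part of the tests reported here and not timed. It appends each offset with "set end of" instead of rebuilding the list with "&" on every pass, and keeps a running position rather than reading back the previous offset. Repeated concatenation copies the growing list each time, which may contribute to the slow many-match case, but that is an assumption rather than something measured. The names offsetsInString, pos, itemLength and oldDelims are hypothetical.

```applescript
-- Hypothetical variant of offsetInString: same splitting technique,
-- but offsets are appended in place and the delimiters are restored.
on offsetsInString(theItem, theString)
    if theItem = "" then return {}
    -- Split on the search text, then put the delimiters back as they were.
    set oldDelims to AppleScript's text item delimiters
    set AppleScript's text item delimiters to theItem
    set theItems to text items of theString
    set AppleScript's text item delimiters to oldDelims

    set theOffsets to {}
    set pos to 1 -- position where the current piece starts
    set itemLength to length of theItem
    -- Every piece except the last is followed by one match.
    repeat with i from 1 to ((count of theItems) - 1)
        set pos to pos + (length of item i of theItems) -- start of the match
        set end of theOffsets to pos
        set pos to pos + itemLength -- skip past the match
    end repeat
    return theOffsets
end offsetsInString

offsetsInString("x", "tex-mex taxes") --> {3, 7, 11}
```

Unlike the handler above, this sketch returns an empty list rather than 0 when there is no match, so the caller always gets a list back.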
delurking
guy