Re: Convert a string into a list without Delimiters
- Subject: Re: Convert a string into a list without Delimiters
- From: email@hidden (Michael Sullivan)
- Date: Fri, 9 Nov 2001 13:18:58 -0500
- Organization: Business Card Express of Connecticut
> Does somebody have an idea how to convert a string in
> to a list without having delimiters in the string.
> I would like to put a block of 4 characters to a list
> item, so
> item 1 = characters 1-4
> item 2 = characters 5-8
> item 3 = characters 9-13
> etc.
> Does anybody knows an OSAX for this.
> I definitely do not want to use a repeat function,
> because I am dealing with (very) huge strings.

Okay, I know you don't want to hear this but how huge is very huge?
I'm a determined cuss about speed gains, and I'm kind of proud of myself
for figuring out a pretty nice one for this algorithm on long strings.
I realized as I did the analysis of the basic algorithm that its N^2
behavior was really M*N, where N is the length of the string and M is
the number of groups you're breaking it into. When the string is long
and the groups are small, this approximates O(N^2).
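To make the M*N point concrete, here is a minimal sketch of the kind of
plain repeat loop that analysis is about. It is an assumption about what
the "basic algorithm" looks like, not code from the original question,
and the handler name NaiveGroupTheString is made up for illustration.
Each pass builds a fresh list by concatenation, so presumably the work
per pass grows with the number of groups already collected.

```applescript
-- Hypothetical baseline handler: the name and details are assumptions,
-- shown only to illustrate the simple loop being analyzed.
on NaiveGroupTheString(theString, numChars)
	set newList to {}
	set totalLen to count of characters of theString
	repeat with i from 1 to totalLen by numChars
		-- Clamp the end index so the last group may be shorter
		set j to i + numChars - 1
		if j > totalLen then set j to totalLen
		-- Concatenation creates a new list on every pass
		set newList to newList & {text i thru j of theString}
	end repeat
	return newList
end NaiveGroupTheString

NaiveGroupTheString("0123456789", 4)
--> {"0123", "4567", "89"}
```

With M groups there are M concatenations, each handling a list that
keeps getting longer, which is the growth the recursive version is
meant to avoid.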
So, I tried a recursive algorithm that looks at the number of groups
that would be coming out, and if it's too many, it breaks the long
string into a small number of groups then calls itself for each group
and concatenates everything.
Without knowing the implementation more precisely and doing analysis of
a scope I haven't done since taking algorithms, I can't be completely
sure that this is n*log(n), but it sure is lots faster than the standard
repeat loop, and in timing tests it appears to behave as O(n*log(n)).
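As a rough back-of-the-envelope check on the n*log(n) guess, here is a
sketch of the recurrence. It rests on assumptions that are not verified
here: numChars is fixed, a call on M groups splits them into roughly 12
equal pieces, and the splitting plus list concatenation at one level
costs time proportional to M (the constant c below is hypothetical).

```latex
T(M) = 12\,T\!\left(\frac{M}{12}\right) + cM, \qquad T(M) = O(1) \text{ for } M \le 12
\quad\Longrightarrow\quad
T(M) = cM\log_{12} M + O(M) = O(M \log M)
```

If the per-level cost is in fact worse than linear, the bound gets worse
too, so the timing tests remain the better evidence.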
This breaks a 30K character string (close to the largest AS can handle)
into a list of 4 character strings with one possibly smaller last string
in less than 10 seconds on my beige G3/233 desktop.
It does a 6K character string in just over 1 second.
Strings under 1000 characters are very fast (12 ticks or less).
Take that, you linked-list string implementors!
Anyway, I hope that's fast enough to be worth using.
The only way I can see to get much faster would be an osax.
What the heck, should I submit this to Applemods?
Oh yeah -- here's the script:
-- begin script ----------------------------------------------------
--
-- GroupTheString by Michael E. Sullivan (c) 2001
-- Free to use, modify or distribute with full credit.
--
-- This will break a string into a list of strings that are numChars
-- long.
--
-- usage:
-- set aString to "0123456789"
-- set numChars to 4
-- set theList to GroupTheString(aString, numChars)
-- <<result: {"0123","4567","89"}>>
--
-- All lines begin with 2-space indent. Any line beginning in the
-- first two columns of text is a wrap from the previous line.
--
--------------------------------------------------------------------
  on GroupTheString(theString, numChars)
    set newList to {}
    set oldTIDs to AppleScript's text item delimiters
    set AppleScript's text item delimiters to ""
    set totalLen to count of characters of theString
    set numGroups to totalLen div numChars
    if numGroups > 12 then -- 12 is arbitrary; it seemed to time best
      -- Split into a few large pieces, then group each piece in turn
      set midChars to numChars * (numGroups div 12 + 1)
      set tempList to GroupTheString(theString, midChars)
      repeat with j from 1 to count of tempList
        set partialGroup to item j of tempList
        set newList to newList & GroupTheString(partialGroup, numChars)
      end repeat
    else -- End recursion when there are 12 or fewer groups
      repeat with i from 1 to totalLen by numChars
        if i + numChars - 1 > totalLen then
          -- Last group: fewer than numChars characters remain
          set tempString to (characters i thru totalLen of theString) as string
        else
          set tempString to (characters i thru (i + numChars - 1) of theString) as string
        end if
        set newList to newList & tempString
      end repeat
    end if
    set AppleScript's text item delimiters to oldTIDs
    return newList
  end GroupTheString
-- test client implementation --
  set aString to ""
  set t1 to the ticks
  repeat 1000 times
    set aString to aString & "123456789012345678901234567890"
  end repeat
  GroupTheString(aString, 4)
  set t2 to the ticks
  set theTicks to t2 - t1
  --> theTicks = 489
  -- (489 ticks is about 8 seconds at 60 ticks per second)
-- end Script ------------------------------------------------------
I hope this'll work for you, Markus.
Michael
--
Michael Sullivan email@hidden
Business Card Express of Connecticut Thermographers to the Trade
"You hate your job -- why didn't you say so? There's a support group
for that. It's called everybody; they meet at the bar." -Drew Carey