Re: How best to extract digits from a string?
- Subject: Re: How best to extract digits from a string?
- From: "Arthur J Knapp" <email@hidden>
- Date: Sat, 28 Apr 2001 13:01:58 -0400
> Date: Fri, 27 Apr 2001 22:35:40 -0400
> Subject: Re: How best to extract digits from a string?
> From: Paul Skinner <email@hidden>
>
> Faster,faster, everybody wants faster. What about linear?
> Here's one that's 60 times slower ( .25 seconds on my PB 400 ) but, It will
> never get much slower than that. : )
>
> Arthur J. Knapp proposed a solution that used TIDs for a possible
> speedup on larger strings, but it still iterated based on the length of the
> string.
??? Really ???
The handler "s_contiguous" performed 10 TIDs-based search-and-replace
operations, plus 2 additional "clean-up" operations, to produce a list
in which all contiguous runs of number strings were together. The length
of the input string wasn't a factor.
Following the call to s_contiguous, I used an iteration to remove
the non-number runs, using the class-delete method. Rather than being
based on string-length, it was based on the count of "runs" of certain
strings.
> ... I also got stack overflow errors using his script with 20000
> character strings.
Right.
AppleScript has a hard-coded upper limit of approximately 4050 items
on any list returned as the result of a single coercion. All of these
will cause an error:
-- BigStr is a string of 5000 characters:
every character of BigStr
every word of BigStr
set AppleScript's text item delimiters to {""}
every text item of BigStr
> Instead of tweeking the 'add it if it's a number' route I remove
> everything thats not a number. 256 iterations. pass it 20000 characters and
> it still only has to do 255 iterations. At 20000 characters it takes .5
> seconds vs 157 seconds or so for Paul Berkowitz's original method or the
> similar variants (these scripts times are very non-linear when fed more than
> several hundred characters ).
>
> repeat with asciiCode from 0 to 255
>     if asciiCode < 47 or asciiCode > 58 then
>         set AppleScript's text item delimiters to ASCII character asciiCode
>         set theText to every text item of theText
>         set AppleScript's text item delimiters to ""
>         set theText to theText as text
>     end if
> end repeat
Good stuff. :)
I have a few comments, though:
Your snippet is subject to the same limit that you ran into when you ran
my script, i.e., if any ASCII character (from 0 to 47 or from 58 to 255)
occurs more than around 4060 times in theText, you will get a stack
overflow.
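One possible way around that limit, offered only as a sketch: strip the
text in slices, so that no single "every text item" request can return
more than chunkSize + 1 items. The handler name stripChars and its
parameters badChars and chunkSize are hypothetical names of my own, not
anything from the earlier scripts in this thread, and a safe value for
chunkSize is an assumption based on the approximate limit above.

```applescript
-- Hypothetical handler (not from the original thread): removes every
-- character found in badChars from theText, working on slices of at most
-- chunkSize characters so each text-item list stays small.
on stripChars(theText, badChars, chunkSize)
	set oldTIDs to AppleScript's text item delimiters
	set resultText to ""
	set total to length of theText
	set startPos to 1
	repeat while startPos <= total
		-- Take the next slice, clamped to the end of the text
		set endPos to startPos + chunkSize - 1
		if endPos > total then set endPos to total
		set chunk to text startPos thru endPos of theText
		-- Same delimiter trick as in the thread, applied to the slice only
		repeat with char in badChars
			if chunk contains char then
				set AppleScript's text item delimiters to {"" & char}
				set chunk to every text item of chunk
				set AppleScript's text item delimiters to {""}
				set chunk to "" & chunk
			end if
		end repeat
		set resultText to resultText & chunk
		set startPos to endPos + 1
	end repeat
	-- Restore whatever delimiters the caller had in place
	set AppleScript's text item delimiters to oldTIDs
	return resultText
end stripChars
```

Called as stripChars(theText, kNotDigits, 2000), with kNotDigits being a
string of all the non-digit characters, each slice can produce at most
2001 text items. The trade-off is one pass over the unwanted characters
per slice rather than one pass in total.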
If memory isn't a problem, you can dramatically increase the speed of
this script by creating a property containing all of the appropriate
characters. The following snippet shows how you can move all of the
"ASCII character" calls to compile time, thus increasing run-time speed:
property kNotDigits : run script "set charList to {}
repeat with x from 0 to 47
set end of charList to ASCII character x
end
repeat with x from 58 to 255
set end of charList to ASCII character x
end
return \"\" & charList"
set theText to "(800) 555-1212"
repeat with char in kNotDigits
    if theText contains char then
        set AppleScript's text item delimiters to {"" & char}
        set theText to every text item of theText
        set AppleScript's text item delimiters to {""}
        set theText to "" & theText
    end if
end repeat
theText --> "8005551212"
Arthur J. Knapp
http://www.stellarvisions.com
mailto:email@hidden
Hey, check out:
http://home.earthlink.net/~eagrant/