Re: How best to extract digits from a string?
- Subject: Re: How best to extract digits from a string?
- From: "Arthur J Knapp" <email@hidden>
- Date: Sat, 28 Apr 2001 13:01:58 -0400
Date: Fri, 27 Apr 2001 22:35:40 -0400
Subject: Re: How best to extract digits from a string?
From: Paul Skinner <email@hidden>
Faster, faster, everybody wants faster. What about linear?
Here's one that's 60 times slower (.25 seconds on my PB 400), but it will
never get much slower than that. : )
Arthur J. Knapp proposed a solution that used TIDs for a possible
speedup on larger strings, but it still iterated based on the length of the
??? Really ???
The handler "s_contiguous" performed 10 tids-based search-and-replace
operations, plus 2 additional "clean-up" operations, to produce a list
in which all contiguous runs of number strings were together. The length
of the input string wasn't a factor.
Following the call to s_contiguous, I used an iteration to remove
the non-number runs, using the class-delete method. Rather than being
based on string-length, it was based on the count of "runs" of certain
... I also got stack overflow errors using his script with 20000
character strings.
AppleScript has a hard-coded upper limit: a single coercion cannot
return a list of more than approximately 4050 items. All of these will
cause an error:
-- BigStr is a string of 5000 characters:
every character of BigStr
every word of BigStr
set text item delimiters to {""}
every text item of BigStr
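One possible way around this, as a rough sketch only: ask for the items in slices, so that no single coercion returns more than a few thousand of them. The handler name charactersInSlices and the slice size of 4000 are assumptions made for illustration (4000 is simply a round figure below the approximate limit above), not something taken from the scripts in this thread.

```applescript
-- Sketch only: charactersInSlices is a hypothetical name, and the slice
-- size of 4000 is an assumption chosen to stay under the ~4050-item limit.
on charactersInSlices(bigStr, sliceSize)
	set sliceLists to {}
	set strLength to length of bigStr
	set startPos to 1
	repeat while startPos <= strLength
		set endPos to startPos + sliceSize - 1
		if endPos > strLength then set endPos to strLength
		-- Each coercion here yields at most sliceSize items.
		set end of sliceLists to characters startPos thru endPos of bigStr
		set startPos to endPos + 1
	end repeat
	return sliceLists -- a list of lists of characters
end charactersInSlices

-- Build a 5000-character test string, as in the example above.
set BigStr to ""
repeat 5000 times
	set BigStr to BigStr & "x"
end repeat

set sliceLists to charactersInSlices(BigStr, 4000)
count of sliceLists -- 2 slices: 4000 + 1000 characters
```

The result is a list of lists rather than one flat list, so the caller has to walk the slices one at a time. That is the price of never asking for more than sliceSize items at once.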
Instead of tweaking the 'add it if it's a number' route, I remove
everything that's not a number. 256 iterations. Pass it 20000 characters and
it still only has to do 256 iterations. At 20000 characters it takes .5
seconds vs. 157 seconds or so for Paul Berkowitz's original method or the
similar variants (these scripts' times are very non-linear when fed more than
several hundred characters).
repeat with asciiCode from 0 to 255
if asciiCode < 48 or asciiCode > 57 then -- the digits "0" to "9" are ASCII 48 to 57
set AppleScript's text item delimiters to ASCII character asciiCode
set theText to every text item of theText
set AppleScript's text item delimiters to ""
set theText to theText as text
end if
end repeat
Good stuff. :)
I have a few comments, though:
Your snippet is subject to the same limit that you ran into when you ran
my script, i.e., if any ASCII character (from 0 to 47 or from 58 to 255)
occurs more than 4060 or so times in theText, you will get a stack
overflow error.
If memory isn't a problem, you can dramatically increase the speed of
this script by creating a property containing all of the appropriate
characters. The following snippet shows how you can move all of the
"ASCII character" commands to execute at compile time, thus increasing
run-time speed:
property kNotDigits : run script "set charList to {}
repeat with x from 0 to 47
set end of charList to ASCII character x
end repeat
repeat with x from 58 to 255
set end of charList to ASCII character x
end repeat
return \"\" & charList"
set theText to "(800) 555-1212"
repeat with char in kNotDigits
if theText contains char then
set AppleScript's text item delimiters to {"" & char}
set theText to every text item of theText
set AppleScript's text item delimiters to {""}
set theText to "" & theText
end if
end repeat
theText -- > "8005551212"
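For reuse, the same loop could be wrapped in a handler that puts the text item delimiters back the way it found them, even if an error occurs partway through. This is only a sketch: the handler name stripNonDigits and the variable savedDelims are made up for illustration, and the handler is still subject to the list-size limit discussed above.

```applescript
-- Sketch only: stripNonDigits and savedDelims are hypothetical names.
property kNotDigits : run script "set charList to {}
repeat with x from 0 to 47
set end of charList to ASCII character x
end repeat
repeat with x from 58 to 255
set end of charList to ASCII character x
end repeat
return \"\" & charList"

on stripNonDigits(theText)
	-- Remember the caller's delimiters so they can be restored afterwards.
	set savedDelims to AppleScript's text item delimiters
	try
		repeat with char in kNotDigits
			if theText contains char then
				-- Split on the unwanted character, then rejoin with nothing.
				set AppleScript's text item delimiters to {"" & char}
				set theText to every text item of theText
				set AppleScript's text item delimiters to {""}
				set theText to "" & theText
			end if
		end repeat
	on error errMsg number errNum
		-- Restore the delimiters before passing the error along.
		set AppleScript's text item delimiters to savedDelims
		error errMsg number errNum
	end try
	set AppleScript's text item delimiters to savedDelims
	return theText
end stripNonDigits

stripNonDigits("(800) 555-1212")
```

Restoring the saved delimiters, rather than always resetting them to {""}, means a calling script that relies on some other delimiter setting is left undisturbed.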
Arthur J. Knapp