Re: How best to extract digits from a string?
Re: How best to extract digits from a string?
- Subject: Re: How best to extract digits from a string?
- From: Bill Cheeseman <email@hidden>
- Date: Sat, 28 Apr 2001 10:21:42 -0400
I have devised an even faster algorithm, thanks to important insights from
Paul Skinner and Arthur Knapp.
Paul recognized that the typical telephone number input string contains
fewer non-digits than digits, so in theory it should typically be faster to
process the non-digits than to process the digits. (However, this is true
only if you can handle error conditions and the need for more elaborate
processing fast enough to avoid squandering the savings.)
Arthur recognized that sentinel characters can sometimes be useful.
(However, this is only true if the overhead from handling the sentinel
characters does not eat up the benefits.)
My new script is in the GetDigits1 handler in the timing script shown below.
It processes the input string in place, so to speak. When it encounters a
digit in the repeat loop, it does nothing, which is pretty fast. When it
encounters a non-digit in the repeat loop, it modifies the input string by,
in effect, deleting the non-digit on the fly (actually, it uses a temporary
variable to concatenate the front end and the back end of the input string).
Because the input string gets shorter, the script has to traverse the string
backwards to avoid index errors. Placing a sentinel character at the
beginning and the end of the input string before entering the repeat loop
takes little time and avoids the need to trap for out-of-range errors in the
repeat loop if the input string contains a non-digit in the first and/or
last place.
Because it processes non-digits, its speed advantage increases if the user
inputs a phone number with fewer non-digits, even beyond the speedup to be
expected from entering a shorter string.
Case 1: Using the very common input form "(800) 555-1212", this script is
approximately 16% faster than my previous best (the GetDigits2 handler in
the timing script shown below) on the same input string.
Case 2: Using the less common but more efficient input form "800 555-1212"
(or "800-555-1212" or "800 555 1212"), this script is approximately 18%
faster than my previous best on the same input string. This is 19% faster
than the same script applied to Case 1, although the string is only 14%
shorter than case 1.
Case 3: Using the "perfect" input form "8005551212", this script is
approximately 40% faster than my previous best on the same input string.
This is 44% faster than the same script applied to case 1, although the
string is only 29% shorter than case 1.
Here follows the timing script. The best algorithm to date is implemented in
the GetDigits1 handler. Thanks to my anonymous contributor for the timing
method. Watch for wordwrap.
to GetDigits1 from input
set digits to "1234567890"
set input to "^" & input & "^"
repeat with i from (length of input) - 1 to 2 by -1
if character i of input is not in digits then
set input to text 1 thru (i - 1) of input & text (i + 1) thru -1
of input
end if
end repeat
return text 2 thru -2 of input
end GetDigits1
to Getdigits2 from input
set digits to "1234567890"
set output to ""
repeat with char in input
if char is in digits then set output to output & char
end repeat
return output
end Getdigits2
set input to "8005551212"
set time1 to (current date)
repeat 1000 times
GetDigits1 from input
end repeat
set time2 to (current date)
repeat 1000 times
Getdigits2 from input
end repeat
set time3 to (current date)
set score to ("GetDigits1: " & time2 - time1 & return & "GetDigits2: " &
time3 - time2)
beep
display dialog score buttons {"OK"} default button "OK"
--
Bill Cheeseman - email@hidden
Quechee Software, Quechee, Vermont, USA
The AppleScript Sourcebook - www.AppleScriptSourcebook.com
Vermont Recipes - www.stepwise.com/Articles/VermontRecipes