Re: How best to extract digits from a string?
- Subject: Re: How best to extract digits from a string?
- From: Paul Skinner <email@hidden>
- Date: Sun, 29 Apr 2001 02:19:48 -0400
on 4/28/01 1:01 PM, Arthur J Knapp wrote:
> > Date: Fri, 27 Apr 2001 22:35:40 -0400
> > Subject: Re: How best to extract digits from a string?
> > From: Paul Skinner <email@hidden>
> >
> > Faster, faster, everybody wants faster. What about linear?
> >
> > Here's one that's 60 times slower (.25 seconds on my PB 400), but it will
> > never get much slower than that. : )
> >
> > Arthur J. Knapp proposed a solution that used TIDs for a possible
> > speedup on larger strings, but it still iterated based on the length of the
> > string.
>
> ??? Really ???
No, not really. I just said that. ; )
my bad.
>
> The handler "s_contiguous" performed 10 TIDs-based search-and-replace
> operations, plus 2 additional "clean-up" operations, to produce a list
> in which all contiguous runs of number strings were together. The length
> of the input string wasn't a factor.
>
> Following the call to s_contiguous, I used an iteration to remove
> the non-number runs, using the class-delete method. Rather than being
> based on string length, it was based on the count of "runs" of certain
> strings.
which dramatically reduces the iterations performed.
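Arthur's s_contiguous code isn't reproduced in this thread, so what follows is only a hypothetical Python sketch of the approach as he describes it. Python's str.replace and str.split stand in for TIDs-based search-and-replace. Ten fixed passes wrap each digit in a marker, one clean-up pass merges adjacent digits into runs, and the final loop visits runs rather than characters. The marker character and the function names are assumptions made for illustration.

```python
DIGITS = "0123456789"
MARK = "\x00"  # assumed marker; must never appear in the input text


def contiguous_runs(text):
    """Split text into alternating runs of digits and non-digits."""
    # 10 passes: wrap every digit in markers, so "80" becomes
    # MARK 8 MARK MARK 0 MARK
    for d in DIGITS:
        text = text.replace(d, MARK + d + MARK)
    # clean-up pass: a doubled marker only occurs between two adjacent
    # digits, so removing it merges them into one run: MARK 80 MARK
    text = text.replace(MARK + MARK, "")
    # a single split on the marker yields the runs; drop empty edge pieces
    return [run for run in text.split(MARK) if run]


def digits_by_runs(text):
    # this loop is proportional to the number of runs, not characters
    return "".join(run for run in contiguous_runs(text) if run[0] in DIGITS)


print(digits_by_runs("phone # 1(800)372-1476"))  # prints 18003721476
```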
>
> > ... I also got stack overflow errors using his script with 20000
> > character strings.
>
> Right.
>
> There is a hard-coded upper limit on AppleScript's ability to
> return a list of more than approximately 4050 items as the result of
> a single coercion. All of these will cause an error:
>
> -- BigStr is a string of 5000 characters:
>
> every character of BigStr
> every word of BigStr
>
> set text item delimiters to {""}
> every text item of BigStr
yep. And I also hope that this is reported as a bug.
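One possible workaround for the list-size limit, sketched here in Python under the assumption that the limit applies to each individual split: because removing characters is a purely local operation, the input can be processed in slices and the results concatenated. Splitting a slice of at most 4000 characters on a single-character delimiter can never yield more than 4001 text items, which keeps each split under the roughly 4050-item ceiling described above. The chunk size and function names are hypothetical, and Python itself has no such limit; the point is only that chunking does not change the result.

```python
CHUNK = 4000  # hypothetical slice size, chosen to stay under ~4050 items
DIGITS = "0123456789"


def strip_non_digits(text):
    # TIDs-style removal: split on each non-digit character, rejoin with ""
    for ch in set(text) - set(DIGITS):
        text = "".join(text.split(ch))
    return text


def strip_in_chunks(text, size=CHUNK):
    # each slice has at most `size` characters, so no single split inside
    # strip_non_digits can produce more than size + 1 pieces
    pieces = [text[i:i + size] for i in range(0, len(text), size)]
    return "".join(strip_non_digits(p) for p in pieces)
```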
snip
>
> > repeat with asciiCode from 0 to 255
> >     if asciiCode < 48 or asciiCode > 57 then
> >         set AppleScript's text item delimiters to ASCII character asciiCode
> >         set theText to every text item of theText
> >         set AppleScript's text item delimiters to ""
> >         set theText to theText as text
> >     end if
> > end repeat
>
> Good stuff. :)
Thanks! Means a lot coming from you.
But I wasn't satisfied. It was off the cuff and not fully thought out, as
you later point out.
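As a rough Python model of the 0-to-255 loop quoted above, str.split/str.join stand in for the text item delimiters and chr() for ASCII character. This is an approximation: ASCII character follows the Mac Roman encoding above 127, and this model covers only codes 0 through 255. It shows why the run time is nearly flat: the loop always makes 246 passes, one per non-digit code, whatever the input. The digits occupy codes 48 through 57. Codes 47 and 58 are "/" and ":", and they have to be removed like any other non-digit.

```python
def strip_by_code_range(text):
    """Remove every character with code 0-255 except the digits 0-9."""
    passes = 0
    for code in range(256):
        if code < 48 or code > 57:  # digits are codes 48 ("0") to 57 ("9")
            # split on the character and rejoin with "", like setting the
            # TIDs to the character and then coercing back with TIDs ""
            text = "".join(text.split(chr(code)))
            passes += 1
    return text, passes
```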
>
> I have a few comments, though:
>
> Your snippet is subject to the same limit that you incurred when you ran
> my script, i.e., if any ASCII character (from 0 to 47 or from 58 to 255)
> occurs more than 4060 or so times in theText, you will get a stack
> overflow.
yep. I foolishly didn't verify that I was testing both scripts with the same
data.
>
> If memory isn't a problem, you can dramatically increase the speed of
> this script by creating a property containing all of the appropriate
> characters. The following snippet shows how you can move all of the ASCII
> commands to execute at compile time, thus increasing run-time speed:
Yes... but I thought of something better.
I decided to take the input and make a copy of it containing only its
non-numeric characters. Then I do a TIDs-based removal on the input
text using only the unique non-numeric characters.
This results in the minimum number of iterations that I could arrive at:
10 + n, where n is the number of unique non-numeric characters in the input
text.
--begin script
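getNumerals("phone # 1(800)372-1476")
-->"18003721476"
on getNumerals(input)
	set digits to "1234567890"
	set savedTIDs to AppleScript's text item delimiters
	copy input to nonNumbers
	-- pass 1 (10 iterations): strip every digit from the copy, leaving
	-- only the non-numeric characters
	repeat with i from 1 to count of digits
		set AppleScript's text item delimiters to character i of digits
		set nonNumbers to every text item of nonNumbers
		-- rejoin with empty TIDs; rejoining with the digit itself
		-- would just rebuild the original string
		set AppleScript's text item delimiters to ""
		set nonNumbers to nonNumbers as text
	end repeat
	-- pass 2 (n iterations): take the first remaining non-numeric
	-- character and remove every occurrence of it from both strings
	repeat until nonNumbers is ""
		set endTest to character 1 of nonNumbers
		set AppleScript's text item delimiters to endTest
		set input to every text item of input
		set nonNumbers to every text item of nonNumbers
		set AppleScript's text item delimiters to ""
		set input to input as text
		set nonNumbers to nonNumbers as text
	end repeat
	set AppleScript's text item delimiters to savedTIDs
	return input
end getNumerals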
--end script
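The two-phase approach described above can be modeled in Python to check the 10 + n claim, again using str.split/str.join in place of text item delimiters. The function name and the returned pass count are additions for illustration only. In this model, the count is exactly 10 plus the number of distinct non-digit characters, regardless of input length.

```python
DIGITS = "1234567890"


def get_numerals(text):
    """Return (digits of text, number of split/join passes used)."""
    non_numbers = text
    passes = 0
    # phase 1: 10 passes strip the digits from the copy
    for d in DIGITS:
        non_numbers = "".join(non_numbers.split(d))
        passes += 1
    # phase 2: one pass per distinct non-digit character; each pass
    # removes that character from both the input and the copy
    while non_numbers:
        ch = non_numbers[0]
        text = "".join(text.split(ch))
        non_numbers = "".join(non_numbers.split(ch))
        passes += 1
    return text, passes
```

For the 22-character sample string, this takes 20 passes. Repeating that string 100 times (2200 characters) still takes 20 passes, because no new distinct characters appear.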
For comparison, I timed three scripts: Bill Cheeseman's, Arthur J. Knapp's,
and mine. Each time is an average over 1000 runs on a PB G3 400 with
AppleScript 1.6, measured with the Precision Timer OSAX.
I can't imagine improving on the speed of Bill's script on a short
string. But I do like how little mine slows down as the string grows.
I love efficiency puzzles like this; otherwise, why would I have
spent this much time working on getting digits from a string? Hmm, maybe I
should test the RegEx solution.
Nahhhh!

Average seconds per run:

Script     22 chars   220 chars   2200 chars
Bill's     0.00304    0.0210      0.20068
Arthur's   0.00785    0.02403     1.04697
Paul's     0.01158    0.01564     0.04745
--
Paul Skinner