Re: Unicode search [was Re: the Holy Grail of AppleScript lists]
- Subject: Re: Unicode search [was Re: the Holy Grail of AppleScript lists]
- From: has <email@hidden>
- Date: Fri, 21 Mar 2003 01:23:09 +0000
Paul Berkowitz wrote:
> I recall your saying recently that the great speed advantage of searching a
> long string to see if it contains a search string, over the very slow
> process of searching a list of strings to see if the list contains the same
> search string, is not matched by Unicode text. Does anyone know if this
> slower processing of Unicode text searches using 'contains' has anything to
> do with the lack of a limit on the stack, or is totally unrelated (and
> therefore fixable)?
Basically what Helmut wrote. (Except the maximum's 4 bytes per
character, not 6.) Unicode's performance characteristics are a
feature of its design. Just locating the nth character to pull out a
simple substring is O(n) in raw Unicode text (it's variable-width, so
you have to scan), compared to O(1) with an old-timey string. I hate to
think how much code it'd need slathered on top to get any better
performance. Personally, I believe it to be a cunning plot by Intel
to sell the next generation of Pentium processors with the 333THz
core and built-in thermonuclear power station to run it, but I'm just
paranoid that way.
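
To make the O(n) versus O(1) point concrete, here is a rough sketch in Python (not AppleScript, and not how AppleScript does it internally). It uses UTF-8 as the variable-width example; the function names are made up for illustration.

```python
def utf8_offset_of_char(data: bytes, index: int) -> int:
    """Return the byte offset where character number `index` (0-based) starts.

    Variable-width text: every byte before the target has to be inspected,
    so the cost grows with `index`.
    """
    seen = 0
    for offset, byte in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes;
        # any other byte starts a new character.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError(index)


def fixed_offset_of_char(index: int, width: int = 1) -> int:
    """Fixed-width text: the offset is plain arithmetic, no scanning."""
    return index * width


if __name__ == "__main__":
    data = "aé b".encode("utf-8")          # 'é' takes two bytes in UTF-8
    print(utf8_offset_of_char(data, 2))    # scans from the start: 3
    print(fixed_offset_of_char(2))         # one multiplication: 2
```

The scanning version does work proportional to the position asked for; the fixed-width version does the same amount of work wherever you look.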
John Delacour wrote:
> Every character in Unicode proper
> consists of two bytes (or 4 in the case of UTF-32) and the length is
> not variable.
Not quite: in UTF-16, a character may consist of either one or two
two-byte code units (cf. recent discussions of separation between
characters and accents, for example), so again length is not fixed
and you've got to crawl across it every time you want to find
something. Only UTF-32 is fixed length (and I don't imagine we'll see
it in everyday use for a while yet). Out of interest, John, do Perl
regexes understand Unicode, or are they strictly old-school
one-byte-one-character? (If they do, what's their performance like?)
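
For anyone who wants to see the one-or-two-units business for themselves, a small Python sketch follows (again just an illustration, with a made-up function name). It walks big-endian UTF-16 data and counts characters, treating any unit in the high-surrogate range 0xD800 to 0xDBFF as the first half of a two-unit character. It assumes well-formed input.

```python
import struct


def count_chars_utf16(data: bytes) -> int:
    """Count code points in big-endian UTF-16 by walking its code units."""
    count = 0
    i = 0
    while i < len(data):
        (unit,) = struct.unpack_from(">H", data, i)
        # A high surrogate means this character occupies two 16-bit units.
        i += 4 if 0xD800 <= unit <= 0xDBFF else 2
        count += 1
    return count


if __name__ == "__main__":
    sample = "A\U0001D11EB"                 # 'A', MUSICAL SYMBOL G CLEF, 'B'
    utf16 = sample.encode("utf-16-be")
    utf32 = sample.encode("utf-32-be")
    print(len(utf16))                       # 8 bytes: 2 + 4 + 2
    print(count_chars_utf16(utf16))         # 3 characters, found by crawling
    print(len(utf32) // 4)                  # 3 characters, found by division
```

With UTF-32 the character count (and any character's offset) falls out of simple division; with UTF-16 you have to look at every unit on the way.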
Here's the background spiel for them that want to knock themselves out:
http://www.unicode.org/standard/principles.html
The bit on Encoding Forms is the kicker vis-a-vis performance.
And the JoS "Shlemiel the painter's algorithm" link I like to trot
out for occasions such as these:
http://www.joelonsoftware.com/articles/fog0000000319.html
HTH
has
--
http://www.barple.pwp.blueyonder.co.uk -- The Little Page of AppleScripts