Unicode search [was Re: the Holy Grail of AppleScript lists]
- Subject: Unicode search [was Re: the Holy Grail of AppleScript lists]
- From: Helmut Fuchs <email@hidden>
- Date: Thu, 20 Mar 2003 23:55:08 +0100
At 14:10 Uhr -0800 20.03.2003, Paul Berkowitz wrote:
Does anyone know if this
slower processing of Unicode text searches using 'contains' has anything to
do with the lack of a limit on the stack, or is totally unrelated (and
therefore fixable)?
Unicode text is normally stored in a variable-length encoding, so you
can't know beforehand how many bytes a given number of characters will
occupy (between 1 and 4 bytes per character in UTF-8 as currently
defined; the original UTF-8 specification allowed up to 6).
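To make the variable-length point concrete, here is a small Python sketch. Python is only my choice for illustration, and the helper names (encoded_lengths, utf8_byte_offset) are hypothetical. It prints how many bytes a few characters take in UTF-8 and UTF-16, and shows that finding the byte offset of the n-th character in UTF-8 means scanning from the start.

```python
# Hypothetical helpers for illustration only; not part of AppleScript.

def encoded_lengths(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


def utf8_byte_offset(data: bytes, n: int) -> int:
    """Byte offset of the n-th character (0-based) in UTF-8 data.

    There is no arithmetic shortcut: we scan from the start and count
    lead bytes, skipping continuation bytes of the form 0b10xxxxxx.
    """
    count = 0
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # a new character starts at this byte
            if count == n:
                return i
            count += 1
    raise IndexError(n)


for ch in ["A", "é", "€", "𝄞"]:  # U+0041, U+00E9, U+20AC, U+1D11E
    u8, u16 = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: {u8} byte(s) in UTF-8, {u16} in UTF-16")
```

The four characters take 1, 2, 3 and 4 bytes in UTF-8, and 2, 2, 2 and 4 bytes in UTF-16, so utf8_byte_offset("Aé€𝄞".encode("utf-8"), 3) returns 6 only after walking past every earlier byte.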
The fastest string search algorithms (Boyer-Moore and its relatives)
precompute skip tables that tell them how many characters can safely be
skipped when a comparison on a single character fails. This advantage
is lost with a variable-length encoding, because you still have to step
through every character just to find out where the next one starts,
even when you only want to skip it.
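For comparison, here is a rough sketch of a skip-table search in Python. I've used Horspool's simplified variant of Boyer-Moore for brevity; it is not what AppleScript does internally, and the function name horspool_find is hypothetical. It runs on a list of code points, i.e. fixed-width units, which is exactly what makes "jump ahead by shift characters" a constant-time index change.

```python
from typing import Sequence


def horspool_find(text: Sequence[int], pattern: Sequence[int]) -> int:
    """Index of the first occurrence of pattern in text, or -1.

    Assumes fixed-width units (here: code points), so advancing by
    `shift` positions is a plain index addition, with no decoding.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Skip table: for each unit in pattern[:-1], the distance from its last
    # occurrence to the end of the pattern. Units not in the table shift by m.
    shift = {unit: m - 1 - i for i, unit in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        # Compare right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Skip based on the text unit under the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return -1


text = [ord(c) for c in "naïve café façade"]
pattern = [ord(c) for c in "café"]
print(horspool_find(text, pattern))  # 6
```

On raw UTF-8 bytes the same jump would land at a byte offset rather than a character index, which is the bookkeeping problem described above.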
If you stored every string in UTF-32 (the fixed four-byte encoding),
this speed penalty would disappear, since every character is exactly
four bytes. But that would be memory-intensive, and you would have to
convert between representations too often.
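A quick Python sketch of that trade-off (the helper name utf32_char_at is hypothetical). In UTF-32 the i-th character sits at bytes 4*i to 4*i+4, so it can be reached without scanning, but plain text grows considerably compared with UTF-8.

```python
def utf32_char_at(data: bytes, i: int) -> str:
    """Constant-time access: character i of UTF-32-LE data is bytes 4*i..4*i+4."""
    return data[4 * i : 4 * i + 4].decode("utf-32-le")


s = "Aé€𝄞"
utf8 = s.encode("utf-8")
utf32 = s.encode("utf-32-le")  # "-le" variant: no byte-order mark
print(len(utf8), len(utf32))  # 10 16
print(utf32_char_at(utf32, 3))  # 𝄞

ascii_text = "hello world"
print(len(ascii_text.encode("utf-8")), len(ascii_text.encode("utf-32-le")))  # 11 44
```

For mostly-ASCII text that is four times the memory of UTF-8, which is the cost mentioned above.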
So I guess there's not very much that can be done to speed up Unicode
searching.
-Helmut
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/applescript-users