Unicode search [was Re: the Holy Grail of AppleScript lists]

Subject: Unicode search [was Re: the Holy Grail of AppleScript lists]
From: Helmut Fuchs <email@hidden>
Date: Thu, 20 Mar 2003 23:55:08 +0100

At 14:10 -0800 on 20.03.2003, Paul Berkowitz wrote:

Does anyone know if this
slower processing of Unicode text searches using 'contains' has anything to
do with the lack of a limit on the stack, or is totally unrelated (and
therefore fixable)?

Unicode text is commonly stored in a variable-length encoding such as UTF-8 or UTF-16. So you can't know beforehand how many bytes a given number of characters will occupy (between 1 and 4 bytes per character in UTF-8 as currently specified; the original design allowed sequences of up to 6 bytes).
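
To make the variable length concrete, here is a small Python sketch (Python is my choice for illustration; it is not what AppleScript does internally). The helper name char_offset_to_byte_offset is made up for this example. It shows that finding the byte position of the n-th character in UTF-8 means inspecting the lead byte of every character before it.

```python
# Byte lengths of single characters under UTF-8, using Python's built-in codec.
# Sample characters: U+0041 (A), U+00E9, U+20AC (euro sign), U+1D11E.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}


def char_offset_to_byte_offset(text: str, char_index: int) -> int:
    """Return the UTF-8 byte offset of the character at char_index.

    Hypothetical helper for illustration: it walks the encoded bytes and
    reads each lead byte to learn how long that character is.
    """
    data = text.encode("utf-8")
    pos = 0
    for _ in range(char_index):
        lead = data[pos]
        if lead < 0x80:      # 0xxxxxxx: one-byte character
            pos += 1
        elif lead < 0xE0:    # 110xxxxx: two-byte character
            pos += 2
        elif lead < 0xF0:    # 1110xxxx: three-byte character
            pos += 3
        else:                # 11110xxx: four-byte character
            pos += 4
    return pos
```

The loop runs once per skipped character, which is the cost described above: there is no way to jump straight to character n without looking at the ones in front of it.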

The quickest string search algorithms compute skip tables beforehand that tell them how many characters can safely be skipped when a comparison on a single character fails. This advantage is largely lost when the text is in a variable-length encoding and you skip by characters, as you still have to look at every character merely to skip it.
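
As an illustration of such a skip table, here is a minimal sketch of the Boyer-Moore-Horspool algorithm in Python. I am not claiming this is what AppleScript's 'contains' uses; it is just one well-known member of that family, and the function name horspool_find is made up for this example. It relies on constant-time indexing by character position, which Python strings provide.

```python
def horspool_find(haystack: str, needle: str) -> int:
    """Return the index of the first occurrence of needle, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # Skip table: for every character of the needle except the last one,
    # the distance from its last occurrence to the end of the needle.
    skip = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Shift according to the haystack character aligned with the
        # needle's last position; characters that do not occur in the
        # needle allow a jump of the full needle length.
        pos += skip.get(haystack[pos + m - 1], m)
    return -1
```

The jump in the last line of the loop is only cheap if haystack[pos + m - 1] can be reached directly. With character counts over a variable-length encoding, reaching that position would require decoding everything in between.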

If you stored every string in UTF-32 (the four-byte version), this speed penalty would not have to be paid, as every character occupies exactly four bytes. But this would be memory-intensive, and you'd have to convert between representations too often.
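
A rough sketch of that trade-off in Python, using its built-in codecs (the "-be" variants are chosen so that no byte order mark is added to the sizes). The helper utf32_char_at is a made-up name for this example.

```python
text = "Unicode search"  # 14 characters, all ASCII

# Storage needed for the same text in three encodings.
utf8_size = len(text.encode("utf-8"))
utf16_size = len(text.encode("utf-16-be"))
utf32_size = len(text.encode("utf-32-be"))


def utf32_char_at(data: bytes, index: int) -> str:
    """Return the character at index from UTF-32 (big-endian) bytes.

    Hypothetical helper: since every character is exactly four bytes,
    the byte offset is simply 4 * index, with no scanning needed.
    """
    start = 4 * index
    return data[start:start + 4].decode("utf-32-be")
```

For plain ASCII text like this, UTF-32 takes four times the space of UTF-8, which is the memory cost mentioned above; in exchange, any character position maps to a byte offset by a single multiplication.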

So I guess there's not very much that can be done to speed up Unicode searching.

-Helmut

Follow-Ups:
- Re: Unicode search [was Re: the Holy Grail of AppleScript lists]
  - From: John Delacour <email@hidden>

References:
- Re: the Holy Grail of AppleScript lists
  - From: Paul Berkowitz <email@hidden>
