Re: Unicode search [was Re: the Holy Grail of AppleScript lists]
- Subject: Re: Unicode search [was Re: the Holy Grail of AppleScript lists]
- From: Emmanuel <email@hidden>
- Date: Fri, 21 Mar 2003 11:49:01 +0100
At 11:35 PM +0000 20/03/03, John Delacour wrote:
At 11:55 pm +0100 20/3/03, Helmut Fuchs wrote:
Unicode uses a variable length encoding scheme. So you can't know
beforehand how many bytes a given number of characters will be
encoded into (between 1 and 6 bytes per character in UTF-8, if I
remember correctly).
But UTF-8 is a transformation of Unicode using a special algorithm,
just as binhex or uuencode. Every character in Unicode proper
consists of two bytes (or 4 in the case of UTF-32) and the length is
not variable.
That's often true, but it is not a rule and it is sometimes false.
Not all characters are coded into 2 bytes under UTF-16. As you have
certainly noticed, UTF-16 can represent far more than the 65,536
characters that fit in 2 bytes; characters beyond that range are coded
as a pair of 2-byte units (4 bytes in all).
Of course, the most common characters are coded into 2 bytes under UTF-16.
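
To make the point concrete, here is a small sketch in Python (not AppleScript; chosen only because its standard string encoders make the byte counts easy to inspect). The helper names utf8_byte_length and utf16_byte_length and the sample characters are my own picks for illustration, not anything from the thread.

```python
def utf8_byte_length(ch: str) -> int:
    """Number of bytes used to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_byte_length(ch: str) -> int:
    """Number of bytes used to encode one character in UTF-16.

    "utf-16-be" is used so that no byte-order mark is added to the count.
    """
    return len(ch.encode("utf-16-be"))


# Illustrative samples: three common characters and one that lies
# beyond the range reachable with a single 2-byte unit.
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}: "
        f"UTF-8 {utf8_byte_length(ch)} bytes, "
        f"UTF-16 {utf16_byte_length(ch)} bytes"
    )
```

The first three samples each take 2 bytes in UTF-16, while the last one (U+1D11E, a musical symbol) takes 4, so the byte length of a UTF-16 string cannot be derived from its character count alone.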
Emmanuel
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.