Re: Unicode search

Subject: Re: Unicode search
From: Helmut Fuchs <email@hidden>
Date: Fri, 21 Mar 2003 14:15:40 +0100

At 12:16 +0000 on 21.03.2003, John Delacour wrote:

> By definition UTF-16 is two bytes. 256 * 256 = 65536, so that's the limit. In practice there are fewer code points assigned than that.

John, you really disappoint me. Before pointing a "by definition" at someone, you should read the definition, I guess.

The number in a UTF name only tells you the size of the units in which the Unicode data is stored. UTF-8 uses 8-bit units, UTF-16 uses 16-bit units and UTF-32 uses 32-bit units. A single character may take more than one unit.
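To make the unit sizes concrete, here is a small sketch in Python. The helper name code_units and the sample characters are only illustrative choices; it simply counts how many units each encoding needs for a given string.

```python
def code_units(text: str) -> dict:
    """Count the code units needed to store `text` in each encoding.

    The "-be" codecs are used so that no byte order mark is added,
    which would otherwise inflate the counts.
    """
    return {
        "utf-8": len(text.encode("utf-8")),            # 8-bit units
        "utf-16": len(text.encode("utf-16-be")) // 2,  # 16-bit units
        "utf-32": len(text.encode("utf-32-be")) // 4,  # 32-bit units
    }


# Illustrative samples: ASCII letter, precomposed e-acute, musical G clef.
for sample in ("A", "\u00e9", "\U0001D11E"):
    print(f"U+{ord(sample):04X}", code_units(sample))
```

The last sample, U+1D11E, lies outside the first 65536 code points, so it needs two 16-bit units in UTF-16.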

RFC 2279 states that a UTF-8 character can be made up of up to six 8-bit units. And according to this link, the Unicode standard allows for 21 bits to encode characters: <http://www.unicode.org/faq/utf_bom.html#9>. 21 bits clearly don't fit into two bytes.
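The arithmetic can be checked quickly. This is only a sketch: it assumes the upper end of the Unicode code space is U+10FFFF, a value that is not quoted in this message, and the constant names are made up for illustration.

```python
# Assumption: the highest Unicode code point is U+10FFFF.
MAX_CODE_POINT = 0x10FFFF

# Number of distinct values that fit into two bytes (256 * 256).
TWO_BYTE_VALUES = 256 * 256

bits_needed = MAX_CODE_POINT.bit_length()
code_space_size = MAX_CODE_POINT + 1

print(bits_needed)                        # 21
print(code_space_size > TWO_BYTE_VALUES)  # True
```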

> > Of course, the most common characters are coded into 2 bytes under UTF-16.
>
> All of them. Give me an example of a character in UTF-16 that is not two bytes.

Please read more about Unicode before making such claims. For example, an accented character in decomposed form takes up two UTF-16 units, but AFAIK it should be treated as a _single_ character. And as I said before, the current Unicode standard allows for 21 bits of character encoding. To make this possible, UTF-16 implements a mechanism called "surrogate pairs":
<http://www.unicode.org/faq/utf_bom.html#6>
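Both cases can be sketched in Python. The helper names utf16_units and to_surrogate_pair are hypothetical, and U+1D11E and e-acute are just convenient examples. The pair is computed by hand using the usual surrogate arithmetic, so the result can be compared with what the built-in encoder produces.

```python
import unicodedata


def utf16_units(text: str) -> int:
    """Number of 16-bit units needed to store `text` in UTF-16."""
    return len(text.encode("utf-16-be")) // 2


def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point above U+FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000       # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


# Case 1: one code point outside the 16-bit range takes two units.
clef = "\U0001D11E"
high, low = to_surrogate_pair(ord(clef))
print(f"{high:04X} {low:04X}", utf16_units(clef))  # D834 DD1E 2

# Case 2: e-acute in decomposed form is 'e' plus a combining accent.
decomposed = unicodedata.normalize("NFD", "\u00e9")
print([f"U+{ord(c):04X}" for c in decomposed], utf16_units(decomposed))
```

In both cases what a reader sees as one character occupies four bytes in UTF-16.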

And this says something about ignoring surrogates altogether (bad idea, but I know you were thinking of it already ;-):
<http://www.unicode.org/faq/utf_bom.html#17>

Best regards,

Helmut