Re: Unicode search
  • Subject: Re: Unicode search
  • From: Helmut Fuchs <email@hidden>
  • Date: Fri, 21 Mar 2003 14:15:40 +0100

At 12:16 +0000 on 21.03.2003, John Delacour wrote:
> By definition UTF-16 is two bytes. 256 * 256 = 65536, so that's the limit. In practice there are fewer code points assigned than that.

John, you really disappoint me. Before pointing a "by definition" at someone, you should read the definition, I guess.

The number in a UTF name only tells you how big the units are in which the Unicode data is stored, not how big a character is. UTF-8 uses 8-bit units, UTF-16 uses 16-bit units and UTF-32 uses 32-bit units.
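
To make the distinction between units and characters concrete, here is a minimal Python sketch that counts how many units each encoding needs for a few single characters. The helper name `code_units` and the sample characters are my own choices for illustration, not anything from the Unicode documents.

```python
def code_units(text: str, encoding: str, unit_bytes: int) -> int:
    """Return how many code units of `unit_bytes` bytes the encoded text occupies."""
    # The "-be" codecs are used below so that no byte order mark is prepended.
    return len(text.encode(encoding)) // unit_bytes

# Four single characters: A, e-acute (precomposed), euro sign, musical G clef.
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        "UTF-8 units:", code_units(ch, "utf-8", 1),
        "UTF-16 units:", code_units(ch, "utf-16-be", 2),
        "UTF-32 units:", code_units(ch, "utf-32-be", 4),
    )
```

Each sample is one character, yet the unit count varies with the encoding, and the last sample needs two units even in UTF-16.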

In RFC 2279 you can find that a UTF-8 character can be made up of up to 6 units of 8 bits. And this link says that the Unicode standard allows for 21 bits to encode characters: <http://www.unicode.org/faq/utf_bom.html#9>. 21 bits clearly don't fit into two bytes.
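
The arithmetic is easy to check. The sketch below assumes the upper end of the Unicode code space is U+10FFFF (which is where the 21-bit figure comes from); the variable names are just for illustration.

```python
# Highest code point in the Unicode code space (assumed here: U+10FFFF).
MAX_CODE_POINT = 0x10FFFF

# Largest value that a single 16-bit unit can hold.
MAX_16_BIT_VALUE = 2**16 - 1

bits_needed = MAX_CODE_POINT.bit_length()
fits_in_two_bytes = MAX_CODE_POINT <= MAX_16_BIT_VALUE

print("bits needed for the highest code point:", bits_needed)
print("fits into one 16-bit unit:", fits_in_two_bytes)
```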

>> Of course, the most common characters are coded into 2 bytes under UTF-16.
>
> All of them. Give me an example of a character in UTF-16 that is not two bytes.

Please read more about Unicode before making such claims. For example, an accented character in decomposed form (base letter plus combining accent) takes up two UTF-16 units, but AFAIK it should be treated as a _single_ character. And as said before: the current Unicode standard allows for 21 bits of character encoding. To allow this, UTF-16 implements a mechanism called "surrogate pairs", in which one character is stored as two 16-bit units:
<http://www.unicode.org/faq/utf_bom.html#6>
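
Both cases can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `to_surrogates` and the sample characters are hypothetical, and the surrogate arithmetic is the usual one (subtract 0x10000, then split the remaining 20 bits into two halves of 10 bits each).

```python
import unicodedata
from typing import Tuple

def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need surrogates")
    offset = code_point - 0x10000           # 20 bits remain
    high = 0xD800 + (offset >> 10)          # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # lower 10 bits
    return high, low

# Case 1: one character outside the 16-bit range (musical G clef, U+1D11E).
high, low = to_surrogates(0x1D11E)
print(f"U+1D11E -> {high:04X} {low:04X}")

# Cross-check against Python's own UTF-16 encoder.
encoded = "\U0001D11E".encode("utf-16-be")
print("encoder agrees:", encoded == bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF]))

# Case 2: an accented character in decomposed form (e + combining acute accent).
decomposed = unicodedata.normalize("NFD", "\u00e9")
print("UTF-16 units, decomposed:", len(decomposed.encode("utf-16-be")) // 2)
print("UTF-16 units, precomposed:", len("\u00e9".encode("utf-16-be")) // 2)
```

In both cases a reader sees one character, while a search routine that steps through the text one 16-bit unit at a time sees two items.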

And this says something about ignoring surrogates altogether (bad idea, but I know you were thinking of it already ;-):
<http://www.unicode.org/faq/utf_bom.html#17>

Best regards,

Helmut