Re: Unicode search

  • Subject: Re: Unicode search
  • From: Helmut Fuchs <email@hidden>
  • Date: Fri, 21 Mar 2003 15:38:04 +0100

At 14:09 +0000 on 21.03.2003, John Delacour wrote:
>> And this link says that the Unicode standard allows for 21 bits to encode characters: <http://www.unicode.org/faq/utf_bom.html#9>. 21 bits clearly don't fit into two bytes.

> That link says nothing of the kind. It says:
>
> "both Unicode and ISO 10646 have policies in place that formally limit even the UTF-32 encoding form to the integer range that can be expressed with UTF-16 (or 21 significant bits)."

You are kidding, aren't you? That sentence says two things:
1. Unicode and ISO 10646 have policies in place that limit UTF-32 to the range expressible in UTF-16.
2. The range expressible with UTF-16 needs 21 significant bits.
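A minimal Python sketch of the arithmetic behind point 2 (Python is used only for illustration). It assumes the standard UTF-16 layout: the 65,536 code points of the Basic Multilingual Plane are reached directly, and 1024 high surrogates times 1024 low surrogates reach about 1M more. The constant names are hypothetical.

```python
import sys

# UTF-16 reaches the BMP (0x0000-0xFFFF) directly, plus
# 1024 high surrogates x 1024 low surrogates = 1,048,576 more code points.
BMP_SIZE = 0x10000
SUPPLEMENTARY_SIZE = 1024 * 1024

max_code_point = BMP_SIZE + SUPPLEMENTARY_SIZE - 1

print(hex(max_code_point))          # 0x10ffff
print(max_code_point.bit_length())  # 21 significant bits
print(max_code_point > 0xFFFF)      # True: too large for one 16-bit unit

# CPython 3.3+ uses the same ceiling for its str type.
assert sys.maxunicode == max_code_point
```

The top of the UTF-16 range, U+10FFFF, needs exactly 21 bits. That is where the FAQ's "21 significant bits" comes from, and it is more than a single 16-bit unit can hold.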

And could you please explain how, for example, the "CJK Unified Ideographs Extension B" block (U+20000 to U+2A6DF), added in The Unicode Standard 3.1, could possibly be expressed with a single 16-bit UTF-16 unit?
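A short Python sketch that checks this directly. It encodes the first and last code points of Extension B with Python's built-in `utf-16-be` codec and splits the bytes into 16-bit units. The helper `utf16_units` is a hypothetical name introduced here.

```python
# First and last code points of CJK Unified Ideographs Extension B.
EXT_B_FIRST = 0x20000
EXT_B_LAST = 0x2A6DF

def utf16_units(code_point):
    """Return the UTF-16 code units (as ints) for a single code point."""
    data = chr(code_point).encode("utf-16-be")  # big-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

for cp in (EXT_B_FIRST, EXT_B_LAST):
    units = utf16_units(cp)
    print(f"U+{cp:X}: {len(units)} units ->", " ".join(f"{u:04X}" for u in units))
# U+20000: 2 units -> D840 DC00
# U+2A6DF: 2 units -> D869 DEDF
```

Every code point in the block comes out as two 16-bit units, never one.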

The entry "What is UTF-16" in the UTF and BOM FAQ <http://www.unicode.org/faq/utf_bom.html#6> explains:
UTF-16 allows access to 63K characters as single Unicode 16-bit units. It can access an additional 1M characters by a mechanism known as surrogate pairs. Two ranges of Unicode code values are reserved for the high (first) and low (second) values of these pairs. Highs are from 0xD800 to 0xDBFF, and lows from 0xDC00 to 0xDFFF. In Unicode 3.0, there are no assigned surrogate pairs. Since the most common characters have already been encoded in the first 64K values, the characters requiring surrogate pairs will be relatively rare (see below).
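The surrogate-pair mechanism described in the FAQ can be sketched by hand in Python. This assumes the standard rule: subtract 0x10000, then put the top 10 bits of the resulting 20-bit value in the high surrogate and the bottom 10 bits in the low surrogate. The function names are hypothetical, and the result is cross-checked against Python's own UTF-16 codec.

```python
HIGH_START, HIGH_END = 0xD800, 0xDBFF  # high (first) surrogate range
LOW_START, LOW_END = 0xDC00, 0xDFFF    # low (second) surrogate range

def to_surrogate_pair(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into two 16-bit units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000                # 20-bit value
    high = HIGH_START + (offset >> 10)   # top 10 bits
    low = LOW_START + (offset & 0x3FF)   # bottom 10 bits
    return high, low

def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into one code point."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)

# Cross-check against Python's built-in UTF-16 codec.
cp = 0x20000
high, low = to_surrogate_pair(cp)
assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
assert from_surrogate_pair(high, low) == cp
print(hex(high), hex(low))  # 0xd840 0xdc00
```

Each surrogate range holds 1024 values, so the pairs cover 1024 × 1024 = 1,048,576 code points. This is the "additional 1M characters" the FAQ mentions.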

At 14:09 +0000 on 21.03.2003, John Delacour wrote:
> And a pair of UTF-16 characters is two characters.

True: a character is a character. But two 16-bit units in UTF-16 CAN be ONE character. Otherwise, how would you explain the FAQ entry "Why are some people opposed to UTF-16?" <http://www.unicode.org/faq/utf_bom.html#8>
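A small Python sketch of the distinction between characters and 16-bit units. Python's `str` length counts code points, while the byte length of the UTF-16 encoding divided by two counts 16-bit units. This does not establish how any particular AppleScript version counts string length.

```python
text = "A\U00020000"  # 'A' followed by the first Extension B ideograph

code_points = len(text)                          # Python str counts code points
unit_count = len(text.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 unit

print(code_points)  # 2
print(unit_count)   # 3: 'A' is one unit, U+20000 is a surrogate pair
```

Two characters occupy three 16-bit units here. Any length or search routine that treats each unit as a character would overcount.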

Best regards,

Helmut
_______________________________________________
applescript-users mailing list | email@hidden
