Re: Unicode search


  • Subject: Re: Unicode search
  • From: John Delacour <email@hidden>
  • Date: Fri, 21 Mar 2003 17:57:19 +0000
  • Mac-eudora-version: 6.0a11

At 3:30 pm +0100 21/3/03, Emmanuel wrote:

Open the in-line input Character palette, choose "All" in "View", click "Unicode Table", scroll down to FFFF (visit F8FF, an apple). On my machine the last non-empty 2-byte character is FFFD, a white question mark on a black lozenge. But then - ta-da! - you've got 10000, etc.

Yes. And that's the end of the Basic Multilingual Plane. Up to that point the significant bytes used to represent the character are identical in UTF-16 and UTF-32, so the character displayed as an apple -- which is actually not designated as an apple but as a <private use> character -- is

U+F8FF
0xF8FF in UTF-16
0x0000F8FF in UTF-32
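
The values above can be checked with a short sketch. Python is used here purely for illustration (it is not something the thread itself used), and the big-endian codecs are chosen so that no byte-order mark is emitted:

```python
# Encode the private-use character U+F8FF without a byte-order mark.
ch = "\uf8ff"
utf16 = ch.encode("utf-16-be")
utf32 = ch.encode("utf-32-be")

print(utf16.hex().upper())  # F8FF
print(utf32.hex().upper())  # 0000F8FF

# Inside the BMP, UTF-32 is the UTF-16 code unit padded with two zero bytes.
assert utf32 == b"\x00\x00" + utf16
```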

After that everything changes and four bytes of UTF-16 are required to encode three significant bytes of UTF-32, so for example the Han character #142008 is

U+22AB8
0x00022AB8 in UTF-32

but to encode it in UTF-16 you need four completely different bytes. Safari will display this character if you save this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-16">
</head>
<body>
&#xD84A;&#xDEB8;
</body>
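
The two 16-bit values in that HTML come from the standard UTF-16 surrogate calculation. A minimal sketch in Python (the function name `to_surrogates` is hypothetical, chosen for this example) shows how U+22AB8 turns into them:

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000    # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go in the low surrogate
    return high, low


high, low = to_surrogates(0x22AB8)
print(f"{high:04X} {low:04X}")       # D84A DEB8

# Cross-check against Python's own UTF-16 encoder.
assert chr(0x22AB8).encode("utf-16-be") == bytes.fromhex("D84ADEB8")
```

None of the four UTF-16 bytes (D8 4A DE B8) matches the significant UTF-32 bytes (02 2A B8), which is why the two representations look unrelated beyond U+FFFF.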

I think we are all arguing at cross purposes and probably all three of us have been arguing rather loosely. Within the Basic Multilingual Plane a UTF-16 encoding of a character consists of two bytes. Beyond it the UTF-16 representation becomes a transformation of the UTF-32 value, just as UTF-8 is a transformation of the same code point values, though the algorithm is different. Characters from U+10000 up to U+10FFFF (decimal 65536 to 1114111) can be represented, and always are, with four bytes in UTF-16.
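
To make the byte counts concrete, here is a small Python sketch (the helper name `encoded_lengths` is hypothetical) that reports how many bytes a few code points take in each encoding, including both ends of the supplementary range, U+10000 through U+10FFFF:

```python
def encoded_lengths(code_point):
    """Return the (UTF-8, UTF-16, UTF-32) byte counts for one code point."""
    ch = chr(code_point)
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),  # big-endian codec, so no BOM is counted
        len(ch.encode("utf-32-be")),
    )


for cp in (0xF8FF, 0xFFFD, 0x10000, 0x22AB8, 0x10FFFF):
    u8, u16, u32 = encoded_lengths(cp)
    print(f"U+{cp:04X}: UTF-8 {u8}, UTF-16 {u16}, UTF-32 {u32} bytes")
```

The UTF-16 count switches from two to four exactly at U+10000, while UTF-32 stays at four throughout.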

JD