Re: Unicode search
- Subject: Re: Unicode search
- From: John Delacour <email@hidden>
- Date: Fri, 21 Mar 2003 17:57:19 +0000
- Mac-eudora-version: 6.0a11
At 3:30 pm +0100 21/3/03, Emmanuel wrote:
> Open the in-line input Character palette, choose "All" in "View",
> click "Unicode Table", scroll down to FFFF (visit F8FF, an apple).
> On my machine the last non-empty 2-byte character is FFFD, a white
> question mark on a black diamond. But then - ta-da! - you've got
> 10000, etc.
Yes. And FFFF is the end of the Basic Multilingual Plane. Up to that
point the significant bytes used to represent a character are
identical in UTF-16 and UTF-32, so the character displayed as an
apple -- which is actually designated not as an apple but as a
<private use> character -- is
U+F8FF
0xF8FF in UTF-16
0x0000F8FF in UTF-32
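A quick way to check the BMP case is a small Python sketch (not part of the original post). It encodes U+F8FF with the big-endian codecs so that no byte-order mark is prepended, and shows the resulting bytes as hex:

```python
# Encode one code point in UTF-16 and UTF-32 (big-endian, no BOM)
# and return the bytes as uppercase hex strings.
def show_encodings(code_point: int) -> dict:
    ch = chr(code_point)
    return {
        "utf-16": ch.encode("utf-16-be").hex().upper(),
        "utf-32": ch.encode("utf-32-be").hex().upper(),
    }

print(show_encodings(0xF8FF))
# {'utf-16': 'F8FF', 'utf-32': '0000F8FF'}
```

The UTF-16 bytes are simply the UTF-32 bytes without the leading zeros, which is the point being made for the BMP.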
After that everything changes, and four bytes of UTF-16 are required
to encode three significant bytes of UTF-32. For example, the Han
character #142008 (decimal) is
U+22AB8
0x00022AB8 in UTF-32
but to encode it in UTF-16 you need four completely different bytes:
0xD84A 0xDEB8 in UTF-16 (a surrogate pair)
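Where those four bytes come from can be sketched in Python (an illustration, not from the original post). It uses the standard UTF-16 surrogate-pair rule: subtract 0x10000, then put the top 10 bits of the 20-bit remainder into a high surrogate (0xD800 base) and the bottom 10 bits into a low surrogate (0xDC00 base). The function name is hypothetical.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x22AB8)
print(f"{high:04X} {low:04X}")           # D84A DEB8

# Cross-check against Python's own UTF-16 codec.
assert chr(0x22AB8).encode("utf-16-be") == bytes.fromhex("D84ADEB8")
```

Neither surrogate shares any byte with 0x00022AB8, which is why the UTF-16 form looks "completely different" from the UTF-32 form.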
Safari will display this character if you save this as a UTF-16 file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-16">
</head>
<body>
&#x22AB8;
</body>
</html>
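Entering a supplementary character into a UTF-16 file by hand is awkward, so here is one way to generate such a test page: a Python sketch, not from the original post. The output file name is hypothetical. It writes the literal character U+22AB8 into the body rather than a character reference, so the file really contains the surrogate pair:

```python
from pathlib import Path

# Hypothetical output file name; any path works.
OUTPUT = Path("han-22AB8.html")

PAGE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"\n'
    '"http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">\n'
    "<html>\n<head>\n"
    '<meta http-equiv="content-type" content="text/html; charset=utf-16">\n'
    "</head>\n<body>\n"
    f"{chr(0x22AB8)}\n"
    "</body>\n</html>\n"
)

# The "utf-16" codec writes a byte-order mark followed by native-endian
# code units, so a browser can detect the byte order; U+22AB8 is stored
# as the surrogate pair D84A DEB8.
OUTPUT.write_text(PAGE, encoding="utf-16")
```

The byte-order mark matters here: without it, a browser has to guess which byte of each 16-bit unit comes first.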
I think we are all arguing at cross purposes, and probably all three
of us have been arguing rather loosely. Within the Basic Multilingual
Plane, a UTF-16 encoding of a character consists of two bytes. Beyond
it, the UTF-16 representation becomes a transformation of UTF-32 (that
is, of the code point itself), just as UTF-8 is, though the algorithm
is different; characters from U+10000 up to U+10FFFF (decimal
1,114,111) can be represented, and always are, with four bytes in UTF-16.
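To compare the three encodings side by side, here is a short Python sketch (not part of the original post; the function name is hypothetical). It counts the bytes each encoding needs for a single code point, using the BOM-free big-endian codecs:

```python
def encoded_lengths(code_point: int) -> dict:
    """Bytes needed for one code point in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    ch = chr(code_point)
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be")}

# Sample points: ASCII, the private-use "apple", the replacement
# character, the first and last supplementary code points, and the Han example.
for cp in (0x41, 0xF8FF, 0xFFFD, 0x10000, 0x22AB8, 0x10FFFF):
    print(f"U+{cp:04X}", encoded_lengths(cp))
```

BMP characters take two bytes in UTF-16, and everything from U+10000 to U+10FFFF takes four. UTF-8 needs one to three bytes within the BMP and four above it, while UTF-32 always uses four.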
JD
_______________________________________________
applescript-users mailing list | email@hidden