Re: AS and Unicode characters
- Subject: Re: AS and Unicode characters
- From: "Mark J. Reed" <email@hidden>
- Date: Fri, 5 Jan 2007 09:27:37 -0500
On 1/5/07, KOENIG Yvan <email@hidden> wrote:
> Alas, when pasting into TextEdit I discovered that the code only
> grabbed the first two bytes, 01D1, giving me an infamous
> LATIN CAPITAL LETTER O WITH CARON (whose code is 01D1)
> when I wanted a
> MUSICAL SYMBOL DOUBLE FLAT (01D12B).
I'm pretty sure the native encoding used by OS X is UTF-16, which
means that in order to deal with code points above U+FFFF, you have to
use surrogate pairs. Basically, you generate two separate 16-bit code
units representing a single scalar value; each of the code units is
essentially a single "digit" in base 1024, meaning you can represent
1024x1024=1,048,576 characters that way. Add the 65,536 code points in
the basic multilingual plane and you get the full Unicode code space
of 1,114,112 code points.
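As a quick sanity check on those totals, here is a small sketch in Python (not AppleScript; the names PAIRS, BMP and TOTAL are just made up for illustration). The sum should come to 110000 hex, one past the highest code point U+10FFFF:

```python
# Each surrogate carries 10 bits, i.e. one "digit" in base 1024.
PAIRS = 1024 * 1024   # code points reachable through surrogate pairs
BMP = 0x10000         # code points U+0000 through U+FFFF
TOTAL = PAIRS + BMP

print(PAIRS)          # 1048576
print(TOTAL)          # 1114112
print(hex(TOTAL))     # 0x110000, one past U+10FFFF
```

Note that this counts code points, not assigned characters; the 2,048 surrogate values themselves sit inside the BMP figure.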
The UTF-16 representation of U+1D12B consists of U+D834 followed by
U+DD2B. Here's how you get that from the scalar value (1D12B hex =
119,083 decimal).
1. Subtract 10000 hex = 65,536 decimal; this is the scalar value
represented by a surrogate pair number of zero. This is easy in the
hexadecimal version for anything in plane 1 (U+10000 through U+1FFFF) -
just drop the leading 1. The result is D12B.
2. To convert to "base 1024", just divide and keep the quotient and
remainder. D12B hex = 53547 decimal. Divide by 1024 and you get 52
with a remainder of 299. So the two "base 1024 digits" we want to
output are 52 and 299 decimal; hex will be easier to work with, so
that's 34 and 12B.
3. The first digit comes from the high surrogates area, which starts
at D800. Just add: D800 + 34 = D834. That's the first "character"
output.
4. The second digit comes from the low surrogates area at DC00. DC00
+ 12B = DD2B.
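
The four steps above can be sketched in a few lines of Python (again not AppleScript, and the function name to_surrogates is hypothetical, chosen only for this example). It is a minimal sketch that assumes the input is a code point above U+FFFF:

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a UTF-16 (high, low) pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000                   # step 1: subtract 10000 hex
    high_digit, low_digit = divmod(offset, 1024)    # step 2: base-1024 digits
    return 0xD800 + high_digit, 0xDC00 + low_digit  # steps 3 and 4

high, low = to_surrogates(0x1D12B)
print(f"{high:04X} {low:04X}")  # D834 DD2B

# Cross-check against Python's built-in UTF-16 encoder (big-endian, no BOM).
expected = bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
assert chr(0x1D12B).encode("utf-16-be") == expected
```

divmod does the divide-and-keep-the-remainder of step 2 in one call; shifting right by 10 bits and masking with 3FF hex would give the same two digits.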
--
Mark J. Reed <email@hidden>