Re: Producing Unicode-only characters

  • Subject: Re: Producing Unicode-only characters
  • From: "Mark J. Reed" <email@hidden>
  • Date: Wed, 26 Oct 2005 14:27:07 -0400

On 10/26/05, bill <email@hidden> wrote:
> I'm surprised to see the glyph of U+28CCA, the meaning of this
> character is not, ... well, suitable for public discussion.

Well, now you've gone and piqued our curiosity.  :) What does it mean?
 

> BTW, «data utxt00028CCA» does not produce code point U+28CCA, you may
> compare this one:

Right.  That's just two characters, U+0002 and U+8CCA.  As I said in my earlier message, "unicode text" is stored using UTF-16, which means you have to use surrogates to get characters above U+FFFF.
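
To see why, here is a minimal sketch in Python (not AppleScript) that reads the same eight hex digits as UTF-16.  It assumes the digits are taken as big-endian 16-bit code units, which matches the order they are written in; the variable names are mine.

```python
# Interpret the hex digits 0002 8CCA as big-endian UTF-16 code units.
raw = bytes.fromhex("00028CCA")
text = raw.decode("utf-16-be")

# Each 16-bit unit is below the surrogate range, so each one
# decodes to its own character.
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(code_points)  # ['U+0002', 'U+8CCA']
```

Neither unit falls in U+D800 through U+DFFF, so nothing tells the decoder to combine them.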

> Anyone know the mechanism why & now code point beyond U+FFFF is
> composed by hex values?

Yes: surrogate pairs.  The code points in the range U+D800 through U+DFFF are reserved for this purpose.  Essentially, characters above U+FFFF are represented as a two-digit base-1024 number whose value is the difference between the desired character and the first code point that doesn't fit in 16 bits.  In other words,  U+10000 = 65536 decimal is stored as 0, U+10001 is stored as 1, etc.  The highest representable value is therefore the sum of hex 10000 + FFFFF = U+10FFFF, which in decimal is the nicely palindromic number 1114111.

The first (high) digit is chosen from U+D800 through U+DBFF (D800 = 0, D801 = 1, ..., DBFE = 1022, DBFF = 1023) and the second (low) digit is chosen from U+DC00 through U+DFFF the same way.
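
Read in the other direction, the two "digits" recombine into a code point.  A rough Python sketch of that, where from_surrogates and the constant names are just illustrative names of mine, not part of any library:

```python
HIGH_BASE = 0xD800   # high surrogate that stands for digit 0
LOW_BASE = 0xDC00    # low surrogate that stands for digit 0
OFFSET = 0x10000     # first code point that doesn't fit in 16 bits


def from_surrogates(high, low):
    """Combine a high/low surrogate pair into one code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("high surrogate must be in D800..DBFF")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("low surrogate must be in DC00..DFFF")
    # Two-digit base-1024 number, shifted up by 0x10000.
    return OFFSET + (high - HIGH_BASE) * 1024 + (low - LOW_BASE)


print(hex(from_surrogates(0xD800, 0xDC00)))  # 0x10000
print(hex(from_surrogates(0xDBFF, 0xDFFF)))  # 0x10ffff
```

The smallest pair gives U+10000 and the largest gives U+10FFFF, which are the two ends of the range described above.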

So, for our unspeakable character U+28CCA:

1. Get the Unicode scalar value, which you do by just converting from hexadecimal into a number we can do math on.  "28CCA" is hexadecimal for 167114.

2. Subtract 65536: 167114 - 65536 = 101578

3. Divide by 1024, yielding an integer quotient and a remainder: 101578 / 1024 = 99 with remainder 202

4. The quotient is the high surrogate value.  Add it to U+D800 and store that character.  99 decimal = 63 hex, so the first character is U+D863.

5. The remainder is the low surrogate value.  Add it to U+DC00 and store that character.  202 decimal = CA hex, so the second character is U+DCCA.

So («data utxtD863DCCA» as Unicode text) should yield U+28CCA.
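
The five steps can be sketched in Python as a sanity check (again, not AppleScript; to_surrogates is a name I made up for illustration).  For U+28CCA the division gives quotient 99 and remainder 202, and the last line compares the result against Python's own UTF-16 encoder.

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need surrogates")
    offset = code_point - 0x10000                  # step 2: subtract 65536
    high_digit, low_digit = divmod(offset, 1024)   # step 3: quotient, remainder
    return 0xD800 + high_digit, 0xDC00 + low_digit  # steps 4 and 5


high, low = to_surrogates(0x28CCA)   # step 1: hex 28CCA = 167114
print(f"{high:04X} {low:04X}")       # D863 DCCA

# Cross-check against the built-in big-endian UTF-16 encoder.
builtin_hex = chr(0x28CCA).encode("utf-16-be").hex().upper()
print(builtin_hex)                   # D863DCCA
```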

--
Mark J. Reed <email@hidden>