Re: Producing Unicode-only characters
- Subject: Re: Producing Unicode-only characters
- From: "Nigel Garvey" <email@hidden>
- Date: Wed, 26 Oct 2005 22:41:18 +0100
Mark J. Reed wrote on Wed, 26 Oct 2005 10:58:06 -0400:
>On 10/26/05, Nigel Garvey <email@hidden> wrote:
>>
>> Mmm. That's true. But then _all_ Unicode numbers are less than 65536 to
>> AppleScript:
>
>
>Not true. It's just that it uses UTF-16, which means that numbers of 65536
>and above have to be encoded using surrogate pairs.
>... but in UTF-16, the character U+28CCA is encoded as the two code
>units D863 + DCCA:
>
>count «data utxtD863DCCA»
>--> 1 -- one character, UTF-16 encoded.
Aha, right. Thanks for that information! (Jaguar displays two question
marks for that Unicode character, but returns a count of one. Tiger, as
you probably know, displays just one question mark.)
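As a quick check of the D863 + DCCA figures quoted above, here is a minimal sketch of the surrogate arithmetic on its own, with no file access. The handler name surrogatePair and its variable names are made up for illustration, and it assumes it is only ever given a number of 65536 or more:

```applescript
-- Hypothetical helper: split a code point of 65536 or more
-- into its two UTF-16 code units (high surrogate, low surrogate).
on surrogatePair(n)
	set v to n - 65536 -- 20-bit offset above the 16-bit range
	set highUnit to 55296 + (v div 1024) -- 0xD800 + top ten bits
	set lowUnit to 56320 + (v mod 1024) -- 0xDC00 + bottom ten bits
	return {highUnit, lowUnit}
end surrogatePair

surrogatePair(167114) -- 167114 is U+28CCA
--> {55395, 56522}, i.e. D863 and DCCA
```

Since 65536 is a multiple of 1024, (n - 65536) mod 1024 and n mod 1024 give the same result, so either form can be used for the low half.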
Based on that, here's an update of the "temporary file" scripts (if kai
hasn't done one already). I think I've got the maths right...:
  on unicodeText(l) -- l is a list of integers
    set fref to (open for access file ((path to temporary items as Unicode text) & "utxt scratch.txt") with write permission)
    try
      set eof fref to 0
      repeat with i from 1 to (count l)
        set n to item i of l
        if (n < 65536) then
          write n as small integer to fref
        else
          write ((n - 65536) div 1024 + 55296) as small integer to fref
          write (n mod 1024 + 56320) as small integer to fref
        end if
      end repeat
      set u to (read fref as Unicode text from 1)
    on error msg
      display dialog msg
    end try
    close access fref
    return u
  end unicodeText
  on unicodeNumbers(u) -- u is some Unicode text
    set fref to (open for access file ((path to temporary items as Unicode text) & "utxt scratch.txt") with write permission)
    try
      set eof fref to 0
      write u to fref
      set l to (read fref as small integer from 1) as list
    end try
    close access fref
    set len to (count l)
    repeat with i from 1 to len
      set n to item i of l
      if (n is not missing value) then
        set n to (65536 + n) mod 65536
        if (n div 1024 is 54) and (i < len) then
          set n2 to (65536 + (item (i + 1) of l)) -- no "mod 65536" needed: a low surrogate is read as a negative small integer
          if (n2 div 1024 is 55) then
            set n to n mod 1024 * 1024 + 65536 + n2 mod 1024
            set item (i + 1) of l to missing value
          end if
        end if
        set item i of l to n
      end if
    end repeat
    return l's integers
  end unicodeNumbers
  «data utxt0020D863DCCA000D0041» as Unicode text
  unicodeNumbers(result)
  --unicodeText(result)
  --unicodeNumbers(result)
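
The decoding arithmetic in unicodeNumbers can also be tried in isolation. This is only a sketch: the handler names unsigned16 and codePointFromPair are invented for illustration, and it assumes that small integers come back from 'read' as signed 16-bit values, which is what the "(65536 + n) mod 65536" step above is there to correct:

```applescript
-- Hypothetical helper: map a signed 16-bit value (-32768 to 32767)
-- onto the unsigned range 0 to 65535.
on unsigned16(n)
	return (65536 + n) mod 65536
end unsigned16

-- Hypothetical helper: rebuild a code point from a surrogate pair.
-- Assumes highUnit is in 55296-56319 and lowUnit is in 56320-57343.
on codePointFromPair(highUnit, lowUnit)
	return (highUnit mod 1024) * 1024 + (lowUnit mod 1024) + 65536
end codePointFromPair

unsigned16(-10141)
--> 55395 (D863)
codePointFromPair(55395, 56522)
--> 167114 (U+28CCA)
```

The range tests in unicodeNumbers follow from the same numbers: every high surrogate gives 54 when divided by 1024 with 'div', and every low surrogate gives 55.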
NG