On May 15, 2011, at 2:22 PM, Michael D Mays wrote:
Is it possible that ASCII is just returning the low byte of a non-printable multibyte character, but offset is expecting all the bytes of the non-printable character?
All text in AppleScript is now actually Unicode text. (That wasn't always true; string, text and Unicode text used to be distinct types.) When you extract characters from text, you're getting Unicode characters, and all offsets and lengths are in terms of Unicode characters, not bytes.
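For example (a quick sketch; "héllo" is just an illustrative string, and this assumes AppleScript 2.0 or later plus Standard Additions' offset command):

set t to "héllo"
count characters of t --> 5 (characters, not bytes)
offset of "l" in t --> 3
id of character 2 of t --> 233 (U+00E9, "é")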
The (now deprecated) ASCII number function returns the MacRoman code of the first character of its argument text, provided that character is in the MacRoman character set. If it isn't, ASCII number instead returns 63, which is the MacRoman (and ASCII and Unicode) code for a question mark.
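A couple of quick illustrations (a sketch; λ, U+03BB, is just one example of a character outside MacRoman):

ASCII number "?" --> 63
ASCII number "π" --> 185 (π is in the MacRoman set)
ASCII number (character id 955) --> 63 (λ is not, so you get the code for "?")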
Note that ASCII, MacRoman, Latin-1, and Unicode all agree from 0 to 127. From 128 thru 255, ASCII is undefined; Latin-1 and Unicode agree with each other, but MacRoman is a whole 'nother thing. Only Unicode maps numbers 256 and up.
set apple to ASCII character 240
apple & " = " & (ASCII number of apple) & " in MacRoman, " & id of apple & " in Unicode"
returns
" = 240 in MacRoman, 63743 in Unicode"
Also note that those numbers (63 and 240 and 63743) are decimal, not hexadecimal. Decimal 63 = hexadecimal 3F = '?'; decimal 99 = hexadecimal 63 = 'c'.
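Feeding those decimal values back in makes the difference plain:

ASCII character 63 --> "?"
ASCII character 99 --> "c"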