Re: Unicode/UTF confusion
- Subject: Re: Unicode/UTF confusion
- From: John Delacour <email@hidden>
- Date: Sat, 3 May 2003 02:09:49 +0100
At 3:53 pm -0700 2/5/03, Christopher Nebel wrote:
> That's proper UTF-16. (It doesn't have a byte-order mark, but
> that's not required.) I'm not sure who wrote that note, but the
> second sentence is very misleading. There's no support for storing
> data within AppleScript itself in UTF-8, so saying "s as <<class
> utf8>>" has no real effect. However, you can make it write UTF-8 to
> a file by saying this:
>
>     write s to theFile as <<class utf8>>
A good demonstration of what happens is below. If people experiment
with characters in the 00xx and 01xx ranges, the visible result
returned is confusing because the first byte of each UCS-2 character
(UCS-2 being what I prefer to call this encoding) is invisible. It's
even more confusing because a conversion is made for the display.
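
As a rough illustration of that point (in Python rather than AppleScript, purely so the byte values are easy to check; the helper name ucs2_bytes is made up for this sketch, and big-endian order with no byte-order mark is assumed), here are the two bytes behind one character from each range:

```python
def ucs2_bytes(ch):
    """Return the big-endian UTF-16 (UCS-2) bytes of one BMP character as integers.

    Hypothetical helper for illustration; big-endian order is an assumption.
    """
    return list(ch.encode("utf-16-be"))

# "A" is U+0041: the first byte is 0 (NUL), which displays as nothing.
a_bytes = ucs2_bytes("A")

# U+0101 (a with macron) sits in the 01xx range: the first byte is 1,
# another non-printing control code.
amacron_bytes = ucs2_bytes("\u0101")

print(a_bytes)        # [0, 65]
print(amacron_bytes)  # [1, 1]
```

In both cases the leading byte is a control code, so only the second byte leaves anything visible when the data is read back as plain characters.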
Here you see exactly what happens to the apple character - a
character well out of the invisible range - in its various
transformations. The Mac lozenge (ASCII character 215) will serve
equally well as an example.
set fU to "/tmp/x"
set f to POSIX file fU as file specification
set _apple to (ASCII character 240) as Unicode text
open for access f with write permission
set eof f to 0
write _apple to f
close access f
set _UCS2 to read f
set _UCS2bytes to {}
repeat with c in characters of _UCS2
    set end of _UCS2bytes to ASCII number c
end repeat
open for access f with write permission
set eof f to 0
write _apple to f as <<class utf8>> -- type the real chevrons (« and ») in place of << and >>
close access f
set _UTF8 to read f
set _UTF8_bytes to {}
repeat with c in characters of _UTF8
    set end of _UTF8_bytes to ASCII number c
end repeat
{_apple, "UCS-2=>", _UCS2, _UCS2bytes, "UTF-8=>", _UTF8, _UTF8_bytes}
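
For anyone who wants to cross-check the byte lists without running the script, the following is a minimal sketch in Python, not AppleScript. It assumes that Python's mac_roman codec matches the Mac OS Roman mapping AppleScript applies to "ASCII character", and that the Unicode text is written big-endian with no byte-order mark, as described in the quoted message. The function name transformations is invented for this example.

```python
def transformations(mac_roman_code):
    """Map one Mac OS Roman byte to its code point, UCS-2 bytes and UTF-8 bytes.

    Assumption: Python's mac_roman codec stands in for AppleScript's
    'ASCII character N as Unicode text'; byte order is taken to be big-endian.
    """
    ch = bytes([mac_roman_code]).decode("mac_roman")
    return {
        "code_point": "U+%04X" % ord(ch),
        "ucs2": list(ch.encode("utf-16-be")),
        "utf8": list(ch.encode("utf-8")),
    }

apple = transformations(240)    # the apple character used in the script
lozenge = transformations(215)  # the Mac lozenge

print(apple)
print(lozenge)
```

Under those assumptions the apple comes out as U+F8FF, with UCS-2 bytes 248, 255 and UTF-8 bytes 239, 163, 191; the lozenge comes out as U+25CA, with UCS-2 bytes 37, 202 and UTF-8 bytes 226, 151, 138. Neither character has a control code as its first byte, which is why they make clearer examples than anything in the 00xx or 01xx ranges.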
JD