Unicode/UTF confusion
- Subject: Unicode/UTF confusion
- From: Cameron Smith <email@hidden>
- Date: Fri, 2 May 2003 15:12:28 -0700
From the AppleScript 1.9.0 Release Notes (
http://www.apple.com/applescript/release_notes/190OSX.html): To convert a string value to the UTF-8 format, use the coercion as <<class utf8>>. So:
set s to s as <<class utf8>>
We also have:
set s to s as Unicode text
In either case, if I:
set s to "[open curly quote]test[close curly quote]"
[convert using either method above]
write s to theFile
I get, according to a BBEdit hex dump, a file that looks like:
20 1C 00 74 00 65 00 73 00 74 20 1D
which makes some sense in that 201C is the Unicode code point for [open curly quotes], 74 is a Unicode (and ASCII) "t", 65 is "e", 73 is "s", 74 is "t" again, and 201D is [close curly quotes].
But as I understand it, this is not UTF-8, which is a variable-length, C-compatible encoding in which a "t" should appear simply as 74, not as 00 74, and the [open curly quotes] should appear as something totally different. This isn't even a proper UTF-16 encoding, is it? (Although opening it as UTF-16 is the only way that BBEdit will open it and display it properly.)
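For comparison, the byte sequences each encoding should produce for this string can be checked outside AppleScript. The following is only a minimal sketch in Python (not part of the original script; the names `s` and `hex_dump` are illustrative), and it assumes big-endian UTF-16 with no byte order mark. It says nothing about what AppleScript's write command does internally.

```python
# The test string from the message: curly quotes (U+201C, U+201D) around "test".
s = "\u201ctest\u201d"

def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

# Big-endian UTF-16 without a byte order mark: two bytes per character here.
utf16_be = hex_dump(s.encode("utf-16-be"))

# UTF-8: one byte for each ASCII letter, three bytes for each curly quote.
utf8 = hex_dump(s.encode("utf-8"))

print("UTF-16BE:", utf16_be)  # 20 1C 00 74 00 65 00 73 00 74 20 1D
print("UTF-8:   ", utf8)      # E2 80 9C 74 65 73 74 E2 80 9D
```

Under those assumptions, the hex dump above matches the big-endian UTF-16 bytes exactly (just without a byte order mark), while a real UTF-8 file would start with E2 80 9C and store each letter of "test" as a single byte.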
So what's going on? And can I get a proper UTF-8 encoding into a file?
--
Cameron Smith
Cutting Edge Technology Services, Inc.
Nanaimo, BC
http://www.cetsi.com/
tel: 1.250.729.9515 fax: 1.250.729.8201
------------------------------------------------
Websites ** Print Production ** Editorial Services
Publishers of the
http://SaltSpringNews.com/
------------------------------------------------