Re: More fun with Unicode
- Subject: Re: More fun with Unicode
- From: John Delacour <email@hidden>
- Date: Tue, 6 May 2003 09:09:37 +0100
- Mac-eudora-version: 6.0a16
At 2:23 pm +0800 6/5/03, bill wrote:
>> It demonstrates how you can write a Unicode file using hexadecimal
>> code. I think it's of great relevance and interest but yer joins yer
>> list and yer takes yer pick.
> John,
> Excellent demonstration :)
> Is it possible to do the reverse? Read a UTF-8 file and then return the hex
> code of every Unicode character?
Well, I made a few interesting discoveries yesterday. One is that
if I tell Perl to write a byte order mark followed by some UCS-2
characters, it magically converts them to UTF-8, so this script
actually writes UTF-8 to begin with. You then read the file just as
it is (as a string of 1-byte chars), and then "as utf8" (in other
words, converting it to Unicode characters as it is read). We then
wipe the file and write the plain UCS-2 (without the BOM) to it using
read/write commands, and finally read that back as data.
Step through the script and see what happens. Run it in Script
Editor.app, not in the Carbon one or Smile, because they can't display
the characters properly. Great fun!
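set fU to "/tmp/temp"
set f to fU as POSIX file
-- Build the UTF-8 class constant from its chevrons (MacRoman 199 and 200).
set u8 to (ASCII character 199) & "class utf8" & (ASCII character 200)
set _utf8 to run script u8
-- Have Perl write a BOM plus four characters; the file ends up as UTF-8.
do shell script "perl -e 'open F, qq~>" & fU & "~;
print F qq~\\x{FEFF}\\x{24E6}\\x{24DE}\\x{24E6}\\x{0021}~'"
-- Read the file as raw bytes, then again decoded as UTF-8.
set _rawUTF8 to read f
set _class1 to class of _rawUTF8
set _UCS2 to read f as _utf8
set _class2 to class of _UCS2
-- Empty the file and write the Unicode text back to it.
open for access f with write permission
set eof f to 0
write _UCS2 to f
close access f
-- The coercion is expected to fail; the hex is sliced out of the error text.
try
	(read f as data) as string
on error e
	set _dat to text 22 through -17 of e
end try
{_utf8, _rawUTF8, _class1, _UCS2, _class2, _dat, data}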
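For anyone without Script Editor to hand, the same round trip can be roughly sketched in Python, which also covers the "reverse" question of getting the hex code of each character back out of a UTF-8 file. This is only an illustration and not part of the script above: the names TEXT, hex_codes and round_trip are made up, the byte order mark is stripped explicitly, and the 16-bit output is assumed to be big-endian.

```python
import os
import tempfile

# The byte order mark and four characters that the Perl one-liner writes.
TEXT = "\ufeff\u24e6\u24de\u24e6\u0021"


def hex_codes(text):
    """Return the code point of every character as an uppercase hex string."""
    return [f"{ord(ch):04X}" for ch in text]


def round_trip(path):
    """Write TEXT as UTF-8, read it back two ways, and re-encode it as 16-bit units."""
    # Write the text as UTF-8, which is what the Perl step ends up producing.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(TEXT)

    # Read the file as raw bytes, then decode those bytes as UTF-8.
    with open(path, "rb") as fh:
        raw = fh.read()
    decoded = raw.decode("utf-8")

    # Assumption: drop the byte order mark and use big-endian 16-bit units.
    without_bom = decoded.lstrip("\ufeff")
    ucs2 = without_bom.encode("utf-16-be")

    return raw.hex().upper(), hex_codes(without_bom), ucs2.hex().upper()


if __name__ == "__main__":
    scratch = os.path.join(tempfile.mkdtemp(), "temp")
    for part in round_trip(scratch):
        print(part)
```

Run as a script, it should print the UTF-8 bytes (EFBBBFE293A6E2939EE293A621), then the code points 24E6, 24DE, 24E6 and 0021 as a list, then the same four characters as 16-bit hex (24E624DE24E60021).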
-- JD