Re: Unicode/UTF confusion
- Subject: Re: Unicode/UTF confusion
- From: John Delacour <email@hidden>
- Date: Sat, 3 May 2003 18:42:07 +0100
- Mac-eudora-version: 6.0a16
At 3:53 pm -0700 2/5/03, Christopher Nebel wrote:
> That's proper UTF-16. (It doesn't have a byte-order mark, but
> that's not required.) I'm not sure who wrote that note, but the
> second sentence is very misleading. There's no support for storing
> data within AppleScript itself in UTF-8, so saying "s as <<class
> utf8>>" has no real effect. However, you can make it write UTF-8 to
> a file by saying this:
>
>     write s to theFile as <<class utf8>>
That's all very well but nothing is joined up. If I write a file as
UTF-8 then what do I do with it? Since practically nothing is
scriptable in TextEdit, I can't open that file in TextEdit unless I
_manually_ set TextEdit's open preferences to UTF-8. I can't tell
TextEdit to open f as <<class utf8>> and I can't tell it to save a
document as UTF-8, and TextEdit appears not to make any guess at all
as to the content of the files it opens.
If I run script 1 below with different TE prefs I get
1. UTF-8 -- OK (Apple and Lozenge displayed)
2. Automatic -- Raw bytes of UTF-8
3. UTF-16 -- 1 missing glyph + 2 CJK characters
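
These three results follow from the bytes on disk. As a rough illustration in Python (not part of the original scripts), the sketch below decodes the same six bytes three ways. It assumes that the "Automatic" setting fell back to MacRoman and that UTF-16 was read big-endian, as on a PowerPC Mac; both are assumptions, not something TextEdit documents here.

```python
import unicodedata

# MacRoman bytes 240 and 215 are the Apple logo (U+F8FF) and the lozenge (U+25CA).
text = bytes([240, 215]).decode("mac_roman")

# What script 1 leaves on disk: the two characters encoded as UTF-8, no BOM.
on_disk = text.encode("utf-8")

as_utf8 = on_disk.decode("utf-8")          # pref 1: the intended two characters
as_macroman = on_disk.decode("mac_roman")  # pref 2 (assumed fallback): one character per byte
as_utf16 = on_disk.decode("utf-16-be")     # pref 3: three unrelated 16-bit code units

print(on_disk.hex(" "))
for c in as_utf16:
    # Category "Co" means private use; names are printed for the other two.
    print(f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c, "<unnamed>"))
```

Read as UTF-16, the six bytes pair up into one private-use code point followed by a Hangul syllable and a CJK ideograph, which is consistent with "1 missing glyph + 2 CJK characters".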
If I run script 2, then TextEdit is intelligent enough to decide that
the byte order mark might just signify UTF-16 content and the
behaviour is correct with all three prefs settings, but in order to
get the content as UTF-8 I need to run something like Script 3.
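
To make the byte-order-mark point concrete, here is a minimal Python sketch of BOM sniffing. The helper name sniff_bom is hypothetical, and the file contents are an assumption: FE FF followed by big-endian UTF-16, which is what script 2 would be expected to write on a PowerPC Mac. It is not a description of how TextEdit actually decides.

```python
import codecs

def sniff_bom(data: bytes):
    """Return (encoding, bom_length) guessed from a leading BOM, or (None, 0)."""
    for bom, name in ((codecs.BOM_UTF8, "utf-8"),
                      (codecs.BOM_UTF16_BE, "utf-16-be"),
                      (codecs.BOM_UTF16_LE, "utf-16-le")):
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

# Assumed contents after script 2: the BOM FE FF, then the two characters as UTF-16BE.
with_bom = b"\xfe\xff" + "\uf8ff\u25ca".encode("utf-16-be")
# Contents after script 1: the same two characters as UTF-8, with no BOM.
without_bom = "\uf8ff\u25ca".encode("utf-8")

enc, skip = sniff_bom(with_bom)
decoded = with_bom[skip:].decode(enc)
print(enc, [f"U+{ord(c):04X}" for c in decoded])
print(sniff_bom(without_bom))
```

With the BOM present, the encoding can be picked without consulting any preference; without it, a reader has nothing to go on but its default setting.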
This script (3) shows what an absurd situation we have. In order to
get the result I need to get the path from TextEdit AS A UNIX PATH on
a Macintosh, convert that to a file reference in order to use the
read/write commands, and then convert that to an alias in order to tell
TextEdit to open it. TextEdit gives me the document's path as a
slashy string but when asked to open a slashy string or a file
reference throws an error and insists on an alias.
Someone at Apple needs to get people working together on a unified
approach to things. TextEdit is neither fish nor fowl. And now we
have this brand new finderspec thing, the 'document file', which won't
travel anywhere without being coerced to something else that it ought
to have been in the first place. What's it all about? Why can't we
just have slashy strings for everything and consign all this old
rubbish to history?
-- script 1
set fU to "/tmp/u.txt"
set f to fU as POSIX file
set applelozenge to (ASCII character 240) & (ASCII character 215) as Unicode text
open for access f with write permission
set eof f to 0
write applelozenge to f as <<class utf8>>
close access f
set s to read f
do shell script "open -e " & fU
return s
-- script 2
set fU to "/tmp/u.txt"
set f to fU as POSIX file
set zws to (ASCII character 254) & (ASCII character 255) as string
set applelozenge to (ASCII character 240) & (ASCII character 215) as Unicode text
open for access f with write permission
set eof f to 0
write zws to f
write applelozenge to f
close access f
set s to read f
do shell script "open -e " & fU
-- [ and why can't I concatenate zws & applelozenge ?? ]
-- script 3
set fU to "/tmp/u.txt"
set f to fU as POSIX file
set zws to (ASCII character 254) & (ASCII character 255) as string
set applelozenge to (ASCII character 240) & (ASCII character 215) as Unicode text
open for access f with write permission
set eof f to 0
write zws to f
write applelozenge to f
close access f
set s to read f
do shell script "open -e " & fU
tell application "TextEdit"
    tell document 1
        set utext to get its text
        set fU to path
        set f to fU as POSIX file
        open for access f with write permission
        set eof f to 0
        write utext to f as <<class utf8>>
        close access f
        close
    end tell
    open {f as alias}
end tell
{zws & applelozenge, utext, read f}
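
For comparison, the transcoding step that script 3 routes through TextEdit amounts to the following, shown as a Python sketch rather than AppleScript. The helper name reencode_as_utf8 is hypothetical, the temporary file stands in for /tmp/u.txt, and the starting contents (FE FF plus big-endian UTF-16) are an assumption about what the first half of the script writes.

```python
import os
import tempfile

def reencode_as_utf8(path: str) -> str:
    """Read a UTF-16 file (the BOM decides byte order) and overwrite it as UTF-8."""
    with open(path, "r", encoding="utf-16", newline="") as fh:
        text = fh.read()
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(text)
    return text

# Create a stand-in for the file as script 3 leaves it before TextEdit is involved.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as fh:
    fh.write(b"\xfe\xff" + "\uf8ff\u25ca".encode("utf-16-be"))

text = reencode_as_utf8(path)
with open(path, "rb") as fh:
    result = fh.read()
os.remove(path)

print(result.hex(" "))
```

One path string serves for both the read and the write here, with no conversion to a file reference or an alias along the way.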
JD