Re: writing Unicode to an existing Unicode text file on Intel Macs
- Subject: Re: writing Unicode to an existing Unicode text file on Intel Macs
- From: Christopher Nebel <email@hidden>
- Date: Thu, 8 Nov 2007 13:12:40 -0800
There are a few different problems interacting here, but they all
center around the BOM (byte order mark) character.
- If there is a BOM, that determines the BE/LE setting for the entire
file.
- TextEdit will only automatically recognize UTF-16 files as UTF-16 if
there's a BOM at the beginning, or if, on Leopard, there's a
"com.apple.TextEncoding" metadata attribute. Files with an attribute
but no BOM are presumed to be native-endian.
- AppleScript's "write" command always uses UTF-16BE when writing
Unicode text, regardless of the machine's native byte order.
- AppleScript's "write" command won't automatically put a BOM in for
you. (There's a bug filed on that.)
Test #1 fails because you're writing UTF-16BE data into a file that
already has a UTF-16LE BOM in it.
Test #2 fails -- I'm guessing that you're running on Leopard here --
because that file isn't really empty: it has an encoding attribute on
it marking it as UTF-16, but because the contents lack a BOM, TextEdit
assumes native endian, which is wrong on Intel.
Test #3 fails because there's no BOM and no encoding attribute, so
TextEdit assumes that the file is Mac OS Roman.
You could solve #2 and #3 by writing your own BOM character (U+FEFF,
i.e. the bytes FE FF in UTF-16BE) at the start of the file, but that
won't help with the append problem in #1. My suggestion is to use
UTF-8 instead, which doesn't have endianness problems.
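
For reference, the two workarounds above might look roughly like the following in AppleScript. This is only a sketch and has not been checked against every OS version. The handler names writeUTF16WithBOM and appendUTF8 are made up for illustration, «data rdatFEFF» is assumed to write the two raw bytes FE FF, and the file path is the one from the original test script. Note that the UTF-8 version only makes sense for a file that is UTF-8 throughout; it will not repair a file that already contains UTF-16 text.

```applescript
-- Hypothetical helper: replace the file's contents with UTF-16BE text,
-- preceded by a big-endian BOM (covers tests #2 and #3, not the append in #1).
on writeUTF16WithBOM(theFile, theText)
	set fref to open for access file theFile with write permission
	try
		set eof fref to 0 -- discard any existing contents
		write «data rdatFEFF» to fref -- assumed: raw bytes FE FF, the UTF-16BE BOM
		write theText to fref as Unicode text -- "write" emits UTF-16BE
		close access fref
	on error errMsg number errNum
		-- Make sure the file is closed, then pass the error along.
		close access fref
		error errMsg number errNum
	end try
end writeUTF16WithBOM

-- Hypothetical helper: append UTF-8 text; there is no byte order to get wrong.
on appendUTF8(theFile, theText)
	set fref to open for access file theFile with write permission
	try
		write theText to fref as «class utf8» starting at eof
		close access fref
	on error errMsg number errNum
		close access fref
		error errMsg number errNum
	end try
end appendUTF8

set theFile to (path to home folder as text) & "unicode test.txt"
appendUTF8(theFile, "added unicode text")
```

Reading the file back would then use the matching type, for example "read fref as «class utf8»" in place of "read fref as Unicode text".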
--Chris Nebel
AppleScript Engineering
On Nov 7, 2007, at 10:46 PM, Donald Hall wrote:
Has anyone come across the following problem before on Intel Macs,
and if so, is there a fix?
1. Open a new plain text file in TextEdit and add a few words (say,
"This is unicode text").
2. Save the file as Unicode UTF-16.
3. Create and run the following script:
-- Unicode writing test script ---------------------------------
set theFile to (path to home folder as text) & "unicode test.txt"
set theText to "added unicode text" as Unicode text
-- Append the text to the end of the file as UTF-16.
set fref to open for access file theFile with write permission
try
	write theText as Unicode text to fref starting at eof
end try
close access fref
-- Read the whole file back as UTF-16 (read-only access is enough here).
set fref2 to open for access file theFile
try
	set theContents to (read fref2 as Unicode text)
end try
close access fref2
theContents
-- end script --------------------------------------------------
Test Results
Test 1: -- file initially contains the words "This is unicode text"
Script Result:
"This is unicode text <plus Japanese characters>"
Contents of file upon re-opening in TextEdit: Same as script result
Test 2: -- file starts empty (but encoding is still UTF-16)
Script Result:
"added unicode text"
Contents of file upon re-opening in TextEdit: <same Japanese
characters as above>
Test 3: -- file does not exist, created by AppleScript
Script Result:
"added unicode text"
Contents of file upon opening in TextEdit:
added unicode text *** but file appears to be Mac OS Roman ***
Note that the script works fine on a PPC Mac, giving the expected
result. On an Intel Mac it looks like an endian problem. Is there
any way around this in AppleScript? ISTR AppleScript always uses big
endian for UTF-16.