Re: writing Unicode to an existing Unicode text file on Intel Macs


  • Subject: Re: writing Unicode to an existing Unicode text file on Intel Macs
  • From: Christopher Nebel <email@hidden>
  • Date: Thu, 8 Nov 2007 13:12:40 -0800

There are a few different problems interacting here, but they all center around the BOM (byte order mark) character.

- If there is a BOM, that determines the BE/LE setting for the entire file.
- TextEdit will only automatically recognize UTF-16 files as UTF-16 if there's a BOM at the beginning, or if, on Leopard, there's a "com.apple.TextEncoding" metadata attribute. Files with an attribute but no BOM are presumed to be native-endian.
- AppleScript's "write" command always uses UTF-16BE.
- AppleScript's "write" command won't automatically put a BOM in for you. (There's a bug filed on that.)
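Since "write" emits UTF-16BE and no BOM, one way to get a recognizable file is to write the big-endian BOM bytes (FE FF) yourself before the text. The following is a minimal sketch, not a tested fix: it assumes the file is new or may be overwritten from the start, and it reuses the file name and text from the original script purely for illustration.

```applescript
-- Sketch only: writes a UTF-16BE BOM by hand before the text.
-- Assumes the file is new or may safely be overwritten from byte 0.
set theFile to (path to home folder as text) & "unicode test.txt"
set theText to "added unicode text"

set fref to open for access file theFile with write permission
try
	set eof fref to 0 -- truncate, so the BOM lands at the very start
	write «data rdatFEFF» to fref -- raw bytes FE FF, the big-endian BOM
	write theText to fref as Unicode text -- UTF-16BE, matching the BOM
	close access fref
on error errMsg number errNum
	close access fref -- do not leave the file open on failure
	error errMsg number errNum
end try
```

This does nothing for a file that already starts with a little-endian BOM; appending big-endian data to such a file still produces garbage.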


Test #1 fails because you're writing UTF-16BE data into a file that already has a UTF-16LE BOM in it.

Test #2 fails -- I'm guessing that you're running on Leopard here -- because that file isn't really empty: it has an encoding attribute on it marking it as UTF-16, but because the contents lack a BOM, TextEdit assumes native endian, which is wrong on Intel.

Test #3 fails because there's no BOM and no encoding attribute, so TextEdit assumes that the file is Mac OS Roman.

You could solve #2 and #3 by writing your own BOM character, but that won't help with the append problem in #1. My suggestion is to use UTF-8 instead, which doesn't have endianness problems.
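A minimal sketch of the UTF-8 approach, adapted from the original test script. It assumes that the existing file, if any, was saved as UTF-8 rather than UTF-16; the file name and text are the ones from the original post.

```applescript
-- Sketch only: append and read back as UTF-8 instead of UTF-16.
-- Assumes any existing file at this path is already UTF-8.
set theFile to (path to home folder as text) & "unicode test.txt"
set theText to "added unicode text"

set fref to open for access file theFile with write permission
try
	-- «class utf8» asks "write" to encode the text as UTF-8
	write theText to fref as «class utf8» starting at eof
	close access fref
on error errMsg number errNum
	close access fref -- do not leave the file open on failure
	error errMsg number errNum
end try

-- Read the whole file back with the same encoding
set theContents to read file theFile as «class utf8»
theContents
```

Because UTF-8 is a byte-oriented encoding, the appended bytes have the same meaning on PPC and Intel, so the append case no longer depends on which byte order the existing file uses.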


--Chris Nebel
AppleScript Engineering

On Nov 7, 2007, at 10:46 PM, Donald Hall wrote:

Has anyone come across the following problem before on Intel Macs, and if so, is there a fix?

1. Open a new plain text file in TextEdit and add a few words (say, "This is unicode text").
2. Save the file as Unicode UTF-16.
3. Create and run the following script:


-- Unicode writing test script ---------------------------------
set theFile to (path to home folder as text) & "unicode test.txt"

set theText to "added unicode text" as Unicode text

set fref to open for access file theFile with write permission

try
	write theText as Unicode text to fref starting at eof
end try
close access fref

set fref2 to open for access file theFile with write permission
try
	set theContents to (read fref2 as Unicode text)
end try
close access fref2

theContents

-- end script --------------------------------------------------

Test Results

Test 1: -- file initially contains the words "This is unicode text"

Script Result:

"This is unicode text <plus Japanese characters>"

Contents of file upon re-opening in TextEdit: Same as script result


Test 2: -- file starts empty (but encoding is still UTF-16)

Script Result:

"added unicode text"

Contents of file upon re-opening in TextEdit: <same Japanese characters as above>


Test 3: -- file does not exist, created by AppleScript

Script Result:

"added unicode text"

Contents of file upon opening in TextEdit:

added unicode text *** but file appears to be Mac OS Roman ***

Note that the script works fine on a PPC Mac, giving the expected result. On an Intel Mac it looks like an endian problem. Is there any way around this in AppleScript? I seem to recall that AppleScript always uses big endian for UTF-16.

