Re: Reading a File (Was: Re: TextEdit and Text Item Delimiters)
- Subject: Re: Reading a File (Was: Re: TextEdit and Text Item Delimiters)
- From: Christopher Nebel <email@hidden>
- Date: Wed, 31 Jul 2002 19:08:39 -0700
On Wednesday, July 31, 2002, at 05:17 PM, email@hidden wrote:
> I get ALMOST what I expect, but it has two garbage characters at the
> very beginning (what looks like a backwards comma, and a
> flipped-upside-down caret). I could do:
Sounds like you saved the file as Unicode (UTF-16, two bytes per
character). If you say "characters of item 1 of pairs", you should see
apparently empty strings between every real character. (They actually
contain a null character, the zero byte UTF-16 pairs with each plain
ASCII character, which doesn't display as anything.)
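The "garbage" at the front is a magic Unicode character, the byte order
mark (U+FEFF), which both identifies the contents as Unicode and tells
you whether the data is big- or little-endian: big-endian data starts
with the bytes FE FF, little-endian data with FF FE. (If you were moving
files between, say, Mac and Windows systems, it would make a difference.)
Read as Mac Roman, FE and FF display as an ogonek and a caron, which is
the "backwards comma" and "upside-down caret" you saw. It's not
technically required, but TextEdit puts it in any file it saves as
Unicode.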
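To check which byte order a given file uses, you can read its first two bytes and compare them with the two possible marks. This is only a sketch under stated assumptions: "detectBOM" is a hypothetical handler name, not part of Standard Additions; f is assumed to be an alias or file reference; and it assumes a plain "read" decodes each byte as one Mac Roman character, so that "ASCII number" returns the raw byte value (254 for FE, 255 for FF).

```applescript
-- Hypothetical helper (not part of Standard Additions): report which UTF-16
-- byte order mark, if any, starts the file f.
on detectBOM(f)
	set fileRef to open for access f
	try
		-- Read the first two bytes as plain 8-bit text, one character per byte.
		set header to read fileRef for 2
	on error
		set header to "" -- empty file: nothing to read
	end try
	close access fileRef
	if length of header < 2 then return "no UTF-16 BOM"
	-- Assumes Mac Roman decoding, so ASCII number yields each byte's value.
	set b1 to ASCII number (character 1 of header)
	set b2 to ASCII number (character 2 of header)
	if b1 = 254 and b2 = 255 then return "UTF-16 big-endian" -- bytes FE FF
	if b1 = 255 and b2 = 254 then return "UTF-16 little-endian" -- bytes FF FE
	return "no UTF-16 BOM"
end detectBOM
```

Reading only two bytes keeps the check cheap, and the file is closed on every path, including the empty-file case.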
"read", unfortunately, is almost completely clueless when it comes to
Unicode. You can say "read ... as Unicode text", but you then can't use
the "before", "until", or "using delimiters" parameters or it'll screw
up again. The simplest thing would be to save the file with a different
encoding: if you're really worried about unusual characters, UTF-8 is
your best bet; otherwise I'd suggest Mac Roman. Alternatively, you
could suck in the entire file as Unicode and then tear it apart in the
script. For instance, this works:
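set rawdata to read f as Unicode text -- f is the file saved from TextEdit
set oldDelims to text item delimiters -- save the current delimiters
set text item delimiters to {ASCII character 10} -- split into lines on linefeeds
set pairs to every text item of rawdata
set text item delimiters to {"="} -- split each "key=value" line at the "="
repeat with i in pairs
	-- i is a reference to an item of pairs, so this replaces the line in place
	set contents of i to text items of contents of i
end repeat
set text item delimiters to oldDelims -- restore the delimiters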
It actually produces a list of two-item lists, not the interleaved list
you had originally. Also, notice that I'm using "ASCII character 10" (a
linefeed) and text item delimiters to get each line instead of
"paragraphs": perversely, Unicode text understands \r as a paragraph
break, but *not* \n or \r\n, while string understands all three.
Appropriate bugs have been filed.
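If a file might use any of the three line-ending conventions, one workaround is to normalize them with text item delimiters before splitting. A minimal sketch, assuming text item delimiters behave on Unicode text as in the script above; "splitLines" is a hypothetical handler name, not a built-in:

```applescript
-- Hypothetical helper: split text into lines whatever its line endings
-- (CR, LF or CRLF), using text item delimiters instead of "paragraphs".
on splitLines(theText)
	set CR to ASCII character 13
	set LF to ASCII character 10
	set oldDelims to AppleScript's text item delimiters
	-- Rewrite CRLF as LF, then any remaining lone CR as LF.
	set AppleScript's text item delimiters to {CR & LF}
	set parts to text items of theText
	set AppleScript's text item delimiters to {LF}
	set theText to parts as Unicode text
	set AppleScript's text item delimiters to {CR}
	set parts to text items of theText
	set AppleScript's text item delimiters to {LF}
	set theText to parts as Unicode text
	-- Every line now ends in LF; a trailing line ending leaves an empty last item.
	set theLines to text items of theText
	set AppleScript's text item delimiters to oldDelims
	return theLines
end splitLines
```

In the script above, "set pairs to splitLines(rawdata)" could stand in for the two lines that split on ASCII character 10. Coercing with "as Unicode text" rather than "as text" is deliberate: in AppleScript of this era "as text" produces a plain string and could lose characters that Mac Roman can't represent.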
--Chris Nebel
AppleScript Engineering
_______________________________________________
applescript-users mailing list | email@hidden