Re: Reading a File (Was: Re: TextEdit and Text Item Delimiters)

Subject: Re: Reading a File (Was: Re: TextEdit and Text Item Delimiters)
From: Christopher Nebel <email@hidden>
Date: Wed, 31 Jul 2002 19:08:39 -0700

On Wednesday, July 31, 2002, at 05:17 PM, email@hidden wrote:

> I get ALMOST what I expect, but it has two garbage characters at the very
> beginning (what looks like a backwards comma, and a flipped-upside-down
> caret). I could do:

Sounds like you saved the file as Unicode. If you say "characters of item 1 of pairs", you should see apparently empty strings between every real character. (They actually contain a null character which doesn't display as anything.)
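
To make those null characters concrete, here is a minimal sketch in Python rather than AppleScript, chosen only because it exposes the raw bytes easily. It assumes that "Unicode" here means UTF-16 and uses a made-up sample string; it shows the byte layout only, not what "read" does internally.

```python
# Encode a short ASCII string as big-endian UTF-16 (no byte-order mark),
# then view it one byte per character, as a byte-oriented reader would.
text = "a=1"  # hypothetical sample content
raw = text.encode("utf-16-be")

# Each ASCII character occupies two bytes: a zero byte, then the character.
as_single_bytes = [chr(b) for b in raw]
print(as_single_bytes)
```

Every second element is a null character, which is why the real characters appear to have empty strings between them.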

The "garbage" at the front is a special Unicode character, the byte-order mark (BOM), which both identifies the contents as Unicode and tells you whether the data is big- or little-endian. (If you were moving files between, say, Mac and Windows systems it would make a difference.) It's not technically required (I think), but TextEdit puts it in any file it saves as Unicode.
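
A minimal sketch of checking for that leading character, written in Python rather than AppleScript only because it makes the bytes easy to inspect. The helper name detect_utf16_bom and the sample data are hypothetical, and "Unicode" is assumed to mean UTF-16. The last lines decode the big-endian mark as Mac Roman, which yields an ogonek and a caron; that would be consistent with the "backwards comma" and upside-down caret described above, assuming the file was big-endian and read as Mac Roman.

```python
import codecs

def detect_utf16_bom(raw: bytes) -> str:
    """Report which UTF-16 byte-order mark, if any, starts the data."""
    if raw.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big-endian"
    if raw.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little-endian"
    return "none"

# Hypothetical file contents: a byte-order mark followed by the text "a=1".
sample_be = codecs.BOM_UTF16_BE + "a=1".encode("utf-16-be")
sample_le = codecs.BOM_UTF16_LE + "a=1".encode("utf-16-le")

print(detect_utf16_bom(sample_be))
print(detect_utf16_bom(sample_le))
print(detect_utf16_bom(b"a=1"))

# What the big-endian mark looks like if misread as Mac Roman text.
bom_as_mac_roman = codecs.BOM_UTF16_BE.decode("mac_roman")
print(bom_as_mac_roman)
```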

"read", unfortunately, is almost completely clueless when it comes to Unicode. You can say "read ... as Unicode text", but you then can't use the "before", "until", or "using delimiters" parameters or it'll screw up again. The simplest thing would be to save the file with a different encoding -- if you're really worried about unusual characters, UTF-8 is your best bet; otherwise I'd suggest Mac Roman. Alternatively, you could suck in the entire file as Unicode and then tear it apart in the script. For instance, this works:

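-- f is assumed to be the file reference from your script
set rawdata to read f as Unicode text
-- split into lines on the line feed character
set text item delimiters to {ASCII character 10}
set pairs to every text item of rawdata
-- split each line at "="
set text item delimiters to {"="}
repeat with i in pairs
    set contents of i to text items of contents of i
end repeat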

It actually produces a list of two-item lists, not the interleaved list you had originally. Also, notice that I'm using "ASCII character 10" and t.i.d. to get each line instead of "paragraphs" -- perversely, Unicode text understands \r as a paragraph break, but *not* \n or \r\n, while string understands all three. Appropriate bugs have been filed.
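
For comparison, here is the same decode-then-split approach sketched in Python, purely to show the shape of the result. The sample data and variable names are made up, and the "utf-16" codec is only assumed to play the role of "read ... as Unicode text": it uses the leading byte-order mark to pick the byte order and then drops it.

```python
import codecs

# Hypothetical file contents: byte-order mark, then two "key=value" lines
# separated by a line feed (ASCII character 10).
raw_bytes = codecs.BOM_UTF16_BE + "name=Chris\ncolor=blue".encode("utf-16-be")

# Decode the whole file first; the "utf-16" codec consumes the byte-order mark.
rawdata = raw_bytes.decode("utf-16")

# Split into lines on the line feed, then split each line on "=".
pairs = [line.split("=") for line in rawdata.split("\n")]
print(pairs)
```

Each element of pairs is a two-item list, which is the list-of-lists shape described above rather than one interleaved list.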

--Chris Nebel
AppleScript Engineering
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/applescript-users