Re: Reading Middle Eastern Characters
- Subject: Re: Reading Middle Eastern Characters
- From: Christopher Nebel <email@hidden>
- Date: Mon, 6 Dec 2004 23:11:29 -0800
On Dec 6, 2004, at 10:30 PM, Ferenc Farkas MÁTYÁS wrote:
and European users), not UTF-8. If you say "read ... as <<class
utf8>>", you'll get the right result, but see the next bit. (The <<
and >> are chevrons, which won't go through the mailing list
correctly. Ritual cursing of the list here.)
> How can one determine if the text is in utf or macroman or in another
> encoding? The reason I am asking this is if the text is utf8, it reads
> it well, but if it's not, I get an empty output from read. I can test
> it, if it's empty or not, but it's an ugly workaround I think.
Believe it or not, that's about the best you can do, and is actually
what's recommended in many cases. Because of how UTF-8 is structured,
it's unlikely you'll have data that looks like UTF-8 but isn't, so the
usual technique is to try to interpret the data as UTF-8 first; if that
fails, then fall back to some other encoding, usually the
system-primary one. Doing this by reading the file twice isn't
particularly efficient, but given AppleScript's facilities for this
sort of thing, you're a bit stuck for it.
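For illustration only, here is the try-UTF-8-first technique sketched in Python rather than AppleScript. The function name decode_text and the use of MacRoman as the fallback (standing in for the system-primary encoding) are assumptions made for this example. Because the bytes are held in memory, the file only has to be read once.

```python
def decode_text(data: bytes, fallback: str = "mac_roman"):
    """Decode bytes as UTF-8 when they are valid UTF-8, else use the fallback.

    Returns a (text, encoding_used) pair. The default fallback is an
    assumption standing in for the system-primary encoding.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid sequence,
        # which is what makes "looks like UTF-8 but isn't" unlikely.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # errors="replace" keeps the fallback from failing on any byte
        # the fallback codec might not define.
        return data.decode(fallback, errors="replace"), fallback
```

A caller would pass the raw contents of the file, for example decode_text(open(path, "rb").read()), and can inspect the second element of the result to see which encoding was actually used.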
Correctly determining which of the dozens of conceivable text encodings
a given hunk of data uses is essentially an AI-complete problem -- that
is, you need human-level intelligence, and even most humans would have
a hard time with some of the fringe cases.
If you're in a position to dictate the encodings you'll handle, then by
all means do so. Many folks will automatically handle UTF-16, since a
UTF-16 data file normally starts with a byte-order mark, 0xfeff (which
reads as 0xfffe in byte-swapped UTF-16), and punt on everything else.
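The byte-order-mark check can be sketched the same way. This is a Python illustration, not an AppleScript facility; the function name sniff_utf16 is hypothetical. It only looks at the first two bytes, so data without a mark is reported as unknown rather than guessed at.

```python
import codecs

def sniff_utf16(data: bytes):
    """Return the UTF-16 codec named by a leading byte-order mark, or None."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE (byte-swapped)
        return "utf-16-le"
    return None  # no mark found: punt, or try another strategy
```

When a codec name comes back, the text after the mark can be decoded with data[2:].decode(codec).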
--Chris Nebel
AppleScript Engineering