Re: Writing to file as UTF8 with BOM ?
- Subject: Re: Writing to file as UTF8 with BOM ?
- From: "Mark J. Reed" <email@hidden>
- Date: Thu, 26 Oct 2006 17:26:35 -0400
On 10/26/06, Emmanuel <email@hidden> wrote:
> And the BOM would provide a handy way to tell a UTF8 from an ASCII.
> It's rather unfortunate that the UTF8 BOM was not widely adopted,
> because reading a UTF8 as ASCII is a bad experience which happens
> rather frequently, I think.
Reading UTF-8 as ASCII, when properly done, gets you a lot of question
marks. Reading it as an 8-bit encoding (Latin-1 or Latin-9 or
Windows-1252 or MacRoman or...) is a lot more troublesome since
there's no such thing as an invalid byte value in those encodings. So
every byte is interpreted as some character, and you get a lot more
gobbledygook.
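
To make the difference concrete, here is a minimal Python sketch, assuming Python 3's built-in codecs. The sample string is only an illustration, and note that Python's lenient decoder substitutes U+FFFD where other tools would print a question mark.

```python
# Sample text with one non-ASCII character (the AppleScript continuation mark).
text = "a¬b"
raw = text.encode("utf-8")  # the ¬ becomes two bytes, both above 127

# A strict ASCII decoder rejects any byte above 127 ...
try:
    raw.decode("ascii")
    ascii_strict_ok = True
except UnicodeDecodeError:
    ascii_strict_ok = False

# ... and a lenient one substitutes a replacement mark for each bad byte.
as_ascii = raw.decode("ascii", errors="replace")

# Latin-1 assigns a character to every byte value, so decoding never fails;
# the result is simply wrong.
as_latin1 = raw.decode("latin-1")

print(ascii_strict_ok, as_ascii, as_latin1)
```

The ASCII path at least signals that something went wrong; the Latin-1 path silently returns plausible-looking text.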
For instance, the « and » and ¬ characters probably show up in
AppleScript, and therefore on this mailing list, more than any other
non-ASCII chars. Assuming this message is going out in UTF-8 ...
well, here, let me just make certain of that: ☺
... those characters are each encoded with a two-byte sequence. « is
(194,171), » is (194,187), and ¬ is (194,172). They happen to fall in
the range where the UTF-8 encoding is a single extra byte (194) in
front of the correct Latin-1 byte value, so if your mail client
assumes Latin-1 (hi, Yahoo!) you just get an Â in front of the desired
character - ugly but still readable. If, instead, it interprets them
as MacRoman they come out as a ¬ in front of ´, ª, and ¨,
respectively, with no visible clue to their actual identity...
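
The byte values and both misreadings above can be checked with a short Python sketch. The helper name misread is made up for illustration; "latin-1" and "mac_roman" are codec names from Python's standard library.

```python
def misread(ch):
    """Return ch's UTF-8 byte values, then those bytes decoded
    (wrongly) as Latin-1 and as MacRoman."""
    raw = ch.encode("utf-8")
    return list(raw), raw.decode("latin-1"), raw.decode("mac_roman")

# The three characters discussed above.
for ch in "«»¬":
    print(ch, *misread(ch))
```

Each line of output shows the original character, its two byte values, the Latin-1 misreading, and the MacRoman misreading.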
--
Mark J. Reed <email@hidden>