And the BOM would provide a handy way to tell a UTF8 from an ASCII.
It's rather unfortunate that the UTF8 BOM was not widely adopted,
because reading a UTF8 as ASCII is a bad experience which happens
rather frequently, I think.
Reading UTF-8 as ASCII, when properly done, gets you a lot of question
marks. Reading it as an 8-bit encoding (Latin-1 or Latin-9 or
Windows-1252 or MacRoman or...) is a lot more troublesome since
there's no such thing as an invalid byte value in those encodings. So
every byte is interpreted as some character, and you get a lot more
garbage.
For instance, the « and » and ¬ characters probably show up in
AppleScript, and therefore on this mailing list, more than any other
non-ASCII chars. Assuming this message is going out in UTF-8 ...
well, here, let me just make certain of that: ☺
... those characters are each encoded with a two-byte sequence. « is
(194,171), » is (194,187), and ¬ is (194,172). They happen to fall in
the range where the UTF-8 consists of a single extra byte (194) in
front of the correct Latin-1 byte value, so if your mail client
assumes Latin-1 (hi, Yahoo!) you just get an Â in front of the desired
character - ugly but still readable. If, instead, it interprets them
as MacRoman they come out as a ¬ in front of ´, ª, and ¨,
respectively, with no visible clue to their actual identity...
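
If you want to check those byte values yourself, here is a rough Python
sketch. The helper name misread is just something I made up for
illustration, and it assumes Python's stock "ascii", "latin-1" and
"mac_roman" codecs are a fair stand-in for what a mail client would do.
Note that the ASCII case yields U+FFFD replacement characters rather than
literal question marks; how they are drawn depends on the display.

```python
# Sketch: encode each character as UTF-8, then decode the bytes as if
# they were in some other encoding, to see what a confused client shows.

def misread(ch, encoding):
    """Return the text you get when ch's UTF-8 bytes are read as `encoding`."""
    # errors="replace" substitutes U+FFFD for bytes the codec rejects
    # (this only matters for ASCII; Latin-1 accepts every byte value).
    return ch.encode("utf-8").decode(encoding, errors="replace")

for ch in ["«", "»", "¬"]:
    raw = tuple(ch.encode("utf-8"))  # e.g. (194, 171) for «
    print(ch, raw,
          repr(misread(ch, "ascii")),
          repr(misread(ch, "latin-1")),
          repr(misread(ch, "mac_roman")))
```

The Latin-1 column shows the Â followed by the intended character, while
the MacRoman column shows the ¬ followed by an unrelated symbol.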
Mark J. Reed <email@hidden>