Re: Writing to file as UTF8 with BOM ?
- Subject: Re: Writing to file as UTF8 with BOM ?
- From: Christopher Nebel <email@hidden>
- Date: Thu, 26 Oct 2006 12:54:56 -0700
On Oct 26, 2006, at 8:17 AM, Yvon Thoraval wrote:
Mark J. Reed wrote:
UTF-8, on the other hand, is, as the name implies, an 8-bit encoding.
It's defined in terms of bytes, not 16-bit words, so the order of
those bytes is fixed. You don't need a BOM to distinguish between
some hypothetical *UTF-8LE and *UTF-8BE encodings. But it still makes
sense to put a BOM in a UTF-8 file to identify that file as not only
Unicode text, but specifically as UTF-8 text. In fact, the UTF-8
version of the BOM, since it's 3 bytes instead of 2, is 256 times less
likely than the UTF-16 BOM to appear randomly in data. It's therefore
even closer to a guarantee that the file has UTF-8 text instead of
something else.
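
To make the arithmetic above concrete, here is a minimal Python sketch. It assumes the data is uniformly random bytes and compares the UTF-8 BOM against one specific UTF-16 BOM (big-endian); the helper name chance_of_random_match is made up for illustration.

```python
import codecs

# Byte order marks as defined in Python's standard codecs module.
UTF8_BOM = codecs.BOM_UTF8          # 3 bytes: EF BB BF
UTF16_BE_BOM = codecs.BOM_UTF16_BE  # 2 bytes: FE FF


def chance_of_random_match(marker: bytes) -> float:
    """Probability that len(marker) uniformly random bytes equal marker.

    Hypothetical helper: real file contents are not uniformly random,
    so this is only the back-of-the-envelope figure from the text.
    """
    return 1 / (256 ** len(marker))


# One extra byte in the marker means one extra factor of 256.
ratio = chance_of_random_match(UTF16_BE_BOM) / chance_of_random_match(UTF8_BOM)
print(UTF8_BOM.hex(), ratio)  # prints: efbbbf 256.0
```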
I thought UTF-8 could be guessed (successfully) from the content of
the file, couldn't it?
The key word there is "guessed". Without a BOM, you can't tell
whether or not the file is UTF-8 without examining the entire
contents, which is inefficient and may even be impossible in some
situations. (Strictly speaking, a leading BOM isn't proof either,
since it could just be there as random data, but as Mr. Reed points
out, it's pretty unlikely. That was an excellent explanation, Mark;
thank you.)
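
The difference between checking for a signature and guessing from content can be sketched in Python roughly as follows. This is an illustration only, not how any particular application does it; the function names has_utf8_bom and is_valid_utf8 are made up for the example.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Cheap check: looks only at the first three bytes."""
    return data.startswith(codecs.BOM_UTF8)


def is_valid_utf8(data: bytes) -> bool:
    """Expensive check: must decode the entire contents.

    Even a True result is only a guess, since pure ASCII data is valid
    UTF-8 and also valid in most legacy 8-bit encodings.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_plain = "café".encode("utf-8")       # UTF-8 text, no BOM
utf8_signed = codecs.BOM_UTF8 + utf8_plain  # same text with a BOM
latin1 = "café".encode("latin-1")         # legacy 8-bit encoding

print(has_utf8_bom(utf8_signed), has_utf8_bom(utf8_plain))  # True False
print(is_valid_utf8(utf8_plain), is_valid_utf8(latin1))     # True False
```

Note that the second function cannot give an answer until it has seen all of the data, which is the inefficiency described above when the data arrives as a stream.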
To sum up: yes, UTF-8 BOMs aren't commonly used, because one of the
main uses of UTF-8 is as a format that will work (at least somewhat)
with completely Unicode-ignorant applications like, say, grep(1).
However, a BOM is still useful to Unicode-aware protocols because it
can serve as a signature that the following data is UTF-8, as opposed
to some sort of legacy encoding. It's up to the protocol definition
whether or not it wants to insist on having a BOM, and there is
nothing necessarily wrong with any of the possible choices. For
further reading, I recommend <http://www.unicode.org/faq/utf_bom.html>.
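
For anyone who wants to try writing a file with a UTF-8 BOM outside of AppleScript, here is a minimal Python sketch. It relies on Python's standard "utf-8-sig" codec, which emits the BOM when writing and tolerates its presence or absence when reading; the function names and the temporary file are assumptions made for the example.

```python
import codecs
import os
import tempfile


def write_utf8_with_bom(path: str, text: str) -> None:
    """Write text as UTF-8, preceded by the three-byte BOM."""
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


def read_utf8_optional_bom(path: str) -> str:
    """Read UTF-8 text, dropping a leading BOM if there is one."""
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        return f.read()


# Demonstration using a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    write_utf8_with_bom(demo_path, "héllo\n")

    with open(demo_path, "rb") as f:
        raw = f.read()

    starts_with_bom = raw.startswith(codecs.BOM_UTF8)
    round_trip = read_utf8_optional_bom(demo_path)

print(starts_with_bom, repr(round_trip))  # True 'héllo\n'
```

A reader that insists on a BOM would instead reject input for which the startswith check fails; which behaviour is right depends on the protocol, as noted above.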
--Chris Nebel
AppleScript Engineering