And the BOM would provide a handy way to tell a UTF-8 file from an ASCII one.
It's rather unfortunate that the UTF-8 BOM was not widely adopted,
because reading a UTF-8 file as ASCII is a bad experience which happens
rather frequently, I think.
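
For what it's worth, here is a minimal sketch of BOM sniffing in Python, assuming the file's raw bytes are already in memory. The helper name `sniff_bom` is made up for illustration; the BOM constants come from the standard `codecs` module. A file without a BOM simply yields `None`, which is exactly the ambiguous case being discussed.

```python
import codecs

# Checked in this order so the UTF-32 LE BOM (FF FE 00 00) is not
# mistaken for the UTF-16 LE BOM (FF FE), which is a prefix of it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes):
    """Return a Python codec name if data starts with a known BOM, else None.

    The returned codecs ("utf-8-sig", "utf-16", "utf-32") consume the BOM
    themselves when decoding, so the caller does not have to strip it.
    """
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    return None  # no BOM: could be ASCII, BOM-less UTF-8, Latin-1, ...
```

Usage would look like `codec = sniff_bom(raw)` followed by `raw.decode(codec)` when the result is not `None`.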
that's the way the so-called standards are going in "THE real life" )))
adding a BOM is questionable because, fundamentally, you are writing
something to recognize the encoding in a file of unknown encoding....
in my opinion it would have been better to write that outside of the file
contents, in the name of the file, for example as part of the extension.
and standardize extension (as it is effectively) having a name made of
then you don't even have to open the file to know the encoding.
i'll carefully read the UTF-16 example at satimage because i have an app
where i have to guess the encoding.
it's amazing, it is a really simple app :
- it takes one html folder and writes a menu into each file in such a way
that, afterwards, the user can navigate between the files in this folder.
simple task, no ?
right, except you run straight into the "encoding" problem because all
of the resulting files have to be in the same encoding, so you have to
transcode each one from its "supposed to be" encoding to a single one etc....
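
The transcoding step described above could be sketched roughly like this in Python. This is only an illustration under stated assumptions: the candidate list (`utf-8-sig`, then `cp1252`) and the function names `decode_guess` and `transcode_folder` are hypothetical, and trying encodings in a fixed order is a much cruder technique than guessing from the language of the text.

```python
from pathlib import Path

# Assumed candidate encodings, tried in order; adjust to the files at hand.
CANDIDATES = ("utf-8-sig", "cp1252")


def decode_guess(data: bytes, candidates=CANDIDATES):
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this never fails,
    # but the resulting text may be wrong.
    return data.decode("latin-1"), "latin-1"


def transcode_folder(folder, target="utf-8"):
    """Rewrite every *.html file in folder in the target encoding.

    Returns a list of (file name, guessed source encoding) pairs.
    """
    results = []
    for path in sorted(Path(folder).glob("*.html")):
        text, source = decode_guess(path.read_bytes())
        path.write_bytes(text.encode(target))
        results.append((path.name, source))
    return results
```

Strict UTF-8 decoding goes first because text in a legacy 8-bit encoding rarely happens to be valid UTF-8, while pure-ASCII tags decode identically under every candidate. Any charset declaration inside the files would still have to be updated separately.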
i then had to guess all sorts of encodings in HTML files where there is a
lot of pure ascii (the tags). the only way to get a correct guess is
guessing the language used from a reference text encoded in various
encodings; then retrieving the language of the text lets you retrieve the