Re: String Encoding Detection (Revisited)
- Subject: Re: String Encoding Detection (Revisited)
- From: Dustin Voss <email@hidden>
- Date: Thu, 7 Aug 2003 10:11:41 -0700
On Thursday, August 7, 2003, at 01:44 AM, Francisco Tolmasky wrote:
OK, so I recently posted a question about auto-detecting string
encodings, and also looked through the archives. Basically there's no
way unless it is Unicode and has a BOM. I still want an auto-detect
feature though, like BBEdit's. So basically, how do I check for a BOM?
(I checked TextEdit's code and couldn't find it, though I found lots of
other stuff.) Anyway, other than that and the weird spell-checking
trick someone suggested (using the spell checker to see whether the
string makes sense, which would be pretty useless for code or anything
other than plain sentences), are there any other "tricks"?
And when all else fails and I resort to just picking an encoding, which
one should I choose: Mac OS Roman, ASCII, or UTF-8?
I don't know about tricks, but the BOM will be one of the following:
UTF-16 BE: FE FF
UTF-16 LE: FF FE
UTF-8: EF BB BF
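Checking for one of these is just a comparison against the first few
bytes of the data. Here is a minimal sketch in plain C, assuming you
already have the file contents in a buffer. The names BomKind and
detect_bom are made up for this example and are not part of any Apple
API, and it only knows about the three BOMs listed above (so data that
starts with FF FE is always reported as UTF-16 LE).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; not from any system header. */
typedef enum {
    BOM_NONE,
    BOM_UTF16_BE,
    BOM_UTF16_LE,
    BOM_UTF8
} BomKind;

/* Reports which of the three BOMs (if any) starts the buffer.
   If bom_len is not NULL, it receives the BOM's length in bytes,
   or 0 when no BOM was found. */
BomKind detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    static const unsigned char utf8[]    = { 0xEF, 0xBB, 0xBF };
    static const unsigned char utf16be[] = { 0xFE, 0xFF };
    static const unsigned char utf16le[] = { 0xFF, 0xFE };
    BomKind kind = BOM_NONE;
    size_t n = 0;

    if (len >= sizeof utf8 && memcmp(buf, utf8, sizeof utf8) == 0) {
        kind = BOM_UTF8;
        n = sizeof utf8;
    } else if (len >= sizeof utf16be && memcmp(buf, utf16be, sizeof utf16be) == 0) {
        kind = BOM_UTF16_BE;
        n = sizeof utf16be;
    } else if (len >= sizeof utf16le && memcmp(buf, utf16le, sizeof utf16le) == 0) {
        kind = BOM_UTF16_LE;
        n = sizeof utf16le;
    }

    if (bom_len != NULL) {
        *bom_len = n;
    }
    return kind;
}
```

If it returns something other than BOM_NONE, skip bom_len bytes and
decode the rest with the matching encoding. If it returns BOM_NONE you
are back to guessing.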
You could use Carbon's Text Encoding Converter. It supports something
called a "sniffer" that analyzes text and tries to determine the
likeliest encoding. It looks pretty powerful.
Conceptual information is at
<http://developer.apple.com/documentation/Carbon/Conceptual/ProgWithTECM/index.html>,
but it does not discuss sniffers.
The API reference is at
<http://developer.apple.com/documentation/Carbon/Reference/Text_Encodin_sion_Manager/index.html>.