Re: Tao of string encodings (Re: Converting ASCII to UTF-8?)
- Subject: Re: Tao of string encodings (Re: Converting ASCII to UTF-8?)
- From: Ralph Pöllath <email@hidden>
- Date: Wed, 31 Mar 2004 16:24:35 +0200
On 31.03.2004, at 15:58, Andrew Thompson wrote:
On Mar 30, 2004, at 5:37 PM, Marco Scheurer wrote:
On Mar 30, 2004, at 11:49 PM, Jim Rankin wrote:
On Mar 29, 2004, at 7:57 PM, Shawn Erickson wrote:
You just need to read things in correctly using the correct encoding
(an encoding matching the source file).
Here's the one piece of the whole encoding puzzle that I've never been
able to figure out.
Your program is handed a file path or url that's ostensibly a text
file. From that point, how do you know the encoding of what you've
just been handed?
Probably missing something obvious...
No, you just can't without more information. That's why in TextEdit
you can manually specify the encoding to use to open a file. That
being said, there are heuristics to guess a text encoding.
In one program I've simply used trial and error, using NSString's
-initWithData:encoding: (which returns nil if it fails to create a
valid string with the supplied data) and a small set of likely
encodings with good results.
Or google for "sniffer" and "encoding".
Mozilla has a fairly good built in sniffer (View->Character
Coding->Auto-detect->Universal).
It took them a while to develop and of course it's not perfect. There's
the problem that it assumes HTTP, where you may get a char encoding
header to go on. Of course, that's a mixed blessing because many
servers are mis-configured and serve the wrong one!
http://developer.apple.com/documentation/Carbon/Reference/
Text_Encodin_sion_Manager/tec_refchap/function_group_11.html
looks like Carbon supports encoding sniffers.
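
For what it's worth, the trial-and-error approach quoted above can be
sketched in a few lines. This is only an illustration in Python, not
the NSString code from that program: bytes.decode raising
UnicodeDecodeError stands in for -initWithData:encoding: returning nil,
and the function name guess_decode and the candidate list are made up
for the example.

```python
# Hypothetical candidate list for this sketch. Order matters: Latin-1
# maps every possible byte to a character, so it can never fail and
# has to be tried last.
CANDIDATES = ("ascii", "utf-8", "latin-1")


def guess_decode(data, candidates=CANDIDATES):
    """Return (text, encoding_name) for the first candidate that decodes
    the bytes without error, or (None, None) if none of them do."""
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            # Same role as a nil result from -initWithData:encoding:,
            # so move on to the next candidate.
            continue
    return None, None


if __name__ == "__main__":
    text, encoding = guess_decode("Pöllath".encode("utf-8"))
    print(encoding, text)
```

A successful decode only shows that the bytes are valid in that
encoding, not that it is the encoding the author used, so the result
is still a guess.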
Cheers,
-Ralph.
_______________________________________________
cocoa-dev mailing list | email@hidden