Re: How to detect string encoding before reading a file in NSString?
- Subject: Re: How to detect string encoding before reading a file in NSString?
- From: John Pannell <email@hidden>
- Date: Tue, 26 Apr 2011 13:39:35 -0600
Hi Laurent-
I have an app that collects a lot of text off the web; my string creation algorithm is something like the following:
1. Attempt to create an NSString with NSUTF8StringEncoding.
2. If the string is nil, attempt to create the string using the encoding returned from the server.
3. If string is still nil, ask the Text Encoding Conversion Manager to sniff out the encoding from the data.
3a. This returns an array of likely encodings. For each item in the array:
3b. Attempt to create a string with the encoding.
There's a little too much code involved to paste into an email, but I'd be happy to share it. I have a wrapper object that handles the interaction with the Text Encoding Conversion Manager. More about the manager:
http://developer.apple.com/library/mac/#documentation/Carbon/Reference/Text_Encodin_sion_Manager/Reference/reference.html#//apple_ref/doc/uid/TP30000123
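The numbered steps above amount to a fallback chain: try UTF-8, then the server's charset, then each sniffed candidate in order, stopping at the first one that decodes. Below is a minimal Python sketch of that order, not John's actual code. `bytes.decode` raising an error stands in for `-[NSString initWithData:encoding:]` returning nil. The server charset is assumed to arrive as a name such as `iso-8859-1`. `decode_with_fallbacks` and the `sniff` callable are hypothetical illustrations; `sniff` stands in for the Text Encoding Conversion Manager wrapper.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical sniffer type: takes raw bytes and returns likely encoding
# names, most likely first (stand-in for the Text Encoding Conversion Manager).
Sniffer = Callable[[bytes], Iterable[str]]


def decode_with_fallbacks(
    data: bytes,
    server_charset: Optional[str] = None,
    sniff: Optional[Sniffer] = None,
) -> Optional[Tuple[str, str]]:
    """Return (text, encoding name) for the first encoding that decodes
    strictly, or None if every attempt fails."""
    # Step 1: strict UTF-8. Invalid byte sequences raise instead of being
    # replaced, playing the role of NSString returning nil.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    candidates: List[str] = []
    # Step 2: the charset the server reported (e.g. from a Content-Type header).
    if server_charset:
        candidates.append(server_charset)
    # Steps 3a/3b: each sniffed candidate, in the order the sniffer returned them.
    if sniff is not None:
        candidates.extend(sniff(data))

    for name in candidates:
        try:
            return data.decode(name), name
        except (LookupError, UnicodeDecodeError):
            # Unknown encoding name, or bytes that are invalid in that encoding.
            continue
    return None


latin1 = "café".encode("iso-8859-1")  # b'caf\xe9' is not valid UTF-8
print(decode_with_fallbacks(latin1, server_charset="iso-8859-1"))
# → ('café', 'iso-8859-1')
```

Returning the encoding name alongside the text makes it easy to log which step succeeded, which helps when tuning the order of the candidates.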
Hope this helps!
John
John Pannell
http://www.positivespinmedia.com
On Apr 26, 2011, at 12:53 PM, Nick Zitzmann wrote:
>
> On Apr 26, 2011, at 12:49 PM, Laurent Daudelin wrote:
>
>>> TextEdit's encoding guesser just uses the built-in NSAttributedString method -initWithURL:options:documentAttributes:error:, which will guess the file's encoding when opening it. But it has been mentioned that heuristics are not infallible, and this method's heuristics are no exception. It does a good job overall, but I've found that it usually misinterprets UTF-8 format text.
>>
>> Yes, I know that any guessing approach can fail. I was starting to get excited when I started reading your reply, but if it usually misinterprets UTF-8, that's a pretty significant problem...
>
> That was a long time ago, so it may have been fixed. But if it's still happening, then one workaround would be to try and open the file as UTF-8 first, and if that fails, then fall back on the above method. The UTF-8 parser often returns nil on text that is not in UTF-8 format IIRC.
>
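Nick's advice to try UTF-8 first rests on an asymmetry. Non-ASCII text in a single-byte encoding is typically not valid UTF-8, so a strict UTF-8 decode tends to fail loudly. A single-byte encoding such as ISO-8859-1, however, assigns a character to every byte value and so never rejects anything. A small Python illustration follows, assuming NSString's UTF-8 decoding is similarly strict (as Nick recalls). `decodes_as` is a hypothetical helper:

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """True if data decodes strictly (no replacement characters) in encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


latin1_text = "Montréal".encode("iso-8859-1")  # b'Montr\xe9al'
utf8_text = "Montréal".encode("utf-8")         # b'Montr\xc3\xa9al'

# False: 0xE9 starts a three-byte UTF-8 sequence, but 'a' is not a continuation byte.
print(decodes_as(latin1_text, "utf-8"))
# True, yet the result is the mojibake 'MontrÃ©al'.
print(decodes_as(utf8_text, "iso-8859-1"))
# True: every one of the 256 byte values is defined in ISO-8859-1.
print(decodes_as(bytes(range(256)), "iso-8859-1"))
```

A successful single-byte decode therefore says little about whether the guess was right, while a successful strict UTF-8 decode is much stronger evidence. That is why UTF-8 belongs at the front of the chain.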
_______________________________________________
Cocoa-dev mailing list (email@hidden)