Re: Is there any support in Cocoa for stupidly encoded UTF-8 string?
- Subject: Re: Is there any support in Cocoa for stupidly encoded UTF-8 string?
- From: Clark Cox <email@hidden>
- Date: Thu, 20 Jan 2005 15:15:04 -0500
On Thu, 20 Jan 2005 11:42:00 -0800, Andrew Farmer <email@hidden> wrote:
> On 20 Jan 2005, at 09:26, Stephane Sudre wrote:
> > In some e-mail subjects, people are using what is supposed to be UTF-8
> > encoded and is actually poor Unicode encoded.
> >
> > For instance, instead of 0xC3A9 for eacute, you end up with 0xE9
> > (where it should be 0x00E9).
> >
> > When you use NSString initWithBytes:length:encoding: with the UTF-8
> > encoding as the parameter, you obtain nil. I understand this.
> >
> > Now, the question is: is there a method in Cocoa to deal with stupidly
> > encoded UTF-8 string?
>
> What you're looking at is ISO8859-1 encoded text. Decode it as such and
> you'll be fine.
>
> I'm pretty sure that there *should* be some easy way to detect whether
> text in the subject is encoded with ISO8859-1 or UTF-8. Look up the
> standards (if they exist).
While you can make some educated guesses, there is no foolproof way to
determine whether text is UTF-8 or ISO-8859-1. NSString already makes
the best available guess for you: if the bytes are not valid UTF-8, it
cannot convert them and returns nil.
When you receive that nil, you have to guess at the format. If your
data is likely to be coming from Western Europeans or Americans, then
ISO-8859-1 is probably a good backup guess.
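
To make the "try UTF-8, then fall back" idea concrete, here is a minimal
sketch in Python rather than Cocoa. The function name decode_subject is
made up for illustration, and the choice of ISO-8859-1 as the fallback is
only the assumption discussed above, not something the data can confirm.

```python
def decode_subject(raw: bytes):
    """Return (text, encoding_used) for raw header bytes.

    Hypothetical helper: tries strict UTF-8 first, then falls back
    to ISO-8859-1 as an assumed (not detected) encoding.
    """
    try:
        # Strict UTF-8 decoding rejects a lone 0xE9 byte, which is the
        # same situation in which NSString gives back nil.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every possible byte to a character, so this
        # branch cannot fail. That is why it is a guess, not a check.
        return raw.decode("iso-8859-1"), "iso-8859-1"


# 0xC3 0xA9 is eacute in UTF-8; a bare 0xE9 is eacute in ISO-8859-1.
print(decode_subject(b"caf\xc3\xa9"))
print(decode_subject(b"caf\xe9"))
```

In Cocoa the same shape would presumably be an attempt with
NSUTF8StringEncoding followed, on nil, by a second attempt with
NSISOLatin1StringEncoding. Keep in mind that the order matters: some
ISO-8859-1 text also happens to be valid UTF-8 (the bytes 0xC3 0xA9 are
two legal Latin-1 characters), so the UTF-8 attempt can succeed on text
that was never meant as UTF-8.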
--
Clark S. Cox III
email@hidden
http://www.livejournal.com/users/clarkcox3/
http://homepage.mac.com/clarkcox3/
Cocoa-dev mailing list (email@hidden)