Re: Conversion of =E? string
- Subject: Re: Conversion of =E? string
- From: "Tim Buchheim" <email@hidden>
- Date: Thu, 28 Sep 2006 11:58:16 -0700
On 9/28/06, malcom <email@hidden> wrote:
Hello, I have a string like "bla bla =E8 bla" that I want to convert
to "bla bla è bla".
Is this UTF-8? How can I convert these strings automatically?
No, this is not UTF-8. It's quoted-printable encoding, which is
commonly used for email.
See http://en.wikipedia.org/wiki/Quoted-printable for a quick
overview, and see the MIME spec (specifically RFC 2045) for details.
You can tell it's not UTF-8 because in UTF-8 every byte with the high
bit clear represents a 7-bit character (the part of Unicode which
corresponds to the US-ASCII character set), and every byte of a
multi-byte sequence has the high-order bit set to 1. The characters
"=", "E" and "8" are all plain ASCII, so in UTF-8 they would simply
mean the literal text "=E8".
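To make the high-bit argument concrete, here is a small Python sketch (Python is my choice for illustration, not something from the original question; the helper name is_ascii_only is made up). It checks whether any byte in a string has the high bit set, and shows what "è" looks like when it really is UTF-8:

```python
def is_ascii_only(data: bytes) -> bool:
    """Return True if every byte has the high-order bit clear (value < 0x80)."""
    return all(b < 0x80 for b in data)

# The string from the question, as raw bytes. '=', 'E' and '8' are plain ASCII.
qp_text = b"bla bla =E8 bla"

# The same text with a real e-grave (U+00E8), encoded as UTF-8.
utf8_text = "bla bla \u00e8 bla".encode("utf-8")

print(is_ascii_only(qp_text))    # True: no byte has the high bit set
print(is_ascii_only(utf8_text))  # False: the e-grave is a multi-byte sequence

# In UTF-8 the e-grave becomes two bytes, both with the high bit set.
print([hex(b) for b in "\u00e8".encode("utf-8")])  # ['0xc3', '0xa8']
```

Note that a pure-ASCII string is also valid UTF-8, so the check only tells you that "=E8" carries no UTF-8 multi-byte sequence; it is the "=XX" pattern that points to quoted-printable.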
Decoding quoted-printable is fairly simple: the two characters after
the equals sign give the hexadecimal value of the byte to substitute
in place of the =XX sequence.
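As a rough sketch of that substitution rule, in Python (chosen only for illustration; decode_quoted_printable is a hypothetical helper name). It handles just the =XX case described above and deliberately ignores the soft line breaks that RFC 2045 also defines, so treat it as a demonstration rather than a complete decoder. The standard library's quopri module is shown alongside for comparison:

```python
import quopri

HEX_DIGITS = set(b"0123456789ABCDEFabcdef")

def decode_quoted_printable(data: bytes) -> bytes:
    """Replace each =XX sequence with the byte whose hex value is XX.

    Simplified sketch: soft line breaks ('=' at end of line) are not handled.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        pair = data[i + 1:i + 3]
        if data[i] == ord("=") and len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
            out.append(int(pair.decode("ascii"), 16))  # e.g. "E8" -> 0xE8
            i += 3
        else:
            out.append(data[i])  # ordinary byte, copied through unchanged
            i += 1
    return bytes(out)

encoded = b"bla bla =E8 bla"
print(decode_quoted_printable(encoded))  # b'bla bla \xe8 bla'
print(quopri.decodestring(encoded))      # b'bla bla \xe8 bla'
```

The result is a sequence of bytes, not yet text; the single byte 0xE8 has no meaning until you know which character set it belongs to.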
Note that the resulting string may still require conversion, as it
could be in any character set. (In this case it looks like it is
likely ISO Latin 1.)
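For example, assuming the decoded bytes really are ISO Latin 1 (that is my guess from the sample, not something the string itself tells you), the final conversion might look like this in Python:

```python
# Bytes left over after quoted-printable decoding of "bla bla =E8 bla".
raw = b"bla bla \xe8 bla"

# Assumption: the character set is ISO-8859-1 (ISO Latin 1).
text = raw.decode("iso-8859-1")
print(text == "bla bla \u00e8 bla")  # True: byte 0xE8 maps to U+00E8 (e-grave)

# The same bytes are not valid UTF-8: 0xE8 would have to start a
# three-byte sequence, but the next byte is a plain space.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False
```

If the bytes had come from a different character set, 0xE8 would decode to a different character, which is why guessing is a last resort.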
If you're dealing with email, you should examine the Content-Type
header to determine the character set to use, and the
Content-Transfer-Encoding header to determine how the body is encoded.
(SMTP is not always 8-bit clean, so encodings such as quoted-printable
or base64 are used to convert 8-bit data to 7-bit data.)
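Here is a minimal sketch of reading those two headers, using Python's standard email package (my choice for illustration). The message itself is invented for the example, and falling back to "ascii" when no charset is declared is my own assumption:

```python
from email import message_from_string

# Made-up message for illustration; real mail carries many more headers.
raw_message = (
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "bla bla =E8 bla\n"
)

msg = message_from_string(raw_message)

# Character set from the Content-Type header (returned in lower case).
charset = msg.get_content_charset()
# Transfer encoding from the Content-Transfer-Encoding header.
transfer_encoding = msg["Content-Transfer-Encoding"]

# decode=True undoes the transfer encoding and returns raw bytes.
body_bytes = msg.get_payload(decode=True)

# Assumption: fall back to ASCII if the message declares no charset.
text = body_bytes.decode(charset or "ascii")

print(charset)            # iso-8859-1
print(transfer_encoding)  # quoted-printable
print(text.strip())
```

The two steps stay separate: the transfer encoding gets you back to bytes, and the character set turns those bytes into text.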
--
Tim Buchheim