Re: Converting ASCII to UTF-8?
- Subject: Re: Converting ASCII to UTF-8?
- From: "b.bum" <email@hidden>
- Date: Mon, 29 Mar 2004 17:49:44 -0800
On Mar 29, 2004, at 1:57 PM, Huyler, Christopher M wrote:
If I read an ASCII string in from a file and store it in an NSString,
how could I convert it to UTF-8 so it will display properly?
As others said... if it is truly ASCII, there isn't an encoding issue.
But I'm guessing-- as did others-- that you are in a situation where
the string isn't actually ASCII, but is one of those weird 8-bit
variants of ASCII where certain accents, etc, are stored as single byte
characters.
I guess at this because I have run into this about a dozen times in the
last year in about as many different contexts and/or languages. In a
number of the cases, it was because I was tangling with an XML document
that claimed to be encoded as UTF-8, but was lying -- was using a
single byte to encode accented and other non-ASCII characters.
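One quick way to check whether data that claims to be UTF-8 is telling the truth is to attempt a strict decode. A minimal Python sketch; the helper name is_valid_utf8 and the sample bytes are made up for illustration:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# "cafe" with an accented e stored as a single Latin-1 byte: not valid UTF-8.
latin1_bytes = b"caf\xe9"
# The same text properly encoded as UTF-8 (the accented e becomes two bytes).
utf8_bytes = b"caf\xc3\xa9"

print(is_valid_utf8(latin1_bytes))
print(is_valid_utf8(utf8_bytes))
```

A failed check proves the data is not UTF-8. A passing check proves less, since pure ASCII is also valid UTF-8.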
If that is the case, then you effectively need to decode each single-byte
character and re-encode it as UTF-8, where it typically takes two bytes.
It is likely that NSNEXTSTEPStringEncoding, NSISOLatin1StringEncoding, or
NSMacOSRomanStringEncoding will "just work", as all three were created
before UTF-8 was common (IIRC).
Try the [[NSString alloc] initWithData: [NSData dataWithContentsOfFile:
...] encoding: ...] code path first, passing NSNEXTSTEPStringEncoding,
NSISOLatin1StringEncoding, or NSMacOSRomanStringEncoding in turn as the
encoding.
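The same decode-then-re-encode idea can be sketched outside Cocoa. The Python below assumes the standard codec names "latin-1" and "mac_roman" as rough counterparts of NSISOLatin1StringEncoding and NSMacOSRomanStringEncoding (no NEXTSTEP counterpart is shown); the function name to_utf8 and the sample bytes are hypothetical:

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw bytes from an 8-bit encoding, then re-encode as UTF-8."""
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


raw = b"caf\xe9"  # hypothetical file contents; 0xE9 is a non-ASCII byte

# The same byte means different characters in different 8-bit encodings,
# so each candidate yields different UTF-8 output; at most one is right.
for candidate in ("latin-1", "mac_roman"):
    print(candidate, to_utf8(raw, candidate))
```

Neither decode is likely to raise an error, because nearly every byte value means something in these 8-bit encodings. The only real test is whether the resulting text displays the characters you expected.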
If that works, great... if not, then you'll probably need to massage
the bytes by hand. Or maybe you want to do the massaging by hand.
I did. It isn't hard. This is the python snippet I used (efficiency
was completely NOT an issue -- hence the
one-character-at-a-time-ultra-simple-approach):
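    import sys

    # Python 3 version. Assumes the input is ISO Latin-1, where each byte
    # value equals its Unicode code point, so every byte >= 0x80 becomes
    # a two-byte UTF-8 sequence.
    while True:
        x = sys.stdin.buffer.read(1)
        if not x:
            break
        b = x[0]
        if b < 0x80:
            sys.stdout.buffer.write(x)
        else:
            sys.stdout.buffer.write(bytes([0xC0 | (b >> 6), 0x80 | (b & 0x3F)]))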
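For reference, the bit manipulation above amounts to treating each byte as an ISO Latin-1 code point and encoding it as UTF-8. A sketch in Python 3 that wraps the same logic in a function (the name latin1_to_utf8 is made up here) and compares it against the built-in codecs for every possible byte value:

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Expand each byte >= 0x80 into a two-byte UTF-8 sequence."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)  # ASCII passes through unchanged
        else:
            out.append(0xC0 | (b >> 6))    # lead byte: top two bits
            out.append(0x80 | (b & 0x3F))  # continuation byte: low six bits
    return bytes(out)


every_byte = bytes(range(256))
expected = every_byte.decode("latin-1").encode("utf-8")
print(latin1_to_utf8(every_byte) == expected)
```

This only gives the right characters when the source really is Latin-1. For bytes in another 8-bit encoding such as Mac OS Roman, the output is still valid UTF-8 but spells the wrong characters, so a table-driven decode is needed instead.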
Further reading -- this is a tinyurl to a google cache entry for a very
useful page where the original site seems to be down:
http://tinyurl.com/3x4dd
b.bum