Re: Converting ASCII to UTF-8?

  • Subject: Re: Converting ASCII to UTF-8?
  • From: "b.bum" <email@hidden>
  • Date: Mon, 29 Mar 2004 17:49:44 -0800

On Mar 29, 2004, at 1:57 PM, Huyler, Christopher M wrote:
> If I read an ASCII string in from a file and store it in an NSString,
> how could I convert it to UTF-8 so it will display properly?

As others said... if it is truly ASCII, there isn't an encoding issue.
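
To make that point concrete: every ASCII byte (0x00 to 0x7F) is already a valid one-byte UTF-8 sequence, so pure ASCII text needs no conversion at all. A minimal Python sketch, where the sample string and the helper name `is_pure_ascii` are made up for illustration:

```python
# Pure ASCII text encodes to exactly the same bytes in ASCII and in UTF-8.
text = "Hello, world"  # hypothetical sample; any 7-bit string behaves the same
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same = ascii_bytes == utf8_bytes


def is_pure_ascii(data: bytes) -> bool:
    """Return True if no byte has its high bit set (illustrative helper)."""
    return all(b < 0x80 for b in data)
```

If `is_pure_ascii` returns False for the file's contents, the data is not ASCII and the rest of this message applies.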

But I'm guessing, as did others, that you are in a situation where the string isn't actually ASCII, but is in one of those 8-bit extensions of ASCII where accented letters and similar characters are stored as single bytes.

I guess at this because I have run into it about a dozen times in the last year, in about as many different contexts and/or languages. In a number of the cases, it was because I was tangling with an XML document that claimed to be encoded as UTF-8 but was lying: it was using a single byte to encode non-ASCII characters such as accented letters.
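
One way to catch a document that is "lying" in this way is to attempt a strict UTF-8 decode and see whether it fails. This is only a sketch in Python; the function name `looks_like_utf8` and the sample byte strings are assumptions for illustration:

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# The same word twice: once as real UTF-8, once with a lone
# single-byte accented character (0xE9) in the middle of ASCII.
honest = b"caf\xc3\xa9"
lying = b"caf\xe9"
```

A failed decode proves the data is not valid UTF-8. A successful decode is weaker evidence, since some byte sequences from 8-bit encodings happen to be valid UTF-8 too.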

If that is the case, then you effectively need to re-encode each single-byte character above 0x7F as two bytes of UTF-8. It is likely that one of NSNEXTSTEPStringEncoding, NSISOLatin1StringEncoding, or NSMacOSRomanStringEncoding will "just work", as all three were created before UTF-8 was common (IIRC).

Try the [[NSString alloc] initWithData: [NSData dataWithContentsOfFile: ...] encoding: ...] code path first, passing NSNEXTSTEPStringEncoding, NSISOLatin1StringEncoding, or NSMacOSRomanStringEncoding as the encoding.
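
The same "decode with the right legacy encoding, then write UTF-8" idea can be sketched in Python, to match the snippet further down. The codec names "iso-8859-1" and "mac_roman" stand in for the Cocoa constants here; I am not aware of a standard Python codec for the NeXTSTEP encoding, so it is left out. The helper `to_utf8` and the sample bytes are hypothetical:

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known single-byte encoding, re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


raw = b"caf\xe9"  # hypothetical input with one high-bit byte

# The same byte means different things in different legacy encodings,
# so the choice of source encoding has to match how the file was written.
as_latin1 = raw.decode("iso-8859-1")   # 0xE9 is e-acute in ISO Latin-1
as_macroman = raw.decode("mac_roman")  # 0xE9 is E-grave in Mac OS Roman
```

If the result displays the wrong accented characters, the bytes decoded without error but under the wrong encoding, so try the next candidate.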

If that works, great... if not, then you'll probably need to massage the bytes by hand. Or maybe you want to do the massaging by hand.

I did. It isn't hard. This is the Python snippet I used (efficiency was completely NOT an issue, hence the ultra-simple one-character-at-a-time approach):

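import sys

# Python 3: work on raw bytes so nothing is decoded implicitly.
stdin, stdout = sys.stdin.buffer, sys.stdout.buffer
while True:
    x = stdin.read(1)
    if x == b"":
        break
    if x[0] < 0x80:
        # 7-bit ASCII passes through unchanged
        stdout.write(x)
    else:
        # high-bit byte, read as ISO Latin-1: emit 110000xx 10xxxxxx
        stdout.write(bytes([0xC0 | (x[0] >> 6)]))
        stdout.write(bytes([0x80 | (x[0] & 0x3F)]))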
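
The byte arithmetic in the snippet above is only correct when the input is ISO Latin-1, because that is the encoding whose byte values 0x80 to 0xFF equal the Unicode code points. Mac OS Roman or NeXTSTEP input would need a lookup table instead. As a sanity check, here is the same logic as a function (the name `latin1_to_utf8` is made up for illustration), compared against Python's built-in codecs over all 256 byte values:

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Hand-rolled ISO Latin-1 to UTF-8 conversion, one byte at a time."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            # 7-bit ASCII is already valid UTF-8
            out.append(b)
        else:
            # two-byte sequence: 110000xx 10xxxxxx
            out.append(0xC0 | (b >> 6))
            out.append(0x80 | (b & 0x3F))
    return bytes(out)


# Compare against the built-in codecs for every possible byte value.
all_bytes = bytes(range(256))
matches_codec = latin1_to_utf8(all_bytes) == all_bytes.decode("iso-8859-1").encode("utf-8")
```

Since the two agree, `data.decode("iso-8859-1").encode("utf-8")` is the shorter route when doing the massaging by hand is not the point.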

Further reading: this is a tinyurl to a Google cache entry for a very useful page whose original site seems to be down:

http://tinyurl.com/3x4dd

b.bum
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/cocoa-dev