Re: Converting unicode strings to its values
- Subject: Re: Converting unicode strings to its values
- From: Ken Thomases <email@hidden>
- Date: Wed, 11 Mar 2009 05:00:12 -0500
On Mar 11, 2009, at 3:23 AM, Christian Netthöfel wrote:
> I'm quite new to Cocoa development and have a question regarding the
> conversion of Unicode representations. I have a string containing
> the Unicode values for umlauts or other special characters as
> escape sequences in text form. What I want to do is convert them to
> their real Unicode values so that the characters are displayed.
> Example:
> "\u00d6" should become "Ö"
If a string literal in your source code actually contains the \uNNNN
escape sequence, then that will be translated by the compiler. The
proper Unicode will end up in your binary. (This wasn't true with
earlier versions of gcc, but is true of the gcc that comes with Xcode
3.x.) I suspect you know that.
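To make the distinction concrete, here is a small sketch in plain C (which also compiles inside an Objective-C file; the same escape works in an @"..." literal). The names translated, untranslated and is_utf8_capital_o_umlaut are made up for illustration, and the byte values assume the compiler's execution character set is UTF-8, which is the usual default for gcc and clang.

```c
#include <assert.h>
#include <string.h>

/* The compiler replaces this escape at compile time. Assuming a UTF-8
   execution character set, the array holds the bytes 0xC3 0x96 (U+00D6). */
const char translated[] = "\u00d6";

/* Here the backslash itself is escaped, so the array holds six literal
   characters: backslash, 'u', '0', '0', 'd', '6'. This is the case that
   has to be parsed at run time. */
const char untranslated[] = "\\u00d6";

/* Hypothetical helper: returns 1 if s is exactly the UTF-8 encoding of
   U+00D6, and 0 otherwise. */
int is_utf8_capital_o_umlaut(const char *s)
{
    assert(s != NULL);
    return strlen(s) == 2
        && (unsigned char)s[0] == 0xC3
        && (unsigned char)s[1] == 0x96;
}
```

Under those assumptions the helper accepts translated and rejects untranslated, which is still six plain ASCII characters.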
If you actually get a string which contains the backslash, lowercase
'u', and hex digits, then you're going to have to parse that somewhat
manually. NSScanner will probably be helpful. Once you get an
integer from the hex digits, you can use NSString's character-based
methods like +stringWithCharacters:length: or
-initWithCharacters:length:. Sadly, I don't see a method on
NSMutableString for inserting or appending individual character codes
into a string.
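As a rough sketch of the parsing step, the following plain C function (usable from Objective-C as-is) recognizes one escape and returns its 16-bit value. It does not use NSScanner. The name parse_u_escape and its return convention are assumptions made for this example, not an existing API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: if s starts with a backslash, a lowercase 'u' and
   exactly four hex digits, stores the 16-bit value in *unit and returns 6
   (the number of characters consumed). Otherwise returns 0 and leaves
   *unit untouched. */
size_t parse_u_escape(const char *s, unsigned *unit)
{
    unsigned value = 0;
    int i;

    assert(s != NULL && unit != NULL);

    if (s[0] != '\\' || s[1] != 'u')
        return 0;

    for (i = 2; i < 6; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isxdigit(c))
            return 0;   /* also stops safely at the terminating NUL */
        value = value * 16
              + (unsigned)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }

    *unit = value;
    return 6;
}
```

The value stored in *unit is what you would then place in a unichar buffer and hand to +stringWithCharacters:length:.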
Be mindful that not every UTF-16 code unit is valid Unicode in
isolation. Combining characters and surrogate pairs complicate
things: one half of a surrogate pair, for example, means nothing
without the other half. I don't know whether the character-based
methods validate the character sequence you provide to them. They
might. So you might need to convert an entire string at once, rather
than attempting to create an NSString from a single unichar derived
from one \uNNNN escape sequence. That single unichar may only be valid
in combination with the other characters around it.
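One way to do the whole-string conversion is to decode everything into a buffer of UTF-16 code units first, check that surrogates come in proper pairs, and only then create the NSString from the complete buffer. The sketch below is plain C under two assumptions: the input is pure ASCII apart from the escapes, and a malformed escape is copied through literally. The names hex_value and unescape_to_utf16 are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the numeric value of one hex digit, or -1 if c is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decodes `in` into UTF-16 code units in `out`.
   Returns the number of units written, or -1 if the input contains a
   non-ASCII byte, the buffer is too small, or a surrogate is unpaired. */
long unescape_to_utf16(const char *in, unsigned short *out, size_t cap)
{
    size_t n = 0, i;

    assert(in != NULL && out != NULL);

    while (*in != '\0') {
        unsigned unit = (unsigned char)*in;
        int consumed = 1;

        if (in[0] == '\\' && in[1] == 'u') {
            unsigned value = 0;
            int j, ok = 1;
            for (j = 2; j < 6 && ok; j++) {   /* stops at the first bad digit */
                int h = hex_value(in[j]);
                if (h < 0) ok = 0;
                else value = value * 16 + (unsigned)h;
            }
            if (ok) { unit = value; consumed = 6; }
        } else if (unit > 0x7F) {
            return -1;                        /* input assumed to be ASCII */
        }

        if (n >= cap) return -1;
        out[n++] = (unsigned short)unit;
        in += consumed;
    }

    /* Validate across the whole buffer: every high surrogate must be
       followed by a low surrogate, and no low surrogate may stand alone. */
    for (i = 0; i < n; i++) {
        if (out[i] >= 0xD800 && out[i] <= 0xDBFF) {
            if (i + 1 >= n || out[i + 1] < 0xDC00 || out[i + 1] > 0xDFFF)
                return -1;
            i++;                              /* skip the paired low half */
        } else if (out[i] >= 0xDC00 && out[i] <= 0xDFFF) {
            return -1;
        }
    }
    return (long)n;
}
```

If the function succeeds, the buffer and the returned length can go to +stringWithCharacters:length: in a single call, so no unichar is ever converted on its own.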
Do you know the encoding of the rest of the string you're dealing
with? Is it guaranteed to be ASCII? Might it be UTF-8 which also
includes some escaped Unicode characters for some reason?
Regards,
Ken
_______________________________________________
Cocoa-dev mailing list (email@hidden)