Re: UTF8 question.
- Subject: Re: UTF8 question.
- From: Andrew Thompson <email@hidden>
- Date: Wed, 24 Aug 2005 22:15:44 -0400
On Aug 24, 2005, at 5:52 AM, Chris Ridd wrote:
wide char values
Win: C0 30 (little endian)
OK, you're encoding U+30C0 (KATAKANA LETTER DA)
Mac: 30 BF 30 99 (big endian)
This becomes KATAKANA LETTER TA + COMBINING KATAKANA-HIRAGANA VOICED
SOUND MARK.
In other words, the conversion is splitting the original character
into two decomposed (is that the right term?) pieces.
And that's a valid decomposition. The katakana syllable ta (タ) can
indeed be combined with a mark to make da (ダ).
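To check the byte values quoted above, here is a minimal Python sketch. It assumes the "wide char values" are UTF-16 code units in the byte order stated for each platform. The helper name describe is made up for illustration.

```python
import unicodedata

# Byte values quoted in the thread (assumed to be UTF-16 code units).
win_bytes = bytes([0xC0, 0x30])              # Windows, little endian
mac_bytes = bytes([0x30, 0xBF, 0x30, 0x99])  # Mac, big endian

win_text = win_bytes.decode("utf-16-le")
mac_text = mac_bytes.decode("utf-16-be")

def describe(text):
    """Return (code point, Unicode name) pairs for each character (hypothetical helper)."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

print(describe(win_text))
print(describe(mac_text))
```

The Windows bytes decode to a single code point and the Mac bytes to two, which matches the names given above.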
If you're not familiar with Japanese, perhaps a French example will be
more understandable.
In Unicode, an e with an acute accent (é) can be represented as a
single character, e-acute, or as a plain e followed by a combining
acute.
So I'm not sure I see a problem - these are two different valid ways of
representing the same character.
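A small Python sketch of the French example, in case it helps. It only shows that the two spellings are different code point sequences (and so different UTF-8 bytes) even though they display the same. The variable names are mine.

```python
composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"  # plain e followed by COMBINING ACUTE ACCENT

# A plain comparison sees two different strings.
print(composed == decomposed)
print(len(composed), len(decomposed))

# The UTF-8 encodings differ as well.
composed_utf8 = composed.encode("utf-8")
decomposed_utf8 = decomposed.encode("utf-8")
print(composed_utf8.hex(), decomposed_utf8.hex())
```

So a byte-for-byte comparison of the two platforms' output will report a mismatch even when both are correct.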
See here: http://www.unicode.org/faq/char_combmark.html
There's an algorithm to get from one form to the other: Unicode
normalization. Normalization form NFC gives the precomposed form and
NFD gives the decomposed form.
So there's probably an API call on at least one of the platforms that
can do the conversion.
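The conversion between the two forms is Unicode normalization (NFC composes, NFD decomposes). As a sketch of the idea, here it is with Python's standard unicodedata module. This is only an illustration, not the platform API. On the Cocoa side, NSString's precomposedStringWithCanonicalMapping and decomposedStringWithCanonicalMapping are the calls I would look at first.

```python
import unicodedata

precomposed = "\u30c0"       # KATAKANA LETTER DA
decomposed = "\u30bf\u3099"  # KATAKANA LETTER TA + combining voiced sound mark

# NFC composes where possible; NFD decomposes fully.
to_nfc = unicodedata.normalize("NFC", decomposed)
to_nfd = unicodedata.normalize("NFD", precomposed)

print(to_nfc == precomposed)
print(to_nfd == decomposed)

def same_text(a, b):
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_text(precomposed, decomposed))
```

Normalizing both sides to the same form before comparing (or before encoding to UTF-8) makes the Windows and Mac values agree.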
As others have said - please be careful - UTF-8 is not the only
commonly encountered Unicode encoding, so please also double check that
the functions you're calling really do give UTF-8 and not UCS-2, UCS-4
or UTF-16.
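One quick way to double check is to look at the raw bytes for a known character. The sketch below (Python, standard codecs only) encodes U+30C0 several ways. The choice of codecs to compare is mine.

```python
text = "\u30c0"  # KATAKANA LETTER DA

# Encode the same character in several Unicode encodings.
codecs_to_try = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-be")
encoded = {codec: text.encode(codec) for codec in codecs_to_try}

for codec, data in encoded.items():
    print(f"{codec:10} {data.hex()}")
```

If the bytes you get back are C0 30 or 30 C0 for this character, you are looking at UTF-16 (or UCS-2), not UTF-8.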
AndyT (lordpixel - the cat who walks through walls)
A little bigger on the inside
(see you later space cowboy ...)