Re: wchar_t* turned into a char*?
- Subject: Re: wchar_t* turned into a char*?
- From: Mark Morrill <email@hidden>
- Date: Tue, 27 Jul 2004 17:59:21 -0600
Yep. I know wchar_t is implementation-defined. The Unix world generally
takes it as 32 bits, Windows takes it as 16 bits, and Carbon/Foundation
use 16 bits.
I've done a bit more poking about. I'm using Xcode on Mac OS X 10.3.4,
and I'm pretty sure I'm using gcc 3.3.
When the compiler comes across the following (my source is UTF-8
encoded):
wchar_t* str = L"三個子"; // <- 3 characters
It produces a null-terminated string of ten 32-bit ints. Each byte of
the UTF-8 encoded string, which is 9 bytes plus a null byte, gets
zero-extended to the full 32 bits:
三 0xe4 0xb8 0x89 -> 0x000000e4 0x000000b8 0x00000089
個 0xe5 0x80 0x8b -> 0x000000e5 0x00000080 0x0000008b
子 0xe5 0xad 0x90 -> 0x000000e5 0x000000ad 0x00000090
This method of character encoding is quite unfamiliar to me. Is this
what I should be expecting from gcc? This certainly doesn't seem like
Unicode to me...
Mark
On Jul 27, 2004, at 1:26 PM, Allan Odgaard wrote:
> On 27. Jul 2004, at 4:05, Mark Morrill wrote:
>
>> The surprise comes in when I try to dump wstr as unsigned shorts.
>> "NR[Dc"
>> is stored as UTF-8! [...]
>
> Two problems here: 1) wchar_t is implementation defined (so don't
> expect ucs-4) and 2) gcc does not understand any source file encoding,
> so if you write L"(something)", it will literally take every octet in
> the source file, extend it to wchar_t and put it in the string.
_______________________________________________
xcode-users mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/xcode-users