Hi all,
I was experimenting with Unicode when I found something surprising.
I'm not sure if this is how it's supposed to be or if it's a bug...
At the end of this email is a short bit of source code that tries to
create a UTF-16 (actually UCS-2, I suppose) text file.
The marker (0xFE, 0xFF) writes out to the file as it ought.
The array 'str' writes out to the file as it ought.
No surprises so far.
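For comparison, here is a minimal sketch of producing the marker and the 16-bit code units one byte at a time, so the result is big-endian regardless of the host's byte order. The function name to_utf16be_with_bom is made up for illustration; it is not part of the program at the end of this email.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: serialize 16-bit code units as big-endian bytes,
// preceded by the FE FF marker. Emitting the high byte first, explicitly,
// keeps the output independent of the host's byte order.
std::vector<unsigned char> to_utf16be_with_bom( const unsigned short* units,
                                                std::size_t count )
{
    std::vector<unsigned char> bytes;
    bytes.push_back( 0xFE );
    bytes.push_back( 0xFF );
    for( std::size_t i = 0; i < count; i++ )
    {
        bytes.push_back( (unsigned char) ( ( units[i] >> 8 ) & 0xFF ) ); // high byte
        bytes.push_back( (unsigned char) ( units[i] & 0xFF ) );          // low byte
    }
    return bytes;
}
```

The returned bytes can then be handed to a single fwrite call. Only code units that fit in 16 bits are handled; surrogate pairs are left to the caller.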
The surprise comes when I try to dump wstr as unsigned shorts.
"我愛你"
is stored as UTF-8! Moreover, wstr is not 4 wchar_t (3 plus 1 null)
but a null-terminated string of 9 chars. When I run the loop
for( i=0; wstr[i]; i++ ), it iterates 9 times! I declare wstr as
wchar_t* but, I assume, the compiler is smarter than I am and decides
that wstr is really char*.
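One way to take the source file's encoding out of the picture is to spell the three characters with universal character names. This is only a sketch: the function names greeting and greeting_length are invented here, and the code points are the ones the UTF-8 bytes in the dump below decode to (E6 88 91, E6 84 9B, E4 BD A0). It assumes the compiler handles \u escapes in wide literals as standard C++ describes.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper: the same three characters written as universal
// character names, so the literal does not depend on how the compiler
// reads the bytes of the source file.
const wchar_t* greeting()
{
    return L"\u6211\u611B\u4F60";
}

// Number of wchar_t elements before the terminating null.
std::size_t greeting_length()
{
    return std::wcslen( greeting() );
}
```

If the literal is converted as expected, greeting_length() reports 3 elements rather than 9, and the first element holds 0x6211.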
Is this how it is supposed to behave? I was expecting that wstr would
be UCS-4, since wchar_t is 4 bytes. The source code is UTF-8 because
the compiler chokes on UTF-16 (perhaps I was doing something wrong?).
I was expecting the compiler either to convert the text into UCS-4 for
wchar_t or to give an error saying that the text was UTF-8 and not
compatible with UCS-4.
The last thing I expected was to have my wchar_t* turned into a char*.
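To make the expected conversion concrete, here is a rough sketch of decoding UTF-8 bytes into UCS-4 code points by hand. The function name utf8_to_ucs4 is hypothetical, and the code assumes well-formed input: it does not reject stray continuation bytes, overlong forms or truncated sequences.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: decode well-formed UTF-8 into UCS-4 code points.
// No validation is done; malformed input produces meaningless values.
std::vector<unsigned long> utf8_to_ucs4( const unsigned char* s, std::size_t n )
{
    std::vector<unsigned long> out;
    std::size_t i = 0;
    while( i < n )
    {
        unsigned char lead = s[i++];
        unsigned long cp;
        int extra; // number of continuation bytes that follow the lead byte
        if( lead < 0x80 )      { cp = lead;        extra = 0; }
        else if( lead < 0xE0 ) { cp = lead & 0x1F; extra = 1; }
        else if( lead < 0xF0 ) { cp = lead & 0x0F; extra = 2; }
        else                   { cp = lead & 0x07; extra = 3; }
        // Each continuation byte contributes its low 6 bits.
        for( ; extra > 0 && i < n; --extra )
            cp = ( cp << 6 ) | ( s[i++] & 0x3F );
        out.push_back( cp );
    }
    return out;
}
```

Fed the nine bytes E6 88 91 E6 84 9B E4 BD A0 from the dump, this yields three code points, which is the element count I had expected wstr to have.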
Mark
--- source ---
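#include <cstdio>
#include <iostream>

int main( int argc, char * const argv[] )
{
    // Big-endian byte order mark for UTF-16.
    const unsigned char marker[] = { 0xFE, 0xFF };
    const wchar_t str[] =
    {
        'T', 'e', 's', 't', 0
    };
    // Wide literal typed into a UTF-8 encoded source file.
    const wchar_t* wstr = L"我愛你";
    std::cout << "Welcome to Test!\n";
    FILE* out = ::fopen( "Test.out", "wt+" );
    if( !out )
        return 1;
    ::fwrite( marker, sizeof( char ), 2, out );
    int i;
    unsigned short ch;
    // Dump each element of wstr as a 16-bit value in host byte order.
    for( i=0; wstr[i]; i++ )
    {
        ch = (unsigned short) wstr[i];
        printf( "%x = %x\n", (unsigned) ch, (unsigned) wstr[i] );
        ::fwrite( &ch, sizeof( char ), 2, out );
    }
    // Print both strings through the wide-string conversion.
    fprintf( out, "\n[%S] - [%S]\n", str, wstr );
    // Dump each element of str the same way.
    for( i=0; str[i]; i++ )
    {
        ch = (unsigned short) str[i];
        printf( "%x = %x\n", (unsigned) ch, (unsigned) str[i] );
        ::fwrite( &ch, sizeof( char ), 2, out );
    }
    ::fclose( out );
    return 0;
}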
--- output to Test.out ---
0000: FE FF 00 E6 00 88 00 91 00 E6 00 84 00 9B 00 E4 ................
0010: 00 BD 00 A0 0A 5B 54 65 73 74 5D 20 2D 20 5B E6 .....[Test] - [.
0020: 88 91 E6 84 9B E4 BD A0 5D 0A 00 54 00 65 00 73 ........]..T.e.s
0030: 00 74 .t
_______________________________________________
xcode-users mailing list | email@hidden