Re: Xcode-users Digest, Vol 7, Issue 153
- Subject: Re: Xcode-users Digest, Vol 7, Issue 153
- From: Lehel Bernadt <email@hidden>
- Date: Wed, 14 Apr 2010 11:20:30 +0200
Well, here is my take on this issue:
The type wchar_t is the compiler's internal wide-character representation,
meant to be convenient for string operations on a specific platform. Its size
is implementation-defined (it only has to be large enough for the character
sets of the supported locales), and it doesn't need to map to any existing
encoding. So a wchar_t array is not a "string" in the conventional sense.
Basically, the concept is that at runtime you need to do
(string with a specific encoding) <---> wchar_t representation
conversions back and forth whenever you want to use string ops on your real
strings. Also, it's not wise to store wchar_t string literals in your
program unless they're ASCII strings. Store the text in UTF-8 or whatever
encoding you like (as long as it doesn't contain embedded null bytes) in a C
string, then at runtime convert it to wchar_t, like all the other strings you
get as input, do your string operations, and then, when you need to display
it, convert it to a char* string encoded according to the locale.
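
The round trip described above could be sketched roughly as follows. This is
only an illustration: upper_via_wchar is a hypothetical helper name, and the
sketch assumes the program has already called setlocale(LC_ALL, "") so that
the conversions follow the user's locale (in the default "C" locale only
ASCII is guaranteed to convert).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: uppercase a locale-encoded C string by going
   through wchar_t. Returns 0 on success, -1 on failure. */
int upper_via_wchar(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);

    /* A multibyte string never has more characters than bytes, so
       strlen(in) + 1 wide characters is always enough room. */
    size_t len = strlen(in);
    wchar_t *wide = malloc((len + 1) * sizeof *wide);
    if (wide == NULL)
        return -1;

    /* Step 1: encoded string -> wchar_t representation. */
    size_t n = mbstowcs(wide, in, len + 1);
    if (n == (size_t)-1) {          /* invalid sequence for this locale */
        free(wide);
        return -1;
    }

    /* Step 2: the actual string operation works on wide characters. */
    for (size_t i = 0; i < n; i++)
        wide[i] = (wchar_t)towupper((wint_t)wide[i]);

    /* Step 3: wchar_t representation -> locale-encoded char string. */
    size_t written = wcstombs(out, wide, out_size);
    free(wide);
    if (written == (size_t)-1 || written >= out_size)
        return -1;                  /* unconvertible, or no room for '\0' */
    return 0;
}
```

A caller would pass a buffer, for example
upper_via_wchar(input, buf, sizeof buf), and print buf only when the return
value is 0.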
All of this means that there's a sharp divide between real-world strings
using whatever encoding and wchar_t strings.
The positive side of this is that there is no limitation for the
compiler. If implemented right, it should be no problem, for example, to
change the size to 64 bits on a 64-bit platform if string comparisons
would be faster that way, or to define it as 8 bits for embedded systems.
It's not a problem if UTF-16 is succeeded by UTF-32 or whatever comes next,
since we are encoding agnostic.
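
Since the size is up to the implementation, code that cares can ask the
platform instead of assuming. A minimal sketch, with hypothetical helper
names; note that it only reports the range of wchar_t and says nothing about
which encoding, if any, the values follow:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: width of wchar_t in bits on this platform. */
size_t wchar_bits(void)
{
    /* The C standard guarantees at least this much range. */
    assert(WCHAR_MAX >= 127);
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Hypothetical helper: nonzero if a single wchar_t has enough range
   for every value up to 0x10FFFF (the top of the Unicode code space). */
int wchar_range_covers_unicode(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}
```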
The negative side is that you need to convert *every single time*. For
example, in a UTF-8 encoded source you cannot rely on this:
if (wc == L'ő') ...
How the compiler maps 'ő' to a wchar_t value depends on how it interprets
the source encoding, so to be safe you first need to convert 'ő' to wchar_t
at runtime, and only then can you compare, which is pretty cumbersome.
Because C has no encapsulation and no syntactic sugar to hide these
conversions, as OO languages do, it's also not entirely obvious why you can
write c = 'a' with a char but not wc = 'ű' with a wchar_t.
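
The "convert first, then compare" step might look something like this.
wc_equals_mb is a hypothetical helper, and the sketch again assumes the
locale has been set up with setlocale so that mbrtowc decodes the same
encoding the string was written in:

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: compare wc with the first character of the
   locale-encoded string mb. Returns 1 if equal, 0 if different,
   -1 if mb does not start with a valid, complete character. */
int wc_equals_mb(wchar_t wc, const char *mb)
{
    assert(mb != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);    /* initial conversion state */

    /* Decode one multibyte character into its wchar_t form. */
    wchar_t decoded;
    size_t r = mbrtowc(&decoded, mb, strlen(mb) + 1, &state);
    if (r == (size_t)-1 || r == (size_t)-2)
        return -1;

    return wc == decoded;
}
```

With a UTF-8 locale active, the check for 'ő' would then be written as
wc_equals_mb(wc, "\xC5\x91"), spelling out the UTF-8 bytes of U+0151 so the
result does not depend on the source file's encoding.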
As you can see, TCHAR (the Windows character type) is "on the other side of
the fence" compared to wchar_t, because it represents real-life strings
using a specific encoding. There is no support for this type under POSIX,
only null-terminated C strings (which includes the possibility of using
UTF-8), with wchar_t used at runtime.
Regards,
Lehel