First, my apologies for not editing the subject line in my last post. I have reverted to the original title.
Knowing now how unportable and vaguely specified wchar_t is, I realise that we should have avoided it all those years ago when we did our Unicode Windows codebase. Things are always clearer in hindsight!
I don't like the idea of just treating TCHAR as char* and using UTF8 strings. It will work in many cases such as the simple printf shown below, but I have past experience of working with multibyte strings and it is an absolute nightmare. Certain of the C standard string functions just don't work reliably. I know we have code like this (I'm removing our TCHAR-esque typedefs for simplicity):
char *pos = strrchr(path, '/');
and that won't respect multibyte characters. Maybe C lib functions like that can be made to work with a suitable locale setting, but in a large team someone is always going to write code that iterates through a string character by character using a "char *p = <string>; while (*p++) { ... }" idiom and that's going to go wrong with UTF8 encoding.
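To make the byte-versus-character mismatch concrete, here is a minimal sketch. The helper names (count_bytes, count_code_points, demo_counts) are made up for illustration, and the code assumes the input is well-formed UTF8; it does no validation.

```cpp
#include <cassert>
#include <cstddef>

// Counts char units (bytes), which is what a `while (*p++)` style loop visits.
std::size_t count_bytes(const char *s) {
    std::size_t n = 0;
    for (const char *p = s; *p != '\0'; ++p) {
        ++n;
    }
    return n;
}

// Counts UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumption: `s` is well-formed UTF-8.
std::size_t count_code_points(const char *s) {
    std::size_t n = 0;
    for (const char *p = s; *p != '\0'; ++p) {
        if ((static_cast<unsigned char>(*p) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Hypothetical self-check: "heisse" with the sharp s spelled as explicit UTF-8 bytes.
void demo_counts() {
    const char *word = "hei\xC3\x9F" "e";
    assert(count_bytes(word) == 6);        // the sharp s occupies two bytes
    assert(count_code_points(word) == 5);  // but is a single character
}
```

One caveat on the strrchr case: as far as I know, UTF8 never reuses byte values below 0x80 inside a multibyte sequence, so a byte search for '/' should still land on a real slash. It is the code that assumes one char equals one character that goes wrong.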
Also -- in an effort to be cross-platform -- we use the STL and Boost string algorithms library quite a lot:
std::string str("Hello world");
if (boost::algorithm::istarts_with(str, "Hello")) { ... }
and I'm pretty sure the underlying Boost.Range mechanics require that you can iterate a string solely by knowing the size of its character type, so would probably go wrong with UTF8 in strings. (I'm not 100% certain of that, and I know this is not the place for a Boost discussion.)
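The kind of failure I have in mind can be sketched without Boost. The helper below (istarts_with_bytes, a hypothetical name) is not Boost's implementation; it is only an assumed stand-in for any algorithm that walks a std::string one char at a time and folds case per unit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ASCII-only lower-casing of a single char unit; bytes >= 0x80 pass through unchanged.
char ascii_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Hypothetical byte-wise, case-insensitive prefix test over std::string.
bool istarts_with_bytes(const std::string &text, const std::string &prefix) {
    if (prefix.size() > text.size()) {
        return false;
    }
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (ascii_lower(text[i]) != ascii_lower(prefix[i])) {
            return false;
        }
    }
    return true;
}

// Hypothetical self-check using explicit UTF-8 bytes for E-acute (upper and lower case).
void demo_prefix() {
    assert(istarts_with_bytes("Hello world", "hello"));  // ASCII folds fine
    // Upper-case E-acute is C3 89, lower-case is C3 A9: no per-byte fold relates them.
    assert(!istarts_with_bytes("\xC3\x89" "cole", "\xC3\xA9" "cole"));
}
```

So an exact or ASCII-only comparison survives, but anything that depends on treating one char as one letter (case folding, truncating at an index) does not carry over to non-ASCII text.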
Anyway, I understand my choices now. I have filed a bugreporter issue but even if it gets addressed it's not going to be a solution for my current project of course.
Thanks.
Ben Staveley-Taylor
Message: 4
Date: Tue, 13 Apr 2010 15:19:49 -0700
From: Clark Cox <email@hidden>
Subject: Re: Xcode-users Digest, Vol 7, Issue 153
As specified by the C and C++ standards, wchar_t is largely useless. I
would recommend avoiding it at all costs.
If, as I have, you replace what I consider to be a broken swprintf
implementation with something that treats wchar_t transparently, wchar_t
works just fine, and gives me (and the OP) what we seek - the ability to
re-use our existing TCHAR-based Windows code on the Mac.
If the goal is to re-use the TCHAR-based code on the Mac, just define TCHAR as a no op, and use plain char. The compilers on the Mac support UTF-8 string literals. That is, the following code is fine on the Mac and works as expected:
[ccox@ccox-macpro:~]% cat test.m
#import <Foundation/Foundation.h>

int main() {
    const char *cstring = "My name is Clark. 私の名前はクラークです。Ich heiße Clark.";
    NSString *nsstring = @"My name is Clark. 私の名前はクラークです。Ich heiße Clark.";

    printf("%s\n%s\n", cstring, [nsstring UTF8String]);
    return 0;
}
[ccox@ccox-macpro:~]% cc test.m -framework Foundation -fobjc-gc && ./a.out
My name is Clark. 私の名前はクラークです。Ich heiße Clark.
My name is Clark. 私の名前はクラークです。Ich heiße Clark.
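For the "define TCHAR as a no op" suggestion above, a rough sketch of a shim header might look like the following. The macro names are the usual ones from the Windows tchar.h header; the non-Windows mappings, last_component and demo_shim are assumptions for illustration and cover only a handful of the generic-text names a real codebase would need.

```cpp
#include <cassert>
#include <stdio.h>
#include <string.h>

#if defined(_WIN32)
#include <tchar.h>        // the real definitions on Windows
#else
// Hypothetical shim: map the TCHAR family onto plain char.
typedef char TCHAR;
#define _T(x) x           // literals stay narrow strings (UTF-8 in a UTF-8 source file)
#define _tcslen strlen
#define _tcsrchr strrchr
#define _tprintf printf
#endif

// Example call site written against the TCHAR names; compiles unchanged with the shim.
const TCHAR *last_component(const TCHAR *path) {
    const TCHAR *slash = _tcsrchr(path, _T('/'));
    return slash ? slash + 1 : path;
}

// Hypothetical usage.
void demo_shim() {
    const TCHAR *name = last_component(_T("/tmp/report.txt"));
    assert(_tcslen(name) == 10);   // "report.txt"
    _tprintf(_T("%s\n"), name);
}
```

Note that with this mapping _tcslen reports a length in bytes, not characters, so any existing code that treats the result as a character count would need a second look.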
-- Clark S. Cox III email@hidden