Re: swprintf fails with extended character codes

Subject: Re: swprintf fails with extended character codes
From: Ben Staveley-Taylor <email@hidden>
Date: Wed, 14 Apr 2010 10:02:41 +0100

First, my apologies for not editing the subject line in my last post. I have reverted to the original title.

Knowing now how unportable and vaguely specified wchar_t is, I realise that we should have avoided it all those years ago when we did our Unicode Windows codebase. Things are always clearer in hindsight!

I don't like the idea of just treating TCHAR as char and using UTF8 strings. It will work in many cases, such as the simple printf shown below, but I have past experience of working with multibyte strings and it is an absolute nightmare. Some of the C standard string functions just don't work reliably. I know we have code like this (I'm removing our TCHAR-esque typedefs for simplicity):

char *pos = strrchr(path, '/');

and that won't respect multibyte characters. Maybe C lib functions like that can be made to work with a suitable locale setting, but in a large team someone is always going to write code that iterates through a string character by character using a "char *p = <string>; while (*p++) { ... }" idiom and that's going to go wrong with UTF8 encoding.
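
To make the byte-versus-character concern concrete, here is a minimal sketch that is not taken from the code base discussed above. The helper name utf8_length is hypothetical, and it assumes the input is already valid UTF-8. It counts code points by skipping continuation bytes, which is exactly the step a plain "while (*p++)" loop leaves out.

```cpp
#include <cstddef>

// Hypothetical helper: counts UTF-8 code points in a NUL-terminated string.
// Assumes valid UTF-8; no validation is attempted.
std::size_t utf8_length(const char *s) {
    std::size_t count = 0;
    const unsigned char *p = reinterpret_cast<const unsigned char *>(s);
    for (; *p != 0; ++p) {
        // Continuation bytes have the bit pattern 10xxxxxx; every other
        // byte starts a new code point.
        if ((*p & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For "heiße" encoded as UTF-8, strlen reports 6 bytes while utf8_length reports 5 code points. In UTF-8 specifically, ASCII bytes such as '/' never occur inside a multibyte sequence, so the risk lies mainly in code that treats each byte as one character rather than in searches for ASCII delimiters; older multibyte encodings do not all give that guarantee.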

Also -- in an effort to be cross-platform -- we use the STL and Boost string algorithms library quite a lot:

std::string str("Hello world");

if (boost::algorithm::istarts_with(str, "Hello")) { ... }

and I'm pretty sure the underlying Boost.Range mechanics require that you can iterate a string solely by knowing the size of its character type, so they would probably go wrong with UTF8 in strings. (I'm not 100% certain of that, and I know this is not the place for a Boost discussion.)
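
As a rough illustration of that concern, the sketch below uses only the standard library. istarts_with_bytes is a hypothetical stand-in written for this example; it is not Boost's implementation. It compares one char at a time with tolower, so it operates on bytes rather than on characters.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical byte-wise, case-insensitive prefix test (not Boost's code).
bool istarts_with_bytes(const std::string &text, const std::string &prefix) {
    if (prefix.size() > text.size()) {
        return false;
    }
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        // Cast to unsigned char first: passing a negative char to
        // std::tolower is undefined behaviour.
        const unsigned char a = static_cast<unsigned char>(text[i]);
        const unsigned char b = static_cast<unsigned char>(prefix[i]);
        if (std::tolower(a) != std::tolower(b)) {
            return false;
        }
    }
    return true;
}
```

With the default "C" locale, an ASCII prefix still matches a UTF-8 string, but "Äpfel" does not match the prefix "ä": each of those letters occupies two bytes, and tolower leaves the non-ASCII bytes unchanged.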

Anyway, I understand my choices now. I have filed a Bug Reporter issue, but even if it gets addressed it is not going to be a solution for my current project, of course.

Thanks.

Ben Staveley-Taylor

On 14 Apr 2010, at 06:19, email@hidden wrote:

Message: 4
Date: Tue, 13 Apr 2010 15:19:49 -0700
From: Clark Cox <email@hidden>
Subject: Re: Xcode-users Digest, Vol 7, Issue 153

As specified by the C and C++ standards, wchar_t is largely useless. I
would recommend avoiding it at all costs.

If, as I have, you replace what I consider to be a broken swprintf
implementation with something that treats wchar_t transparently, wchar_t
works just fine, and gives me (and the OP) what we seek - the ability to
re-use our existing TCHAR-based Windows code on the Mac.

If the goal is to re-use the TCHAR-based code on the Mac, just define
TCHAR as a no op, and use plain char. The compilers on the Mac support
UTF-8 string literals. That is, the following code is fine on the Mac
and works as expected:

[ccox@ccox-macpro:~]% cat test.m
#import <Foundation/Foundation.h>

int main() {
   const char *cstring = "My name is Clark. 私の名前はクラークです。Ich heiße Clark.";
   NSString *nsstring = @"My name is Clark. 私の名前はクラークです。Ich heiße Clark.";

   printf("%s\n%s\n", cstring, [nsstring UTF8String]);
   return 0;
}
[ccox@ccox-macpro:~]% cc test.m -framework Foundation -fobjc-gc && ./a.out
My name is Clark. 私の名前はクラークです。Ich heiße Clark.
My name is Clark. 私の名前はクラークです。Ich heiße Clark.
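
One possible shape for the "TCHAR as plain char" suggestion is sketched below. The typedef, the TEXT macro, the tstring alias, and format_greeting are all assumptions made for illustration; the original Windows headers and code are not reproduced here.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Assumed portability shim: on the Mac, TCHAR collapses to plain char and
// the literal-wrapping macro becomes a no-op.
typedef char TCHAR;
#define TEXT(x) x
typedef std::basic_string<TCHAR> tstring;

// Hypothetical example function written against the TCHAR-style names.
tstring format_greeting(const TCHAR *name) {
    TCHAR buffer[128];
    const int written =
        std::snprintf(buffer, sizeof buffer, TEXT("My name is %s."), name);
    if (written < 0) {
        return tstring();  // formatting error
    }
    // snprintf reports the untruncated length, so clamp to what fits.
    std::size_t length = static_cast<std::size_t>(written);
    if (length >= sizeof buffer) {
        length = sizeof buffer - 1;
    }
    return tstring(buffer, length);
}
```

Under this mapping, string lengths and buffer sizes are counted in bytes, so a name consisting of one three-byte UTF-8 character contributes 3 to size(), not 1.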

--
Clark S. Cox III
email@hidden
