Re: swprintf fails with extended character codes

Subject: Re: swprintf fails with extended character codes
From: Lehel Bernadt <email@hidden>
Date: Wed, 14 Apr 2010 13:24:15 +0200

Hi,

On 04/14/2010 11:02 AM, Ben Staveley-Taylor wrote:

First, my apologies for not editing the subject line in my last post. I
have reverted back to the original title.

Knowing now how unportable and vaguely specified wchar_t is, I realise
that we should have avoided it all those years ago when we did our
Unicode Windows codebase. Things are always clearer in hindsight!

Using wchar_t is absolutely portable; you just have to understand the concept behind it ;)

I don't like the idea of just treating TCHAR as char* and using UTF8
strings. It will work in many cases such as the simple printf shown
below, but I have past experience of working with multibyte strings and
it is an absolute nightmare. Certain of the C standard string functions
just don't work reliably.

A standard C string assumes that one character is one byte. If you use it to hold UTF-8 encoded strings, then the traditional string functions indeed won't work reliably.
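To make the byte-versus-character mismatch concrete, here is a minimal C++ sketch. The helper name utf8_length is made up for illustration, and it assumes the input is well-formed UTF-8; it does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: count the code points in a NUL-terminated string that
// is assumed to hold well-formed UTF-8. Continuation bytes have the bit
// pattern 10xxxxxx, so every byte that does not match it starts a code point.
std::size_t utf8_length(const char *s) {
    assert(s != nullptr);
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 bytes of "héllo" (written as "h\xC3\xA9llo"), std::strlen reports 6 because it counts bytes, while utf8_length reports 5.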

I know we have code like this (I'm removing
our TCHAR-esque typedefs for simplicity):

char *pos = strrchr(path, '/');

and that won't respect multibyte characters.

Because strrchr is not a multibyte-aware function. First you have to convert the string from your encoding to wchar_t, e.g. with iconv, and then use wcsrchr(converted_path, L'/').
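A rough sketch of that convert-then-search approach follows. It swaps iconv for the standard mbsrtowcs, which converts from whatever encoding the current C locale uses; the helper names to_wide and last_component are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: convert a multibyte string in the current C locale's
// encoding to a wide string. Returns an empty string if conversion fails.
std::wstring to_wide(const char *s) {
    assert(s != nullptr);
    std::mbstate_t state = std::mbstate_t();
    const char *src = s;
    // With a null destination, mbsrtowcs only measures the result.
    std::size_t n = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }
    std::vector<wchar_t> buf(n + 1);
    src = s;
    state = std::mbstate_t();
    std::mbsrtowcs(buf.data(), &src, buf.size(), &state);
    return std::wstring(buf.data(), n);
}

// Hypothetical helper: return the part of a path after the last L'/'.
std::wstring last_component(const std::wstring &path) {
    const wchar_t *pos = std::wcsrchr(path.c_str(), L'/');
    return pos ? std::wstring(pos + 1) : path;
}
```

For UTF-8 input the program would first need to select a UTF-8 locale, for example with std::setlocale(LC_ALL, ""), since the default "C" locale is only guaranteed to handle ASCII here. Note that the character passed to wcsrchr is a wide literal.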

Maybe C lib functions like
that can be made to work with a suitable locale setting, but in a large
team someone is always going to write code that iterates through a
string character by character using a "char *p = <string>; while (*p++)
{ ... }" idiom and that's going to go wrong with UTF8 encoding.

Yes, switching to multibyte string handling takes considerable effort and changes the way you program... you can't use char* anymore.
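As an illustration of what stepping through a string "character by character" looks like once the bytes are UTF-8, here is a hedged sketch. The function decode_utf8 is a made-up name, and it assumes well-formed input: it does not reject overlong or otherwise invalid sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode a string assumed to hold well-formed UTF-8
// into a sequence of code points. The lead byte tells us how many
// continuation bytes follow; each continuation byte contributes 6 bits.
std::u32string decode_utf8(const std::string &bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        char32_t cp = 0;
        if (lead < 0x80)      { extra = 0; cp = lead; }
        else if (lead < 0xE0) { extra = 1; cp = lead & 0x1F; }
        else if (lead < 0xF0) { extra = 2; cp = lead & 0x0F; }
        else                  { extra = 3; cp = lead & 0x07; }
        assert(i + extra < bytes.size());  // sequence must not be truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            cp = (cp << 6) | (cont & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

A loop over the returned std::u32string visits one code point per step, which is what the "while (*p++)" idiom silently assumes about a char*.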


Also -- in an effort to be cross-platform -- we use the STL and Boost
string algorithms library quite a lot:

std::string str("Hello world");
if (boost::algorithm::istarts_with(str, "Hello")) { ... }

and I'm pretty sure the underlying Boost::Range mechanics require that
you can iterate a string solely by knowing the size of its character
type, so they would probably go wrong with UTF8 in strings. (I'm not 100%
certain of that, and I know this is not the place for a Boost discussion.)
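To show what can and cannot go wrong with element-wise string algorithms on UTF-8 data, here is a small stand-in. It is not Boost's implementation; istarts_with_bytes and ascii_lower are hypothetical names, and the case folding is deliberately ASCII-only so the behaviour does not depend on the locale.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Lowercase ASCII letters only; every other byte is returned unchanged.
static char ascii_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Hypothetical stand-in for a case-insensitive "starts with" that compares
// one char (one byte) at a time, the way a std::string is iterated.
bool istarts_with_bytes(const std::string &text, const std::string &prefix) {
    if (prefix.size() > text.size()) {
        return false;
    }
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (ascii_lower(text[i]) != ascii_lower(prefix[i])) {
            return false;
        }
    }
    return true;
}

// Usage sketch: expectations for the helper above.
void demo() {
    // Plain ASCII folds as expected.
    assert(istarts_with_bytes("Hello world", "hello"));
    // Identical UTF-8 bytes for "é" still match byte for byte.
    assert(istarts_with_bytes("h\xC3\xA9llo", "H\xC3\xA9"));
    // "É" (C3 89) and "é" (C3 A9) differ as bytes, so no case-insensitive match.
    assert(!istarts_with_bytes("\xC3\x89t\xC3\xA9", "\xC3\xA9t"));
}
```

Under these assumptions the byte-wise comparison does not corrupt anything, but case-insensitivity only holds for the ASCII range.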

Anyway, I understand my choices now. I have filed a bugreporter issue
but even if it gets addressed it's not going to be a solution for my
current project of course.

Thanks.

Ben Staveley-Taylor



