Re: swprintf fails with extended character codes

  • Subject: Re: swprintf fails with extended character codes
  • From: Lehel Bernadt <email@hidden>
  • Date: Wed, 14 Apr 2010 13:24:15 +0200


Hi,

On 04/14/2010 11:02 AM, Ben Staveley-Taylor wrote:
First, my apologies for not editing the subject line in my last post. I
have reverted back to the original title.

Knowing now how unportable and vaguely specified wchar_t is, I realise
that we should have avoided it all those years ago when we did our
Unicode Windows codebase. Things are always clearer in hindsight!

Using wchar_t is absolutely portable, you just have to understand the concept behind it ;)


I don't like the idea of just treating TCHAR as char* and using UTF8
strings. It will work in many cases such as the simple printf shown
below, but I have past experience of working with multibyte strings and
it is an absolute nightmare. Certain of the C standard string functions
just don't work reliably.

A standard C string assumes that one character is one byte. If you use it to hold UTF-8 encoded strings, then the traditional string functions indeed won't work reliably.
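
To make the byte-versus-character mismatch concrete, here is a minimal sketch in C. The helper names utf8_code_points and utf8_extra_bytes are hypothetical (they are not standard library functions), and the sketch assumes the input is valid UTF-8. It counts code points by skipping continuation bytes, whereas strlen counts bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a valid UTF-8 string.
 * Continuation bytes have the bit pattern 10xxxxxx, so every byte
 * that does not match that pattern starts a new code point. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Hypothetical helper: how many more bytes strlen reports than there
 * are code points. Zero for pure ASCII text. */
size_t utf8_extra_bytes(const char *s)
{
    return strlen(s) - utf8_code_points(s);
}
```

For the UTF-8 encoding of "café" (the bytes "caf\xC3\xA9"), strlen reports 5 while utf8_code_points reports 4, so any code that treats a byte index as a character index is off by one from that point on.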


I know we have code like this (I'm removing
our TCHAR-esque typedefs for simplicity):

char *pos = strrchr(path, '/');

and that won't respect multibyte characters.

That's because strrchr is not a multibyte function. First you have to convert the path from your encoding to wchar_t, e.g. with iconv, and then use
wcsrchr(converted_path, L'/')
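
A rough sketch of that conversion step in C, under some assumptions: the source encoding is UTF-8, the iconv implementation accepts the "WCHAR_T" encoding name (glibc and GNU libiconv do, others may not), and the helper names utf8_to_wide and last_slash_index are made up for illustration. On some platforms you need to link with -liconv.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a UTF-8 string into dst, which has room
 * for dst_len wchar_t elements including the terminator.
 * Returns 0 on success, -1 on any failure (bad input, buffer too small). */
int utf8_to_wide(const char *utf8, wchar_t *dst, size_t dst_len)
{
    if (dst_len == 0)
        return -1;

    /* Assumption: "WCHAR_T" is a known encoding name for this iconv. */
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)utf8;   /* iconv reads but does not modify the input */
    size_t in_left = strlen(utf8);
    char *out = (char *)dst;
    size_t out_left = (dst_len - 1) * sizeof(wchar_t);  /* keep one slot for L'\0' */

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    /* Terminate after the last wide character that was written. */
    size_t written = (size_t)(out - (char *)dst) / sizeof(wchar_t);
    dst[written] = L'\0';
    return 0;
}

/* Hypothetical helper: character index of the last L'/' in a wide
 * string, or -1 if there is none. */
long last_slash_index(const wchar_t *wpath)
{
    const wchar_t *pos = wcsrchr(wpath, L'/');
    return pos ? (long)(pos - wpath) : -1;
}
```

After converting the UTF-8 path "/tmp/caf\xC3\xA9/file" this way, last_slash_index returns a position counted in characters rather than in bytes, which is the point of going through wchar_t.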



Maybe C lib functions like
that can be made to work with a suitable locale setting, but in a large
team someone is always going to write code that iterates through a
string character by character using a "char *p = <string>; while (*p++)
{ ... }" idiom and that's going to go wrong with UTF8 encoding.

Yes, switching to multibyte string handling is a considerable effort and change in the way of programming... you can't use char* anymore.



Also -- in an effort to be cross-platform -- we use the STL and Boost string algorithms library quite a lot:

std::string str("Hello world");
if (boost::algorithm::istarts_with(str, "Hello")) { ... }

and I'm pretty sure the underlying Boost::Range mechanics require that
you can iterate a string solely by knowing the size of its character
type so would probably go wrong with UTF8 in strings. (I'm not 100%
certain of that, and I know this is not the place for a Boost discussion.)

Anyway, I understand my choices now. I have filed a bugreporter issue
but even if it gets addressed it's not going to be a solution for my
current project of course.

Thanks.

Ben Staveley-Taylor
