Re: How to get an NSString from a non-terminated array of unicode chars (length is known)
- Subject: Re: How to get an NSString from a non-terminated array of unicode chars (length is known)
- From: "Clark Cox" <email@hidden>
- Date: Tue, 4 Mar 2008 11:15:58 -0800
On Tue, Mar 4, 2008 at 8:58 AM, Brady Duga <email@hidden> wrote:
>
> On Mar 4, 2008, at 8:25 AM, Dave Camp wrote:
> >
> > You actually have two problems here:
> >
> > 1) wchar_t on the Mac is a 4 byte per character container (32 bits).
>
> Not quite correct. wchar_t may, at this time, default to 4 bytes in
> an Xcode project, but it is *not* defined to be 4 bytes on the Mac.
Actually, all of the standard C and C++ libraries on the Mac define
wchar_t as 4 bytes. If you change it to be 2 bytes, then you will be
unable to call any standard-library functions taking wchar_t (or a
pointer thereto) as a parameter. For all intents and purposes, wchar_t
is 4 bytes on the Mac (and will be on any platform that intends to put
Unicode into wchar_t *and* support the C99 standard).
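As a rough sketch of the point above (the helper names wchar_bits and wide_length are mine, purely for illustration), code can ask the compiler how wide wchar_t is rather than hard-coding it, and then hand wide strings to the standard library. The wcslen call is only meaningful when the wchar_t the code was compiled with matches the one the library was built with, which is why shrinking it (for instance with GCC's -fshort-wchar option) is not safe:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Width of wchar_t in bits, as seen by this translation unit. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Length of a wide string in wchar_t units, via the C library.
   This is correct only if the library agrees on sizeof(wchar_t). */
size_t wide_length(const wchar_t *s)
{
    assert(s != NULL);
    return wcslen(s);
}
```

With the default settings on the Mac, wchar_bits() should report 32, and wide_length(L"abc") returns 3.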
> In fact, it is quite easy to make wchar_t be 2 bytes.
If Apple were to change wchar_t to be 2 bytes instead of 4, it would
break every single piece of software on the Mac that uses wchar_t.
> Assumptions about the actual size of a wchar_t are probably a bug.
There are some situations in which the C standard allows you to assume
that wchar_t contains UTF-32/UCS-4:
From C99 (with TC2 applied):
"__STDC_ISO_10646__
An integer constant of the form yyyymmL (for example, 199712L). If
this symbol is defined, then every character in the "Unicode required
set", when stored in an object of type wchar_t, has the same value as
the short identifier of that character. The "Unicode required set"
consists of all the characters that are defined by ISO/IEC 10646,
along with all amendments and technical corrigenda, as of the
specified year and month."
-----
Therefore (since 2001/11 was the month in which the Unicode character
set grew beyond 16-bit):
#if defined(__STDC_ISO_10646__) && __STDC_ISO_10646__ >= 200111L
    // wchar_t must be UTF-32/UCS-4.
#endif
Of course, gcc on the Mac doesn't yet define __STDC_ISO_10646__, so my
point is mostly academic, but there may come a time when it *is* safe
to make assumptions about the size of wchar_t.
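Here is a minimal sketch of how that check might be wrapped up for use at run time. The function names are hypothetical, not part of any library; the only things taken from the standard are the __STDC_ISO_10646__ macro and WCHAR_MAX. The second function is a weaker, size-only test that works even where the macro is not defined:

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/* 1 if the implementation promises that wchar_t values are ISO 10646
   short identifiers as of 2001/11 or later, otherwise 0. */
int wchar_is_iso10646(void)
{
#if defined(__STDC_ISO_10646__) && __STDC_ISO_10646__ >= 200111L
    return 1;
#else
    return 0;
#endif
}

/* 1 if wchar_t is wide enough to hold U+10FFFF, the highest code point.
   This says nothing about which encoding is actually stored in it. */
int wchar_holds_all_code_points(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

/* If the macro makes the promise, the type must also be wide enough. */
void check_wchar_assumptions(void)
{
    assert(!wchar_is_iso10646() || wchar_holds_all_code_points());
}
```

On a compiler that does not define the macro, wchar_is_iso10646() simply returns 0 and the caller has to fall back on its own knowledge of the platform.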
--
Clark S. Cox III
email@hidden