Re: How to code a NSString literal with UTF8?
- Subject: Re: How to code a NSString literal with UTF8?
- From: Nicko van Someren <email@hidden>
- Date: Thu, 31 Mar 2005 11:21:51 +0100
On 30 Mar 2005, at 19:18, Sean McBride wrote:
On 2005-03-30 08:59, Ali Ozer said:
NSString *s = [NSString stringWithUTF8String:"Long -- dash"]; // Not safe; you're at the mercy of any tools you use
As I understand it, the danger here is that the compiler has no way of
knowing the file's text encoding, right?
Yes, the compiler has to make an assumption about the file encoding.
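One common way to take the file encoding out of the equation (not something proposed in this thread) is to spell the non-ASCII bytes as hex escapes, so the source line itself stays 7-bit ASCII. The minimal C sketch below assumes the dash in the example is meant to be U+2014 EM DASH, whose UTF-8 bytes are E2 80 94; the names `kLongDash` and `long_dash_byte_length` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical constant: U+2014 EM DASH written as its three UTF-8 bytes
 * (E2 80 94) using hex escapes. The text of this line is plain 7-bit ASCII,
 * so the bytes the compiler stores do not depend on the encoding the file
 * was saved in. */
static const char kLongDash[] = "Long \xE2\x80\x94 dash";

/* Hypothetical helper: byte length of kLongDash without the trailing NUL,
 * i.e. "Long " (5) + em dash (3) + " dash" (5) = 13 bytes. */
static size_t long_dash_byte_length(void)
{
    size_t n = strlen(kLongDash);
    assert(n == sizeof kLongDash - 1); /* no embedded NUL bytes */
    return n;
}
```

Because Objective-C is a superset of C, the same array could be passed to `[NSString stringWithUTF8String:kLongDash]`. The trade-off is readability: the escapes no longer show the character in the editor.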
Of course the compiler already makes an assumption about the coding, as
you will rapidly discover when you try saving your C source in UTF-16
encoding, let alone EBCDIC.
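To make the UTF-16 point concrete: in UTF-16 every ASCII character occupies two bytes, one of which is zero, so a compiler that reads source as a stream of bytes sees a NUL next to every character. The sketch below, with hypothetical helper names and without the byte-order mark many editors prepend, encodes ASCII text as UTF-16LE and counts those zero bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes the ASCII text src as UTF-16LE into dst
 * (which must hold 2 * strlen(src) bytes) and returns the bytes written.
 * Each ASCII character becomes its own byte followed by 0x00. */
static size_t ascii_to_utf16le(const char *src, unsigned char *dst)
{
    size_t n = 0;
    for (; *src; ++src) {
        assert((unsigned char)*src < 0x80); /* ASCII input only */
        dst[n++] = (unsigned char)*src;
        dst[n++] = 0x00;
    }
    return n;
}

/* Hypothetical helper: counts NUL bytes in buf. A byte-oriented C compiler
 * would see each one as a stray NUL in the middle of the source text. */
static size_t count_nul_bytes(const unsigned char *buf, size_t len)
{
    size_t zeros = 0;
    for (size_t i = 0; i < len; ++i)
        if (buf[i] == 0x00)
            ++zeros;
    return zeros;
}
```

For the six characters `int x;` this produces twelve bytes, six of them zero.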
The C99 spec explicitly allows top-bit-set bytes in source code, but
their interpretation is essentially left to the implementation to
determine in the context of the locale. All the characters used by the
C language itself are taken from a set that is encoded the same way in
ASCII and UTF-8, and no byte of a multi-byte UTF-8 sequence (lead or
continuation) can be the same as any ASCII character, so the compiler
can never mistake part of a non-ASCII character for a closing quote or
a backslash. Personally I think that it is entirely reasonable to rely
on this aspect of the C specification when entering Unicode characters
that are going to be interpreted using stringWithUTF8String:. It is
defined to work with the default encoding for saving source, and if
the user changes the encoding to anything non-standard it could stop
working for a whole stack of reasons.
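The UTF-8 property relied on here can be checked exhaustively. The sketch below is a hand-written encoder (hypothetical helper names; it does not reject the surrogate range U+D800 to U+DFFF) plus a loop over every code point from U+0080 to U+10FFFF, confirming that each byte of the resulting multi-byte sequence is 0x80 or above and so can never be read as `"`, `\` or any other ASCII character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encodes code point cp as UTF-8 into out (room for
 * 4 bytes) and returns the number of bytes written. Surrogates are not
 * rejected; this is a sketch, not a validating encoder. */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: returns 1 if, for every code point from U+0080 up,
 * every byte of its UTF-8 encoding has the top bit set (>= 0x80). */
static int multibyte_utf8_never_contains_ascii(void)
{
    unsigned char buf[4];
    for (unsigned long cp = 0x80; cp <= 0x10FFFF; ++cp) {
        size_t n = utf8_encode(cp, buf);
        for (size_t i = 0; i < n; ++i)
            if (buf[i] < 0x80)
                return 0;
    }
    return 1;
}
```

Lead bytes fall in C2 to F4 and continuation bytes in 80 to BF, which is why the check succeeds.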
and the following is not allowed:
NSString *s = @"Long -- dash"; // Not allowed
"Not allowed" but still accepted by gcc 3.3 (and CW 9.4).
<rdar://4073313>
It's accepted but in most cases results in the wrong string.
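One way to see why the result goes wrong: the same bytes yield a different number of characters depending on the encoding used to decode them. The sketch below assumes, purely as an illustration, that the bytes inside an @"..." literal end up being interpreted in a single-byte encoding such as ISO Latin-1 or Mac OS Roman; which encoding gcc 3.3 and the runtime actually applied is not stated in the thread. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: character count if the bytes are decoded as UTF-8.
 * Counts every byte that is not a continuation byte (10xxxxxx); assumes
 * well-formed input. */
static size_t utf8_char_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p)
        if ((*p & 0xC0) != 0x80)
            ++count;
    return count;
}

/* Hypothetical helper: character count if the same bytes are decoded with
 * any single-byte encoding: exactly one character per byte. */
static size_t single_byte_char_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    while (s[count] != '\0')
        ++count;
    return count;
}
```

For the bytes of "Long " + U+2014 + " dash", UTF-8 decoding gives 11 characters, while a single-byte reading gives 13: the one dash turns into three unrelated characters.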
This I understand to be disallowed because it is so documented:
"@"string" - Defines a constant NSString object in the current module
and initializes the object with the specified 7-bit ASCII-encoded
string."
Fair enough.
But am I the only one who thinks it should be allowed?
No, I think it would be a fine thing too. That said, one problem I
can envisage is that top-bit-set bytes inside @"..." constants are
currently passed through silently by the compiler, so there would be
a danger of confusion: code using this idiom would compile cleanly on
both old and new compilers while generating very different results.
Nicko
_______________________________________________
Cocoa-dev mailing list (email@hidden)