Re: NUL characters in NSString cause unexpected results
- Subject: Re: NUL characters in NSString cause unexpected results
- From: John Stiles <email@hidden>
- Date: Wed, 22 Nov 2006 23:02:36 -0800
That isn't legal UTF8 under the modern definition of the UTF8 spec.
Nowadays the spec says that only the shortest possible encoding of a
character is legal, and "\xc0\x80" is an overlong two-byte form of U+0000,
whose shortest encoding is the single byte 0x00.
I wouldn't be surprised if NSString allowed it anyway, since a lot of
implementations don't validate the input completely. But even if it did
work today, I could envision it breaking in a future OS if Apple decided
to clean up or revamp their Unicode implementation. (I haven't actually
tried it.)
Chris Suter wrote:
I'm not sure if it's a bug or not, but if it's documented, it's not
obvious to me.
What happens if you encode the NUL character as "\xc0\x80" in UTF8?
- Chris
On 23/11/2006, at 1:33 PM, Jerry Krinock wrote:
Every reference I can find on UTF8 states that any valid ASCII character is
a valid UTF8 character, so this should include the ASCII 'NUL', 0x0. But
when I init an NSString from data including a NUL, using UTF8 encoding, and
then send some messages to it, most messages behave as though the string
terminates at the NUL. For example,
1. The -description of the string terminates at the NUL.
2. When sent the message -stringByAppendingString:, the result includes only
the receiver's characters up to the NUL, followed by the appended string.
It looks like, under the hood, this method is just strcat!
But -length gives the "correct" answer, including the part past the NUL.
This can lead to some unexpected exceptions when using -length to determine
the safe NSRange one can operate upon.
Is this a bug or am I missing some documentation?
Jerry Krinock
****** DEMONSTRATION ******
The following code constructs two NSStrings:
s1 = "A[NUL]B"
s2 = "CD"
and then concatenates them into a third string, sC, using
-stringByAppendingString:.
I expect sC = "A[NUL]BCD".
I get sC = "ACD".
****** CODE ******
char* buf = (char*)malloc(3) ;
buf[0] = 'A' ;
buf[1] = 0x0 ;
buf[2] = 'B' ;
NSString* s1 = [[NSString alloc] initWithBytes:buf
length:3
encoding:NSUTF8StringEncoding] ;
NSLog(@"s1 = %@", s1) ;
NSString* s2 = @"CD" ;
NSLog(@"s2 = %@", s2) ;
NSString* sC = [s1 stringByAppendingString:s2] ;
NSLog(@"sC = %@", sC) ;
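// -length returns NSUInteger, so cast it for a portable format specifier
NSLog(@"length of s1:%lu", (unsigned long)[s1 length]) ;
NSLog(@"length of s2:%lu", (unsigned long)[s2 length]) ;
NSLog(@"length of sC:%lu", (unsigned long)[sC length]) ;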
[s1 release] ;
free(buf) ;
****** CONSOLE OUTPUT ******
s1 = A
s2 = CD
sC = ACD
length of s1:3
length of s2:2
length of sC:3
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins (at) lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden