Re: rangeOfString behaves wierd
- Subject: Re: rangeOfString behaves wierd
- From: "Gerriet M. Denkmann" <email@hidden>
- Date: Mon, 09 Dec 2013 17:38:34 +0700
On 9 Dec 2013, at 16:53, Stephen J. Butler <email@hidden> wrote:
> Would converting each string to NFD (decomposedStringWithCanonicalMapping) be an acceptable workaround in this case?
No, it would not. I am changing all my rangeOfString calls to use NSLiteralSearch, which does not have these strange effects.
Gerriet.
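
For readers of the archive, a minimal sketch of the change described above. The helper name LiteralRangeOfString is hypothetical and does not appear anywhere in this thread; the only API used is NSString's rangeOfString:options: with NSLiteralSearch. The strings are written with \u escapes because copy and paste lost the compatibility character U+FA0A earlier in this thread.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): search with
// NSLiteralSearch, which compares the strings code unit by code unit
// instead of applying Unicode equivalence rules.
static NSRange LiteralRangeOfString(NSString *haystack, NSString *needle)
{
    return [haystack rangeOfString:needle options:NSLiteralSearch];
}
```

Called as LiteralRangeOfString(@"\u89c1=\ufa0a\u898b", @"\u89c1\u2260\u898b"), this should return a range whose location is NSNotFound, because no run of UTF-16 units in the first string equals the second one exactly. The trade-off is that a literal search also stops matching strings that differ only in normalization form, so it fits only where exact matching is what is wanted.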
>
>
> On Mon, Dec 9, 2013 at 3:43 AM, Stephen J. Butler <email@hidden> wrote:
> OK, you are right. Copy+paste didn't preserve the compatibility character. It does look like a bug of sorts, or at least something a Unicode expert should explain.
>
>
> On Mon, Dec 9, 2013 at 3:20 AM, Gerriet M. Denkmann <email@hidden> wrote:
>
> On 9 Dec 2013, at 16:00, Stephen J. Butler <email@hidden> wrote:
>
> > I don't get the same result. 10.9.0, Xcode 5.0.2. I created an empty command line utility, copied the code, and I get NSNotFound.
> >
> > 2013-12-09 02:50:19.822 Test[73850:303] main "见≠見" (3 shorts) occurs in "见=見見" (4 shorts) at {9223372036854775807, 0}
>
> Copying might invoke another bug.
> Better check the characters, like:
>
> - (void)printString: (NSString *)line
> {
>     NSLog(@"%s \"%@\" has characters:", __FUNCTION__, line);
>
>     [ line enumerateSubstringsInRange: NSMakeRange( 0, [ line length ] )
>                              options: NSStringEnumerationByComposedCharacterSequences
>                           usingBlock: ^(NSString *currChar, NSRange currCharRange, NSRange enclosingRange, BOOL *stop)
>         {
>             (void)enclosingRange;
>             (void)stop;
>
>             // Convert to UTF-32 in host byte order, so every 4 bytes hold one code point.
>             #ifdef __LITTLE_ENDIAN__
>                 NSStringEncoding encoding = NSUTF32LittleEndianStringEncoding;
>             #else
>                 NSStringEncoding encoding = NSUTF32BigEndianStringEncoding;
>             #endif
>             NSData *data = [ currChar dataUsingEncoding: encoding ];
>
>             NSUInteger nbrBytes = [ data length ];
>             NSUInteger nbrChars = nbrBytes / sizeof(uint32_t);
>
>             if ( nbrChars * sizeof(uint32_t) != nbrBytes ) // error: not a multiple of 4 bytes
>             {
>                 NSLog(@"%s Error: strange nbr of bytes %lu", __FUNCTION__, (unsigned long)nbrBytes);
>                 return;
>             }
>
>             uint32_t codePoint[nbrChars];
>             [ data getBytes: codePoint length: nbrBytes ];
>
>             // One output line per composed character: range = code points = character.
>             NSMutableString *s = [ NSMutableString stringWithFormat: @"%@ = ",
>                                      NSStringFromRange(currCharRange)
>                                  ];
>             for( NSUInteger i = 0; i < nbrChars; i++ )
>             {
>                 [ s appendFormat: @"%#06x ", codePoint[i] ];
>             }
>
>             [ s appendFormat: @"= \"%@\"", currChar ];
>
>             fprintf(stderr, "%s\n", [ s UTF8String ]);
>         }
>     ];
> }
>
> and check for:
> "见=見見" has characters:
> {0, 1} = 0x89c1 = "见"
> {1, 1} = 0x003d = "="
> {2, 1} = 0xfa0a = "見"
> {3, 1} = 0x898b = "見"
> "见≠見" has characters:
> {0, 1} = 0x89c1 = "见"
> {1, 1} = 0x2260 = "≠"
> {2, 1} = 0x898b = "見"
>
> >
> > On Mon, Dec 9, 2013 at 2:43 AM, Gerriet M. Denkmann <email@hidden> wrote:
> >
> > On 9 Dec 2013, at 15:05, Quincey Morris <email@hidden> wrote:
> >
> > > On Dec 8, 2013, at 23:46 , Gerriet M. Denkmann <email@hidden> wrote:
> > >
> > >> NSString *b = @"见≠見"; // 0x89c1 0x2260 0x898b
> > >
> > > So what are the results with:
> > >
> > >> NSString *b = @"见";
> > >> NSString *b = @"≠";
> > >> NSString *b = @"見";
> > > ?
> > >
> > > Does specifying an explicit locale make any difference?
> >
> > Explicitly specifying en_US (as probably the best tested and debugged) makes no difference.
> >
>
>
>