Re: Where is my bicycle?
- Subject: Re: Where is my bicycle?
- From: Quincey Morris <email@hidden>
- Date: Mon, 06 Apr 2015 17:15:43 +0000
On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann <email@hidden> wrote:
>
> Where is my bicycle gone? What am I doing wrong?
Before this thread heads further into outer space…
I suspect it [NSCharacterSet] is just broken. Look here, for example:
http://stackoverflow.com/questions/23000812/creating-nscharacterset-with-unicode-smp-entries-testing-membership-is-this
The problem is that it’s unclear whether the “characters” in NSCharacterSet are internally UTF-16 code units, UTF-32 code units, Unicode code points, or something else. According to the NSCharacterSet documentation:
> “An NSCharacterSet object represents a set of Unicode-compliant characters.”
and:
> “The NSCharacterSet class declares the programmatic interface for an object that manages a set of Unicode characters (see the NSString class cluster specification for information on Unicode).”
According to the NSString documentation:
> “A string object presents itself as an array of Unicode characters (Unicode is a registered trademark of Unicode, Inc.). You can determine how many characters a string object contains with the length method and can retrieve a specific character with the characterAtIndex: method.”
Working backwards, we know that the characters counted by ‘-[NSString length]’ are UTF-16 code units, so this all *possibly* implies that NSCharacterSet characters are UTF-16 code units, too. Plus, back in the NSCharacterSet documentation:
> “NSCharacterSet’s principal primitive method, characterIsMember:, provides the basis for all other instance methods in its interface.”
If that’s true, ‘longCharacterIsMember:’ is pretty much screwed.
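To make the mismatch concrete, here is a minimal sketch in plain C of how a code point is encoded in UTF-16. The function name utf16_encode is mine, not anything from Foundation, and I'm assuming the character in question lies outside the Basic Multilingual Plane, as U+1F6B2 BICYCLE does. Any such code point needs two 16-bit code units, so a primitive that takes a single 16-bit unichar cannot be handed the whole character at once.

```c
#include <stdint.h>

/* Hypothetical helper (not a Foundation API): encode one Unicode code
 * point as UTF-16. Returns the number of 16-bit code units written to
 * `out` (1 or 2), or 0 if `cp` is not a valid Unicode scalar value. */
int utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp > 0x10FFFF) return 0;                /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* lone surrogates are invalid */

    if (cp < 0x10000) {                         /* BMP: one code unit suffices */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary planes: split the 20-bit offset into a surrogate pair. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high (lead) surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low (trail) surrogate */
    return 2;
}
```

For U+1F6B2 this writes the pair 0xD83D, 0xDEB2 and returns 2, whereas U+0041 comes back as a single unit. Neither half of the pair identifies the character by itself, which is why a 16-bit membership test cannot answer questions about it.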
Perhaps the NSCharacterSet documentation is just wrong. Or perhaps, when the API was enhanced in 10.2 (see http://www.cocoabuilder.com/archive/cocoa/73297-working-with-32-bit-unicode-nsstring-stringwithutf32string-const-utf32char-bytes-needed.html for some tantalizing hints about NSCharacterSet), the implementation was a hack that works somehow but isn’t documented. I don’t think you’re going to get any definitive answer except directly from Apple.
A suggestion, though:
Try building your character set using ‘characterSetWithRange:’ and/or the NSMutableCharacterSet methods that add ranges, instead of using NSStrings. Maybe NSCharacterSet really is UTF-32-based, but not — for code compatibility reasons — when using NSStrings explicitly.