Re: Where is my bicycle?
- Subject: Re: Where is my bicycle?
- From: "Gerriet M. Denkmann" <email@hidden>
- Date: Tue, 07 Apr 2015 01:15:05 +0700
> On 7 Apr 2015, at 00:15, Quincey Morris <email@hidden> wrote:
>
> On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann <email@hidden> wrote:
>>
>> Where is my bicycle gone? What am I doing wrong?
>
> Before this thread heads further into outer space…
>
> I suspect it [NSCharacterSet] is just broken. Look here, for example:
>
> http://stackoverflow.com/questions/23000812/creating-nscharacterset-with-unicode-smp-entries-testing-membership-is-this
>
> The problem is that it’s unclear whether the “characters” in NSCharacterSet are internally UTF-16 code units, UTF-32 code units, Unicode code points, or something else. According to the NSCharacterSet documentation:
>
>> "An NSCharacterSet object represents a set of Unicode-compliant characters.”
>
> and:
>
>> "The NSCharacterSet class declares the programmatic interface for an object that manages a set of Unicode characters (see the NSString class cluster specification for information on Unicode).”
>
> According to the NSString documentation:
>
>> "A string object presents itself as an array of Unicode characters (Unicode is a registered trademark of Unicode, Inc.). You can determine how many characters a string object contains with the length method and can retrieve a specific character with the characterAtIndex: method.”
>
> Working backwards, we know that the characters that are counted by ‘-[NSString length]’ are UTF-16 code units, so this all *possibly* implies that NSCharacterSet characters are UTF-16 code units, too. Plus, back in NSCharacterSet documentation:
>
>> "NSCharacterSet’s principal primitive method, characterIsMember:, provides the basis for all other instance methods in its interface.”
>
> If that’s true, ‘longCharacterIsMember:’ is pretty much screwed.
>
> Perhaps the NSCharacterSet documentation is just wrong. Or perhaps, when the API was enhanced in 10.2 (see: http://www.cocoabuilder.com/archive/cocoa/73297-working-with-32-bit-unicode-nsstring-stringwithutf32string-const-utf32char-bytes-needed.html, for some tantalizing hints about NSCharacterSet), the implementation was a hack that works somehow but isn’t documented. I don’t think you’re going to get any definitive answer except directly from Apple.
>
> A suggestion, though:
>
> Try building your character set using ‘characterSetWithRange:’ and/or the NSMutableCharacterSet methods that add ranges, instead of using NSStrings. Maybe NSCharacterSet really is UTF-32-based, but not — for code compatibility reasons — when using NSStrings explicitly.
1. longCharacterIsMember: seems to be ok:
NSCharacterSet *alphanumericCharacterSet = [ NSCharacterSet alphanumericCharacterSet ];
BOOL pp = [ alphanumericCharacterSet longCharacterIsMember: 0x2f800 ];
This returns YES, as it should.
2. characterSetWithCharactersInString: seems to keep only the lower 16 bits of each code point in the string. This looks like a bug.
It works ok, though, if all characters in the string have code points ≥ 0x10000 (e.g. "𝄞🚲").
3. The documentation about bitmapRepresentation is wrong. It says: "A raw bitmap representation of a character set is a byte array of 2^16 bits (that is, 8192 bytes)."
But alphanumericCharacterSet has a bitmap of 32771 = 0x8003 bytes, most of which looks ok.
It has some strange bits set near the end, though, at bit positions:
0x2fa1e → 0x2fa2d
0x30011 → 0x30207
which I do not recognise as alphanumeric.
4. characterSetWithRange: works a bit better:
NSCharacterSet *a = [ NSCharacterSet characterSetWithRange: NSMakeRange(0x1F6B2,1) ];
BOOL pp = [ a longCharacterIsMember: 0x1F6B2 ]; // returns YES, as it should
But when I look at the bitmapRepresentation, I see 16385 bytes with two bits set: 0x10000 and 0x1f6ba (the latter is 8 bits above the expected 0x1f6b2).
It looks like the format of the bitmapRepresentation is slightly more complex than documented.
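For what it is worth, the sizes and bit positions in points 3 and 4 are consistent with the layout that the CFCharacterSet documentation describes for CFCharacterSetCreateWithBitmapRepresentation: 8192 bytes for the BMP, followed by zero or more blocks that each consist of a one-byte plane index and an 8192-byte bitmap for that plane. Below is a minimal sketch of a membership test under that assumption, in plain C (which also compiles as Objective-C). The name bitmap_contains is hypothetical, and that -bitmapRepresentation uses this same layout is an assumption, not something confirmed here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { PLANE_BYTES = 8192 };  /* 2^16 bits per plane */

/* Tests whether code point cp is set in an extended bitmap.
 * ASSUMED layout (not confirmed for -bitmapRepresentation):
 *   bytes 0..8191              BMP bitmap
 *   then, zero or more times:  1 plane index byte + 8192-byte plane bitmap
 * Within a block, character n is bit (n & 7) of byte (n >> 3). */
static bool bitmap_contains(const uint8_t *bm, size_t len, uint32_t cp)
{
    uint32_t plane = cp >> 16;      /* 0 for the BMP, 1..16 otherwise */
    uint32_t low = cp & 0xFFFF;     /* position inside the plane */
    const uint8_t *block = NULL;

    if (plane == 0) {
        if (len >= PLANE_BYTES)
            block = bm;
    } else {
        /* Walk the (index byte, bitmap) blocks that follow the BMP. */
        size_t off = PLANE_BYTES;
        while (off + 1 + PLANE_BYTES <= len) {
            if (bm[off] == plane) {
                block = bm + off + 1;
                break;
            }
            off += 1 + PLANE_BYTES;
        }
    }

    if (block == NULL)
        return false;               /* plane not present in the bitmap */
    return (block[low >> 3] >> (low & 7)) & 1;
}
```

Under this layout, a set holding only U+1F6B2 occupies 8192 + 1 + 8192 = 16385 bytes. The plane index byte 0x01 sits at byte 8192, which is raw bit 0x10000, and the character itself lands at raw bit 8193 * 8 + 0xF6B2 = 0x1F6BA. Likewise, 32771 = 8192 + 3 * 8193 would correspond to the BMP plus three further planes.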
Kind regards,
Gerriet.