Re: encoding of file names
- Subject: Re: encoding of file names
- From: Andrew Thompson <email@hidden>
- Date: Thu, 26 May 2011 22:56:12 -0700
> However, in practical terms, the indexable string elements are components, not codepoints.
>
> It seems to me the single hardest thing to come to grips with when newly approaching NSString is understanding that 'unichar's (and "characters" in the sense of [characterAtIndex:]) *aren't* codepoints. In fact, AFAICT the only way to *represent* a codepoint in NSString is indirectly, as a single-Unicode-character string where you happen to know from general Unicode knowledge that the character is represented uniquely by a single codepoint.
>
> Unfortunately, I believe, most people newly arrived at NSString will assume that 'unichar'/[characterAtIndex:] is a Unicode codepoint***, and have no reason to study the documentation carefully enough to see that this is a false assumption.
>
> *** I think that's what they'd assume if they know a fair bit about Unicode. If they know less than that, they'll likely assume 'unichar' is a Unicode character, which is even further from the
I believe this stems from a period in history when the Unicode group expected to fit all practical scripts into 65,536 code points, which meant you could get away with all kinds of assumptions, like 16-bit types and UCS-2.
As it became clear that this wasn't going to be enough code points, the additional planes were defined and UCS-2 fell out of favor, replaced by UTF-16, which can represent the higher planes through surrogate pairs.
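To make the difference concrete, here is a minimal sketch in plain C of how a code point maps to UTF-16 code units. The function name utf16_encode is hypothetical and used only for illustration; the arithmetic is the standard UTF-16 encoding form. Anything below U+10000 fits in one 16-bit unit, which is the case UCS-2 covered, and anything above needs two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Writes 1 or 2 code units to out and returns how many were written.
 * Returns 0 if cp is not encodable (a surrogate value or above U+10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;                       /* reserved for surrogates */
    }
    if (cp > 0x10FFFF) {
        return 0;                       /* outside the Unicode range */
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* BMP: one unit, same as UCS-2 */
        return 1;
    }

    /* Higher planes: split the 20-bit offset into two 10-bit halves. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high (lead) surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low (trail) surrogate */
    return 2;
}
```

For example, U+00E9 comes back as the single unit 0x00E9, while U+1F600 comes back as the pair 0xD83D 0xDE00, so an API that indexes by 16-bit unit sees two "characters" there.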
Both Java's String and Objective-C's NSString have these sorts of API speed bumps because, I think, they were originally created in the UCS-2 era, when a 16-bit code point was effectively a character and the mapping was simple. UTF-16 was retrofitted over the existing API.
I actually built a category on NSString that gives it methods that return UTF-32 characters by handling surrogate pairs.
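The category itself isn't shown here, so the following is only a rough sketch of the surrogate-pair handling such methods would need, written in plain C so it can sit underneath an Objective-C category. The names utf16_next_code_point and utf16_count_code_points are hypothetical, and returning U+FFFD for an unpaired surrogate is an assumption about error handling, not necessarily what the category does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu  /* assumed result for unpaired surrogates */

/* Hypothetical helper: decode the code point starting at units[*index]
 * and advance *index past the 1 or 2 code units that were consumed. */
uint32_t utf16_next_code_point(const uint16_t *units, size_t length,
                               size_t *index)
{
    assert(units != NULL && index != NULL && *index < length);

    uint16_t first = units[(*index)++];

    if (first >= 0xD800 && first <= 0xDBFF) {        /* high surrogate */
        if (*index < length) {
            uint16_t second = units[*index];
            if (second >= 0xDC00 && second <= 0xDFFF) {
                (*index)++;
                return 0x10000u
                     + ((uint32_t)(first - 0xD800) << 10)
                     + (uint32_t)(second - 0xDC00);
            }
        }
        return REPLACEMENT_CHAR;                     /* no low surrogate */
    }
    if (first >= 0xDC00 && first <= 0xDFFF) {
        return REPLACEMENT_CHAR;                     /* stray low surrogate */
    }
    return first;                                    /* ordinary BMP unit */
}

/* Hypothetical helper: count code points rather than 16-bit units. */
size_t utf16_count_code_points(const uint16_t *units, size_t length)
{
    size_t i = 0, count = 0;
    while (i < length) {
        (void)utf16_next_code_point(units, length, &i);
        count++;
    }
    return count;
}
```

In a category, the unichar buffer could be filled with -getCharacters:range: and passed straight in, since unichar is a 16-bit unsigned type. Note that this yields code points, not user-perceived characters; combining sequences still span several code points.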