Re: encoding of file names
- Subject: Re: encoding of file names
- From: Quincey Morris <email@hidden>
- Date: Tue, 24 May 2011 23:01:27 -0700
On May 24, 2011, at 22:12, Ken Thomases wrote:
> Also, I wouldn't say that codepoints "may each consist of a variable number of components". They may be _encoded_ to a variable number of components, but don't "consist" of them.
OK.
>> This makes absolutely no sense unless the word "character" is here understood to mean "component".
>
> Well, I would say "codepoint" is more proper. "O", the combining umlaut (diaeresis), and "Ö" are all distinct codepoints. They are not components, although they can be represented by components.
On reflection, I think I agree with you. But see below.
> Well, you're correct that a component-by-component comparison is equivalent to a codepoint-by-codepoint comparison. I disagree that NSString doesn't have the latter. Because of the equivalence, I suppose it may be a matter of perspective.
It's a matter of perspective at a conceptual level, and if that was the only consideration I'd be happy to adopt your perspective.
However, in practical terms, the indexable string elements are components, not codepoints.
It seems to me the single hardest thing to come to grips with when newly approaching NSString is understanding that 'unichar's (and "characters" in the sense of [characterAtIndex:]) *aren't* codepoints. In fact, AFAICT the only way to *represent* a codepoint in NSString is indirectly, as a single-Unicode-character string, where you happen to know from general Unicode knowledge that the character is represented uniquely by a single codepoint.
Unfortunately, I believe, most people newly arrived at NSString will assume that 'unichar'/[characterAtIndex:] is a Unicode codepoint***, and have no reason to study the documentation carefully enough to see that this is a false assumption.
*** I think that's what they'd assume if they know a fair bit about Unicode. If they know less than that, they'll likely assume 'unichar' is a Unicode character, which is even further from the truth.
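To make the distinction concrete, here is a minimal sketch in Swift using Foundation (not part of the original exchange). It relies on `length`, `character(at:)` (the Swift spelling of `characterAtIndex:`) and `rangeOfComposedCharacterSequence(at:)`. The sample strings and variable names are chosen purely for illustration. It counts the same text three ways: as codepoints, as NSString components, and as user-visible characters.

```swift
import Foundation

// Illustrative sample text (an assumption, not taken from the thread):
let precomposed = "\u{00D6}"     // "Ö" as the single codepoint U+00D6
let decomposed  = "O\u{0308}"    // "O" followed by U+0308 COMBINING DIAERESIS
let clef        = "\u{1D11E}"    // U+1D11E, a codepoint outside the BMP

let nsPrecomposed = NSString(string: precomposed)
let nsDecomposed  = NSString(string: decomposed)
let nsClef        = NSString(string: clef)

// Codepoints (Unicode scalars) in each string.
let scalarCounts = [precomposed, decomposed, clef].map { $0.unicodeScalars.count }

// NSString.length counts components (unichars), which is what indexing uses.
let componentCounts = [nsPrecomposed.length, nsDecomposed.length, nsClef.length]

// User-visible characters, as Swift's String counts them.
let characterCounts = [precomposed, decomposed, clef].map { $0.count }

print("codepoints:", scalarCounts)
print("components:", componentCounts)
print("characters:", characterCounts)

// character(at:) returns one component. For the clef, neither component
// is a codepoint on its own: together they form a surrogate pair.
let high = nsClef.character(at: 0)
let low  = nsClef.character(at: 1)
print(String(high, radix: 16), String(low, radix: 16))

// To get from a component index to the whole user-visible character,
// ask for the composed character sequence around that index.
let composedRange = nsDecomposed.rangeOfComposedCharacterSequence(at: 0)
print(composedRange.location, composedRange.length)
```

The clef is the case where all three counts disagree most plainly: one codepoint, two components, one character. For the decomposed "Ö" the component and codepoint counts happen to agree (two each), which is presumably part of why the assumption that a 'unichar' is a codepoint survives casual testing.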