Re: encoding of file names
- Subject: Re: encoding of file names
- From: Quincey Morris <email@hidden>
- Date: Thu, 26 May 2011 23:35:48 -0700
On May 26, 2011, at 22:56, Andrew Thompson wrote:
> I believe this stems from a period in history when the Unicode group believed that they'd be able to fit all practical scripts into 65536 code points, which meant you could get away with all kinds of assumptions like 16-bit types and UCS-2.
>
> As it became clear that wasn't going to be enough code points, the additional planes were defined and UCS-2 fell out of favor, being replaced by UTF-16, which can model the higher planes.
That would explain the parting of the ways between "code unit" and "code point", but not really the distinction between "code point" and "[Unicode] character". My memory of the days when Unicode first started to get a foothold (the early 90s IIRC) is very hazy, but I think there were actually two things going on:
-- The belief, exactly as you describe, that 65536 was enough.
-- A vagueness (or perhaps a deliberate lack of definition) about what should be called a "character".
This seems to have been resolved now, and we have this hierarchy, at least in Unicode/Apple terms:
code unit -> code point -> character -> grapheme -> (whatever the grouping is called upon which transformations like upper and lower case are performed)
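To make the first few levels concrete, here is a rough sketch in Python (not Cocoa code, just an illustration). The helper names are made up for this example, and the "grapheme" count is only an approximation that skips combining marks; real grapheme segmentation follows the Unicode text segmentation rules and handles many more cases.

```python
import unicodedata

def utf16_code_units(text: str) -> int:
    """Count UTF-16 code units (hypothetical helper for illustration)."""
    return len(text.encode("utf-16-le")) // 2

def code_points(text: str) -> int:
    """Count Unicode code points; a Python str is a sequence of code points."""
    return len(text)

def approx_graphemes(text: str) -> int:
    """Approximate user-perceived characters by skipping combining marks.

    Assumption: this is a simplification, not full grapheme segmentation.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, outside the first 65536 code points
e_acute = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

for label, s in (("clef", clef), ("e_acute", e_acute)):
    print(label, utf16_code_units(s), code_points(s), approx_graphemes(s))
# prints:
# clef 2 1 1
# e_acute 2 2 1
```

The clef shows code units and code points diverging (one code point, two UTF-16 code units via a surrogate pair), while the accented "e" shows code points and graphemes diverging (two code points, one thing the user sees).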
It's not ultimately so hard, just a bit perilous for the unwary. That's the reason I've been going on about this ad nauseam. If we shine some light on it, we may help demystify it.