Re: Unicode filenames with Apple File System and UIManagedDocument
- Subject: Re: Unicode filenames with Apple File System and UIManagedDocument
- From: Alastair Houghton <email@hidden>
- Date: Thu, 23 Mar 2017 08:50:34 +0000
On 22 Mar 2017, at 19:13, Chris Ridd <email@hidden> wrote:
>
>> On 22 Mar 2017, at 09:05, Alastair Houghton <email@hidden> wrote:
>>
>> In the context of filesystems (and specifically filenames), the phrases “bag of bytes” and “bunch of bytes” have a fairly specific meaning. The point is that the filesystem doesn’t inspect the bytes it’s given, and doesn’t care what they represent (about the only exception is that it probably doesn’t support embedded NULs). It isn’t suggesting that the names are treated as an unordered set of bytes (that’d just be silly). It’s just expressing the fact that the filesystem doesn’t care what they are - it may compare them, and if it does so, it will use binary ordering (not some other collation sequence) and won’t worry about things like case or encoding at all.
>
> That doesn’t sound sensible at all. It means you can create a filename with a byte sequence that isn’t valid UTF-8 and which likely then cannot be accessed by MacOS/iOS processes.
That isn’t possible on macOS - there’s a percent escaping mechanism built in to the kernel to prevent this problem.
> It means that you could create multiple files with the “same” name, and that doesn’t sound like a win either. e.g. Aandi’s examples of LATIN SMALL LETTER E (U+0065)
> COMBINING ACUTE ACCENT (U+0301) and LATIN SMALL LETTER E WITH ACUTE (U+00E9)
Yes, it does.
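To make that concrete, here is a minimal Python sketch of the two spellings from the example above. It assumes a filesystem that compares names purely byte for byte; the helper names (same_bytes, same_after_nfc) are made up for illustration and don't correspond to any filesystem or Cocoa API.

```python
import unicodedata

# The two spellings of "é" from the example above.
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE


def same_bytes(a: str, b: str) -> bool:
    """Compare the way a 'bag of bytes' filesystem would: raw UTF-8 bytes only."""
    return a.encode("utf-8") == b.encode("utf-8")


def same_after_nfc(a: str, b: str) -> bool:
    """Compare after applying Unicode normalisation form C to both names."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(decomposed.encode("utf-8"))            # b'e\xcc\x81'
    print(precomposed.encode("utf-8"))           # b'\xc3\xa9'
    print(same_bytes(decomposed, precomposed))   # False
    print(same_after_nfc(decomposed, precomposed))  # True
```

With a byte-wise comparison the two names are different keys, so both files can exist side by side; only a normalising comparison treats them as the same name.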
> How can a “next gen” filesystem avoid using Unicode rules when handling filenames?
Well, if I had designed it, it wouldn’t. But I didn’t.
To be fair, I can see arguments in favour of the “bunch of bytes” approach; the existing approach has created a problem in HFS+, in that the normalisation is essentially fixed for all time, and doesn’t correspond to the current version of Unicode. It’s actually worse than it might be, because (IIRC) they fixed the normalisation *before* Unicode adopted a stability policy for normalisation...
But if the filesystem (or kernel) isn’t doing it, then IMO the Cocoa frameworks certainly should.
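As a rough sketch of what doing it at a higher layer could look like, the Python below checks a candidate name against existing names by comparing normalised forms before anything is created. This is an assumption about one possible approach, not a description of what the Cocoa frameworks actually do; canonical_key and find_equivalent are hypothetical names, NFC is an arbitrary choice of form, and case folding is deliberately left out.

```python
import unicodedata
from typing import Iterable, List


def canonical_key(name: str) -> str:
    """Hypothetical comparison key: the name in normalisation form C."""
    return unicodedata.normalize("NFC", name)


def find_equivalent(candidate: str, existing_names: Iterable[str]) -> List[str]:
    """Return the existing names that are canonically equivalent to candidate.

    The names themselves are returned unchanged, so the caller can still
    open the file using the exact bytes stored on disk.
    """
    key = canonical_key(candidate)
    return [name for name in existing_names if canonical_key(name) == key]


if __name__ == "__main__":
    on_disk = ["cafe\u0301.txt", "cafe.txt"]   # first name uses e + U+0301
    clashes = find_equivalent("caf\u00e9.txt", on_disk)
    print(len(clashes))  # 1
```

A framework taking this route would consult something like find_equivalent before creating a file, and reuse (or refuse) the clashing name rather than silently creating a second file that looks identical to the user.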
Kind regards,
Alastair.
--
http://alastairs-place.net
_______________________________________________
Cocoa-dev mailing list (email@hidden)