Re: Unicode filenames with Apple File System and UIManagedDocument
- Subject: Re: Unicode filenames with Apple File System and UIManagedDocument
- From: Alastair Houghton <email@hidden>
- Date: Wed, 22 Mar 2017 09:05:51 +0000
On 21 Mar 2017, at 20:49, Quincey Morris <email@hidden> wrote:
>
> On Mar 20, 2017, at 14:23 , email@hidden wrote:
>>
>> "iOS HFS Normalized UNICODE names , APFS now treats all file[ name]s as a bag of bytes on iOS . We are requesting that Applications developers call the correct Normalization routines to make sure the file name contains the correct representation."
>
> I’ve been letting this simmer for a couple of days now, and I’ve come to the conclusion that it’s — sincere apologies to the unnamed Apple engineer who wrote it — as dumb as dirt.
>
> — It’s not a “bag of bytes”, because bags of stuff are generally understood as unordered sets, and I doubt that’s what’s intended. It has to be a sequence of bytes.
In the context of filesystems (and specifically filenames), the phrases “bag of bytes” and “bunch of bytes” have a fairly specific meaning. The point is that the filesystem doesn’t inspect the bytes it’s given, and doesn’t care what they represent (about the only exception is that it probably doesn’t support embedded NULs). It isn’t suggesting that the names are treated as an unordered set of bytes (that’d just be silly). It’s just expressing the fact that the filesystem doesn’t care what they are - it may compare them, and if it does so, it will use binary ordering (not some other collation sequence) and won’t worry about things like case or encoding at all.
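To make the byte-comparison point concrete, here is a minimal sketch in Python (chosen only because it is compact; nothing here is an Apple API). It assumes names are encoded as UTF-8, and the helper names `raw_name` and `normalized_name` are made up for illustration. Two spellings of the same visible name produce different byte sequences, so a filesystem that only compares bytes would treat them as different files unless something normalizes them first.

```python
import unicodedata

# Two spellings of the same visible name "café".
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def raw_name(name: str) -> bytes:
    """Bytes a byte-preserving filesystem would store (assumes UTF-8)."""
    return name.encode("utf-8")


def normalized_name(name: str, form: str = "NFC") -> bytes:
    """Bytes after the caller normalizes first (hypothetical helper)."""
    return unicodedata.normalize(form, name).encode("utf-8")


# Byte-for-byte, the two spellings differ...
print(raw_name(precomposed) == raw_name(decomposed))
# ...but they agree once both are normalized to the same form.
print(normalized_name(precomposed) == normalized_name(decomposed))
```

The first comparison prints `False` and the second prints `True`. Which normalization form a particular filesystem or framework expects is a separate question that this sketch does not answer.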
> — It’s not just a string, it has to be a string in a known encoding. Otherwise, how could you ever mount an external drive on a different computer? The encoding has to be pre-specified for APFS, or it has to be stored in metadata on each volume.
Agreed, that’s where the “bunch of bytes” approach falls down.
> — It’s not just going to be a string of known encoding, it’s going to be Unicode. That’s going to be true even if the fact is specified in volume metadata and it’s theoretically possible to create APFS volumes with non-Unicode file names. Anything other than Unicode would, at this point, be a crime against humanity.
If I’d designed APFS, it probably would use Unicode names (and it’d store the version of Unicode it used in the filesystem header, to avoid having to hard-code it).
But I didn’t design it - Dominic Giampaolo and his team did - and we still don’t have that much information about how APFS works. I’m sure they had their reasons for whatever decision they’ve made here.
> Is *that* the bottom line? I doubt it. I don’t believe the above quoted statement can be correct. I could believe that normalization is being moved out of the file system code, but it would have to be moved to (e.g.) the Cocoa frameworks, still “downstream” of the file-handling APIs. It can’t go upstream of the public APIs without breaking an API contract that has existed for the 16+ years since OS X 10.0.
This is a tricky area. The problem with what we have at the moment (-fileSystemRepresentation) is that it *assumes* HFS+ semantics. That isn’t always going to be correct for existing non-HFS+ filesystems, let alone future ones. Of course, if you’re using the NSURL or NSString methods, rather than calling the BSD or C library APIs yourself, this is all hidden from you anyway. You certainly shouldn’t, IMO, be required to do anything unusual at Cocoa level; the Foundation framework should just make this all work, much as it presently does for numerous other things.
It’s also complicated by the fact that, unlike on DOS or Windows, UNIX-like systems use a unified filesystem - that is, other filesystems are joined on at mount points. Thus you could have a name like
/Volumes/Foo/Bar/Baz/Blam
where (say) both Foo and Baz are mount points, and the rules about filenames could differ markedly, at least in principle; that is, /Volumes/Foo would have to conform to HFS+ (or APFS) rules, Bar/Baz to whatever rules govern the filesystem mounted at Foo, and Blam to whatever rules govern the filesystem mounted at Baz. And remember, not every filesystem will be using a well known encoding - macOS already has code to add and remove percent escapes (I kid you not) for this very reason.
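The mount-point example can be sketched as follows. This is an illustration only, in Python: the mount table, the filesystem labels and the function name `governing_filesystems` are all hypothetical, and a real implementation would ask the kernel rather than match path strings. The idea it shows is that a name is stored in its parent directory, so the filesystem that holds the parent is the one whose naming rules apply to that component.

```python
def governing_filesystems(path, mounts):
    """Pair each path component with the filesystem whose rules govern it.

    `mounts` maps mount-point paths to filesystem labels and must
    contain "/". Both the table and the labels are illustrative.
    """
    result = []
    parent = "/"
    for part in (p for p in path.split("/") if p):
        # The longest mount point that contains the parent directory
        # identifies the filesystem that stores this component's name.
        owner = max(
            (m for m in mounts
             if parent == m or parent.startswith(m.rstrip("/") + "/")),
            key=len,
        )
        result.append((part, mounts[owner]))
        parent = parent.rstrip("/") + "/" + part
    return result


mounts = {
    "/": "root (HFS+ or APFS)",
    "/Volumes/Foo": "filesystem mounted at Foo",
    "/Volumes/Foo/Bar/Baz": "filesystem mounted at Baz",
}

for name, fs in governing_filesystems("/Volumes/Foo/Bar/Baz/Blam", mounts):
    print(f"{name}: {fs}")
```

With this table, Volumes and Foo fall under the root filesystem, Bar and Baz under the filesystem mounted at Foo, and Blam under the one mounted at Baz, so a single path can cross three sets of naming rules.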
I’d like to hear what Dominic has to say (at least what he *can* say) about this, since he’s likely in a position to shed some light on it - or at least to take on board that we’re worrying about it. At the very least it’d be nice to see some more detail about APFS published somewhere *soon*...
Kind regards,
Alastair.
--
http://alastairs-place.net