Re: Unicode filenames with Apple File System and UIManagedDocument
- Subject: Re: Unicode filenames with Apple File System and UIManagedDocument
- From: Quincey Morris <email@hidden>
- Date: Tue, 21 Mar 2017 13:49:38 -0700
On Mar 20, 2017, at 14:23 , email@hidden wrote:
>
> "iOS HFS Normalized UNICODE names , APFS now treats all file[ name]s as a bag of bytes on iOS . We are requesting that Applications developers call the correct Normalization routines to make sure the file name contains the correct representation."
I’ve been letting this simmer for a couple of days now, and I’ve come to the conclusion that it’s — sincere apologies to the unnamed Apple engineer who wrote it — as dumb as dirt.
— It’s not a “bag of bytes”, because bags of stuff are generally understood as unordered collections, and I doubt that’s what’s intended. It has to be a sequence of bytes.
— It’s not a sequence of bytes, because *everything* is a sequence of bytes, except perhaps things that are just a sequence of bits. It’s a sequence of bytes that represents a human-readable name. We have a word for that already: string.
— It’s not just a string, it has to be a string in a known encoding. Otherwise, how could you ever mount an external drive on a different computer? The encoding has to be pre-specified for APFS, or it has to be stored in metadata on each volume.
— It’s not just going to be a string of known encoding, it’s going to be Unicode. That’s going to be true even if the fact is specified in volume metadata and it’s theoretically possible to create APFS volumes with non-Unicode file names. Anything other than Unicode would, at this point, be a crime against humanity.
— It’s not just going to be Unicode, it’s going to be UTF-8 or UTF-16 or UTF-32. Again, it might be one of these code-unit sizes by definition, or any of them according to volume or file metadata. If the code-unit size isn’t determinable in one of these ways, the names cannot be interpreted.
— Ditto endianness (for UTF-16 or UTF-32).
— What we’re left with is that, apparently, APFS is a normalization-sensitive file system (by analogy with the case-sensitive file system that iOS already has): it’s capable of giving different files names that differ only in capitalization or normalization. Except, no. There’s no “correct” or “incorrect” for iOS file name capitalization (you can create and use files following your own private rules for capitalization), but according to the above quote it is “correct” to normalize APFS file names, and presumably incorrect to leave them unnormalized.
— So, if unnormalized names are not “correct”, then it’s not a normalization-sensitive file system either.
— What are we left with? Well, the same file naming system as iOS HFS+, with the actual normalization left out. Great. No developer is ever going to forget to handle that, right?
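To make the normalization point concrete, here is a minimal sketch in Python (chosen only for illustration; nothing in it is specific to APFS or Cocoa). It assumes file names are stored as UTF-8 and uses "café" as a sample name. The two spellings display identically but are different byte sequences, so a file system that compares names byte for byte would treat them as two different files.

```python
import unicodedata

# The same visible name, spelled two ways (sample strings, not from the thread).
composed = "caf\u00e9"      # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# As strings they compare unequal, and their UTF-8 bytes differ.
print(composed == decomposed)        # False
print(composed.encode("utf-8"))      # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))    # b'cafe\xcc\x81'

# Normalizing both to one form (NFC or NFD) makes them identical again.
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
```

Which normalization form a given API or volume expects is not established in this thread, so the sketch shows both directions rather than picking one.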
Is that the bottom line? Actually, no. Here’s the bottom line:
— If Apple did this, *every* existing app would instantly break. All of them. (The only exceptions perhaps being apps that don’t ever construct a path string or URL.) That’s apparently what happened to Dave. His perfectly correct app *broke*.
Is *that* the bottom line? I doubt it. I don’t believe the above quoted statement can be correct. I could believe that normalization is being moved out of the file system code, but it would have to be moved to (e.g.) the Cocoa frameworks, still “downstream” of the file-handling APIs. It can’t go upstream of the public APIs without breaking an API contract that has existed for the 16+ years since OS X 10.0.
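The "downstream" arrangement described above can be modeled with a small Python sketch. Everything in it is hypothetical: `RawNameVolume` stands in for a volume that compares names byte for byte, and `to_stored_name` stands in for a framework layer that normalizes names before they reach the volume. The choice of NFC and UTF-8 is an arbitrary assumption for the sketch, not a statement about what Apple's frameworks actually do.

```python
import unicodedata


class RawNameVolume:
    """Hypothetical volume that compares file names byte for byte."""

    def __init__(self):
        self._files = {}

    def write(self, name_bytes, data):
        self._files[name_bytes] = data

    def read(self, name_bytes):
        return self._files[name_bytes]

    def count(self):
        return len(self._files)


def to_stored_name(name):
    # Hypothetical framework-level step: normalize, then encode.
    # NFC and UTF-8 are assumptions made for this sketch only.
    return unicodedata.normalize("NFC", name).encode("utf-8")


# Without the normalizing layer, the two spellings become two files.
raw = RawNameVolume()
raw.write("cafe\u0301.txt".encode("utf-8"), b"one")
raw.write("caf\u00e9.txt".encode("utf-8"), b"two")
raw_count = raw.count()

# With the layer in place, callers can use either spelling interchangeably.
volume = RawNameVolume()
volume.write(to_stored_name("cafe\u0301.txt"), b"hello")
found = volume.read(to_stored_name("caf\u00e9.txt"))
```

In this model an app that passes an unnormalized name still finds its file, because the normalization happens below the point where the app hands over the name. That is the contract the paragraph above argues cannot be dropped without breaking existing apps.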
Is there anything wrong with my reasoning here?