Re: Unicode filenames with Apple File System and UIManagedDocument
- Subject: Re: Unicode filenames with Apple File System and UIManagedDocument
- From: David Duncan <email@hidden>
- Date: Wed, 22 Mar 2017 15:55:39 -0700
> On Mar 22, 2017, at 2:25 PM, email@hidden wrote:
>
>>
>> On Mar 22, 2017, at 2:00 PM, David Duncan <email@hidden> wrote:
>>
>>>
>>> On Mar 22, 2017, at 4:15 AM, email@hidden wrote:
>>>
>>>>
>>>> On Mar 22, 2017, at 5:05 AM, Alastair Houghton <email@hidden> wrote:
>>>>
>>>> On 21 Mar 2017, at 20:49, Quincey Morris <email@hidden> wrote:
>>>>>
>>>>> On Mar 20, 2017, at 14:23 , email@hidden wrote:
>>>>>>
>>>>>> "iOS HFS Normalized UNICODE names , APFS now treats all file[ name]s as a bag of bytes on iOS . We are requesting that Applications developers call the correct Normalization routines to make sure the file name contains the correct representation."
>>>>>
>>>>> I’ve been letting this simmer for a couple of days now, and I’ve come to the conclusion that it’s — sincere apologies to the unnamed Apple engineer who wrote it — as dumb as dirt.
>>>>>
>>>>> — It’s not a "bag of bytes”, because bags of stuff are generally understood as unordered sets, and I doubt that’s what’s intended. It has to be a sequence of bytes.
>>>>
>>>> In the context of filesystems (and specifically filenames), the phrases “bag of bytes” and “bunch of bytes” have a fairly specific meaning. The point is that the filesystem doesn’t inspect the bytes it’s given, and doesn’t care what they represent (about the only exception is that it probably doesn’t support embedded NULs). It isn’t suggesting that the names are treated as an unordered set of bytes (that’d just be silly). It’s just expressing the fact that the filesystem doesn’t care what they are - it may compare them, and if it does so, it will use binary ordering (not some other collation sequence) and won’t worry about things like case or encoding at all.
>>>>
>>>>> — It’s not just a string, it has to be a string in a known encoding. Otherwise, how could you ever mount an external drive on a different computer? The encoding has to be pre-specified for APFS, or it has to be stored in metadata on each volume.
>>>>
>>>> Agreed, that’s where the “bunch of bytes” approach falls down.
>>>>
>>>>> — It’s not just going to be a string of known encoding, it’s going to be Unicode. That’s going to be true even if the fact is specified in volume metadata and it’s theoretically possible to create APFS volumes with non-Unicode file names. Anything other than Unicode would, at this point, be a crime against humanity.
>>>>
>>>> If I’d designed APFS, it probably would use Unicode names (and it’d store the version of Unicode it used in the filesystem header, to avoid having to hard-code it).
>>>>
>>>> But I didn’t design it - Dominic Giampaolo and his team did - and we still don’t have that much information about how APFS works. I’m sure they had their reasons for whatever decision they’ve made here.
>>>>
>>>>> Is *that* the bottom line? I doubt it. I don’t believe the above quoted statement can be correct. I could believe that normalization is being moved out of the file system code, but it would have to be moved to (e.g.) the Cocoa frameworks, still “downstream” of the file-handling APIs. It can’t go upstream of the public APIs without breaking an API contract that has existed for the 16+ years since OS X 10.0.
>>>>
>>>> This is a tricky area. The problem with what we have at the moment (-fileSystemRepresentation) is that it *assumes* HFS+ semantics. That isn’t always going to be correct for existing non-HFS+ filesystems, let alone in the future. Of course, if you’re using the NSURL or NSString methods, rather than calling the BSD or C library APIs yourself, this is all hidden from you anyway (you certainly shouldn’t, IMO, be required to do anything unusual at Cocoa level - the Foundation framework should just make this all work, rather in the same way it presently does for numerous other things).
>>>>
>>>> It’s also complicated by the fact that, unlike on DOS or Windows, UNIX-like systems use a unified filesystem - that is, other filesystems are joined on at mount points. Thus you could have a name like
>>>>
>>>> /Volumes/Foo/Bar/Baz/Blam
>>>>
>>>> where (say) both Foo and Baz are mount points, and the rules about filenames could differ markedly, at least in principle; that is, /Volumes/Foo would have to conform to HFS+ (or APFS) rules, Bar/Baz to whatever rules govern the filesystem mounted at Foo, and Blam to whatever rules govern the filesystem mounted at Baz. And remember, not every filesystem will be using a well known encoding - macOS already has code to add and remove percent escapes (I kid you not) for this very reason.
>>>>
>>>> I’d like to hear what Dominic has to say (at least what he *can* say) about this, since he’s likely in a position to shed some light on it - or at least to take on board that we’re worrying about it. At the very least it’d be nice to see some more detail about APFS published somewhere *soon*...
>>>>
>>>> Kind regards,
>>>>
>>>> Alastair.
>>>>
>>>> --
>>>> http://alastairs-place.net
>>>
>>>
>>> I think NSURL should take care of this so developers don’t need to worry about it, but that doesn’t appear to be the case. At this point I just want to know what the correct thing to do is. Maybe NSURL does handle it (which would mean there was a bug in the APFS conversion), but I can’t tell for certain.
>>>
>>> I’ve uploaded different versions to TestFlight for the person to try, but at this point the original version of my app and each of these different versions all allow the user to open files created on iOS 10.3 with Arabic names, while none of them seem to allow the user to open files that were created on 10.2 unless the files are renamed to English. So either NSURL takes care of it and there was a bug in the APFS conversion, or we do need to do something additional when sending NSStrings to the NSURL methods. I realize this isn’t an official support channel, but it would be really nice to hear from Apple. I’ll probably use one of my DTS incidents to ask when I have time to submit the request sometime this week. I’ll certainly report back if I get a definitive answer.
>>
>> So there was another explanation posted on the bug that I’m not certain you got, but which I think may explain what you’re seeing.
>>
>> Basically the concept is that since APFS doesn’t normalize file names, if you store file names in some other storage (say in your preferences) then what could happen is this:
>>
>> 10.2: File is saved with a file name handed to the file system in NFC form. File system converts the file name to NFD. You store it as NFC.
>> 10.3: File system is converted to APFS, and the file name is NFD. You try to look up the file as NFC, and it fails.
>>
>> This would also mean that newly created files on APFS are always accessible, even via the “same” name, because the file system stored the filename as you presented it. It’s only files whose name you stored in a different form from the system that are inaccessible via those old names.
>
>
> Hi David (Duncan),
>
> That was not included in what I can see in the bug report rdar://30993389. The entire response I can see is:
>
> ---
> "Engineering has the following feedback for you:
>
> iOS HFS Normalized UNICODE names , APFS now treats all files as a bag of bytes on iOS . We are requesting that Applications developers call the correct Normalization routines to make sure the file name contains the correct representation.
>
> We are now closing this bug report.
>
> If you have questions or comments about the resolution, please update your bug report with that information so we can respond."
> ----
>
>
> So back to the question of what can I do right now to solve this problem (so that others upgrading from iOS 10.2 to 10.3 don't have this same issue)? I'm guessing 10.3 will be released very soon so I need to get the update submitted ASAP.
>
> Yes, I have a plist file with the names as entered by the user.
>
> Do I use the following?
>
> NSURL *url = [[self courseDirectory] URLByAppendingPathComponent:name.decomposedStringWithCanonicalMapping];
>
> Where [self courseDirectory] is a URL in English (assuming the sandboxed Documents directory is in English for other locales) and name is the NSString the user enters.
>
> If I understand what you're saying that should work for both 10.2 (which is automatically doing that conversion at the filesystem level) and for 10.3 (for both files created on 10.2 and for files created on 10.3). I'm happy to submit a DTS incident if necessary, but at this point I'm concerned 10.3 might be released before I get a response from that. I've only submitted one in the last 8 years and it took a couple weeks to get a response.
Sadly, I’m not a file systems expert so I don’t know exactly what you should do, but I think if I were in your shoes I would probably write a “recovery” where if you don’t find a file name that was previously saved to your plist, you just iterate over the file system and present the filenames you get from there instead of using your plist. Not sure how reasonable this is for you, and DTS might have some other ideas for you.
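
A minimal sketch of that recovery idea, under stated assumptions: the function name FindFileIgnoringNormalization and its parameters are hypothetical and do not come from the thread or from any Apple API. It first tries the saved name as-is, and only if that fails does it list the directory and compare names after putting both sides into the same decomposed form. Comparing in a common form avoids having to know which form actually ended up on disk. This is not an Apple-endorsed fix, just one way the fallback could look.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): return the on-disk URL for a
// name saved elsewhere (e.g. in a plist), even if the saved name and the
// on-disk name use different Unicode normalization forms.
static NSURL *FindFileIgnoringNormalization(NSURL *directory, NSString *savedName) {
    NSFileManager *fm = [NSFileManager defaultManager];

    // Fast path: the saved name may match the on-disk name exactly.
    NSURL *direct = [directory URLByAppendingPathComponent:savedName];
    if ([fm fileExistsAtPath:direct.path]) {
        return direct;
    }

    // Recovery path: ask the file system what is actually there, then
    // compare names after decomposing both sides to the same form.
    NSString *wanted = savedName.decomposedStringWithCanonicalMapping;
    NSArray<NSURL *> *contents = [fm contentsOfDirectoryAtURL:directory
                                   includingPropertiesForKeys:nil
                                                      options:0
                                                        error:NULL];
    for (NSURL *candidate in contents) {
        NSString *onDisk =
            candidate.lastPathComponent.decomposedStringWithCanonicalMapping;
        // isEqualToString: is a literal comparison, which is what we want
        // once both strings are in the same normalization form.
        if ([onDisk isEqualToString:wanted]) {
            return candidate; // use the name exactly as the file system reports it
        }
    }
    return nil; // no file with an equivalent name in this directory
}
```

If a match is found this way, writing the returned URL's lastPathComponent back to the plist would mean later lookups hit the fast path.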
> Thanks,
> Dave Reed
--
David Duncan
_______________________________________________
Cocoa-dev mailing list (email@hidden)