Re: Unicode filenames with Apple File System and UIManagedDocument
- Subject: Re: Unicode filenames with Apple File System and UIManagedDocument
- From: David Duncan <email@hidden>
- Date: Thu, 23 Mar 2017 09:24:23 -0700
I just want to remind everyone I’m *not* a file systems engineer – I’m just trying to help Dave (and anyone else caught in this) make sure their app can find their files.
> On Mar 23, 2017, at 1:53 AM, Alastair Houghton <email@hidden> wrote:
>
> On 22 Mar 2017, at 18:00, David Duncan <email@hidden> wrote:
>>
>> So there was another explanation posted on the bug that I’m not certain you got, but which I think may explain.
>>
>> Basically the concept is that since APFS doesn’t normalize file names, if you store file names in some other storage (say in your preferences) then what could happen is this:
>>
>> iOS 10.2: File is saved with a file name handed to the file system in NFC form. The file system converts the file name to NFD. You store it as NFC.
>> iOS 10.3: The file system is converted to APFS, and the file name on disk is still NFD. You try to look up the file as NFC, and it fails.
>
> This is going to cause problems, though, when things migrate from HFS+ to APFS, because the HFS normalisation *isn’t* a standard one. In particular, it certainly *isn’t* NFD for the current version of Unicode.
Yes, that is the crux of Dave’s issue – the HFS+ => APFS conversion only transcoded the file names (from UTF-16 to UTF-8); it did not re-normalize them.
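
A minimal sketch of that mismatch, in Python for brevity. It assumes standard Unicode NFD as a stand-in for the HFS+ decomposition (which is not identical to it), and the file name is made up for illustration:

```python
import unicodedata

# Hypothetical name the app saved in its own storage (e.g. a plist), in NFC.
stored = unicodedata.normalize("NFC", "caf\u00e9.txt")

# Roughly what HFS+ kept on disk. Assumption: standard NFD approximates
# the HFS+ decomposition for this particular name.
on_disk = unicodedata.normalize("NFD", stored)

def same_bytes(a, b):
    """Byte-for-byte comparison, as a non-normalizing file system would do."""
    return a.encode("utf-8") == b.encode("utf-8")

def canonically_equal(a, b):
    """Comparison after bringing both names to the same normalization form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_bytes(stored, on_disk))         # the lookup-by-stored-name case
print(canonically_equal(stored, on_disk))  # what a human would call "the same name"
```

The two strings render identically, but the NFD form carries a separate combining accent, so a byte-wise lookup with the stored NFC name does not find the converted entry.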
> The only obvious solution for that would be to have the HFS+ to APFS migration tool *re-normalise* the filenames (maybe it does?), but that’s bound to break things in the (presumably quite common) case where the filename stored in e.g. a plist was originally obtained from the filesystem.
Arguably there is no way for the file system converter to know how it should renormalize file names. This is akin to case-sensitive vs. case-insensitive file systems. If you ran a converter from a case-insensitive file system to a case-sensitive one, you could preserve the capitalization during the conversion, but file lookups that used the wrong case would fail after the conversion. The converter can’t know that you want to look up “foo” via “FOO” or “Foo”, so it can’t do any kind of normalization on your behalf. The difference here is that, for the most part, Unicode normalization is invisible to the developer.
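
Since the converter can’t fix this, the app has to tolerate it at lookup time. One possible approach, sketched in Python under stated assumptions: the function name `find_on_disk_name` is hypothetical, the directory listing is passed in as a plain list of names, and both sides are compared under NFC on the assumption that the names on disk are canonically equivalent to the stored ones even though the HFS+ form is not exactly NFD.

```python
import unicodedata

def find_on_disk_name(stored_name, directory_entries):
    """Return the directory entry that matches stored_name, or None.

    Hypothetical helper: tries an exact match first, then falls back to
    comparing both names under the same normalization form (NFC).
    """
    entries = list(directory_entries)

    # Fast path: the stored name matches an entry exactly.
    if stored_name in entries:
        return stored_name

    # Fallback: compare normalized forms, but return the name as it is
    # actually spelled on disk so later file calls use the real bytes.
    wanted = unicodedata.normalize("NFC", stored_name)
    matches = [e for e in entries
               if unicodedata.normalize("NFC", e) == wanted]

    # If several entries differ only by normalization, refuse to guess.
    return matches[0] if len(matches) == 1 else None
```

In practice the listing could come from something like `os.listdir(folder)`. Returning the on-disk spelling rather than the stored one also gives the app a chance to update its stored name, so the fallback is only needed once per file.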
>
> Kind regards,
>
> Alastair.
>
> --
> http://alastairs-place.net
>
--
David Duncan
_______________________________________________
Cocoa-dev mailing list (email@hidden)