Re: Unicode filenames with Apple File System and UIManagedDocument
- Subject: Re: Unicode filenames with Apple File System and UIManagedDocument
- From: Giacomo Tufano <email@hidden>
- Date: Tue, 21 Mar 2017 16:05:12 +0100
Apple Support said: “iOS HFS Normalized UNICODE names, APFS now treats all files as a bag of bytes on iOS. We are requesting that Applications developers call the correct Normalization routines to make sure the file name contains the correct representation.” If so, I think the solution is to decompose the name yourself, so that the “bag of bytes” stored on APFS is the same as the normalized name HFS would have stored. From the name of the API I *suppose* that [NSString decomposedStringWithCanonicalMapping] will do, but it needs to be tested: the point is that on APFS you need to apply the same (de)composition that iOS HFS applies, so that you get the same bytes back when asking for the file name.
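A minimal Swift sketch of that idea, to make the test concrete. The helper name normalizedFileName is hypothetical (it is not from the thread or from any Apple API); the only Foundation call used is decomposedStringWithCanonicalMapping, which applies Unicode canonical decomposition. It only shows that the bytes become identical after decomposing; it does not settle whether that form matches what HFS produced for every character, which still needs testing.

```swift
import Foundation

// Hypothetical helper (assumption, not from the thread): decompose a
// file name before it is appended to a URL or handed to a file API.
func normalizedFileName(_ name: String) -> String {
    // Canonical decomposition: "é" (U+00E9) becomes "e" + U+0301.
    return name.decomposedStringWithCanonicalMapping
}

let composed = "Caf\u{00E9}"     // é as a single scalar, U+00E9
let decomposed = "Cafe\u{0301}"  // e followed by COMBINING ACUTE ACCENT

// Swift's == treats canonically equivalent strings as equal, so compare
// the UTF-8 bytes to see what a byte-oriented file system would see.
let composedBytes = Array(composed.utf8)
let decomposedBytes = Array(decomposed.utf8)
let normalizedBytes = Array(normalizedFileName(composed).utf8)

print(composedBytes == decomposedBytes)    // false: different bytes on disk
print(normalizedBytes == decomposedBytes)  // true: same bytes once decomposed
```

Note that comparing the two Swift strings directly with == would report them as equal, which is exactly why the difference is easy to miss until the name reaches the file system.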
Btw: I think this is a significant difference in filename management that will bite many developers. It should be fixed at the file system level, IMHO… but who knows what the other implications are…
My 2 €cents,
Gt
> On 21 Mar 2017, at 15:18, Aandi Inston <email@hidden> wrote:
>
> Is the question, what is canonical mapping? I'm going to assume it is, so I
> can share what I found when I hit much the same issue. This is mostly from
> memory so let's hope it's right.
>
> Take the word Café. How many Unicode characters is this and what are they?
> Turns out there are two answers. The last character as seen on screen is a
> lower case e with an acute accent.
> Let's ignore C,a,f as they are the same in all answers. First answer: é is
> 'LATIN SMALL LETTER E WITH ACUTE' (U+00E9). We'll call this "composed". In
> UTF-8 that's two bytes, 0xC3 0xA9. (This is the answer you'd often get, but
> it's not the only answer, and not the one Apple filesystems like.)
>
> Second answer uses an accent character. These are designed to appear in the
> same space as another character. So combine "e" and an acute accent (like a
> floating, slanted apostrophe) and we have "é". This means you could get the
> same result from the two Unicode characters LATIN SMALL LETTER E (U+0065)
> COMBINING ACUTE ACCENT (U+0301). We'll call this "decomposed". In UTF-8
> that would be 0x65 0xCC 0x81: three bytes, two characters, combine to a
> single character. (This is the one Apple filesystems like).
>
> When you're typing in a word processor, or showing an alert, it hardly
> matters how you create the e acute. Both look the same. But searching may
> be a problem (not discussed) as may showing items in alphabetical order
> (also not discussed).
>
> Let's imagine now we have a filename Café. This could be represented in
> UTF-8 bytes as 0x43 0x61 0x66 0xC3 0xA9 (composed), or as 0x43 0x61 0x66
> 0x65 0xCC 0x81 (decomposed). But ultimately there needs to be a set of bits
> on disk, in a directory, saying the name of the file. When searching for a
> file we could have three choices (a) these two composed/decomposed are
> separate file names for two distinct files - whose name will look the same
> (b) these are the same file, which means all file access by name, and
> searching has to compose or decompose for comparison purposes (c) only one
> is allowed and the other is rejected or invalid.
>
> Where are we? A bit of (b) and a bit of (c). Finder and file dialogs always
> decompose what is typed, and this is stored as the string of bits giving
> the file name. It seems that some APIs will automatically decompose their
> input, and others won't, and we may be in transition [to judge from the bug
> response]. So for safety, use a method that decomposes. (Unicode defines at
> least two other types of de/composition, not discussed).
>
> Apple calls decomposed "canonical". This is fine, except that Unicode
> refers to both "canonical decomposition" (what Apple filenames need) and
> "canonical composition" (the opposite). So if handling names via an Apple
> API made for filenames we are fine to talk of canonical file names. But if
> handling names with a general Unicode API, we need to understand that this
> means "canonical decomposition" rather than "canonical composition".
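
To make the two "canonical" directions concrete, here is a small Swift sketch using the two NSString normalization properties that Foundation exposes on String. The variable names are only illustrative; nothing here is specific to file APIs.

```swift
import Foundation

let typed = "Caf\u{00E9}"  // composed input: é as U+00E9

// Canonical decomposition: é becomes e + U+0301.
let nfd = typed.decomposedStringWithCanonicalMapping
// Canonical composition: e + U+0301 is recombined into U+00E9.
let nfc = nfd.precomposedStringWithCanonicalMapping

// Print the scalar values in hex to see which form each string holds.
print(nfd.unicodeScalars.map { String($0.value, radix: 16) })  // ["43", "61", "66", "65", "301"]
print(nfc.unicodeScalars.map { String($0.value, radix: 16) })  // ["43", "61", "66", "e9"]

// The decomposed form is one UTF-8 byte longer than the composed form.
print(Array(nfd.utf8).count, Array(nfc.utf8).count)            // 6 5
```

So with a general-purpose string API, the "decomposed" property is the one that gives the form described above as what Apple filenames need.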
>
> On 21 March 2017 at 11:03, <email@hidden> wrote:
>
>>
>>> What Apple suggested is to Unicode-normalize the filename before adding
>> it to the URL. Did you try doing that?
>>
>> I’m trying to find out what that means.
> _______________________________________________
>
> Cocoa-dev mailing list (email@hidden)
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
>
> This email sent to email@hidden