Re: Normalisation of filenames
- Subject: Re: Normalisation of filenames
- From: Aki Inoue <email@hidden>
- Date: Sat, 01 Apr 2017 20:59:25 -0700
> On Apr 1, 2017, at 4:57 PM, Gerriet M. Denkmann <email@hidden> wrote:
>
>
>> On 2 Apr 2017, at 06:33, Jens Alfke <email@hidden> wrote:
>>
>>
>>> On Apr 1, 2017, at 11:58 AM, Gerriet M. Denkmann <email@hidden> wrote:
>>>
>>> I think that the examples above show that NSURL does indeed do something about normalising Unicode strings.
>>
>> That makes sense; I’d expect that one of the RFCs covering URLs describes normalization. Otherwise constructing URLs (for a REST API, say) could become quite ambiguous because you wouldn’t know which way to encode various Unicode characters.
>>
>>> But my point is that NSURL gets the normalisation wrong in this case; or at least that it is not very consistent in normalising strings.
>>
>> Yes, it does seem wrong that you can have two filenames that are treated as distinct by the filesystem, but whose URL.path properties produce identical NSStrings.
>
> Sorry, my explanation was not quite clear: these two filenames look absolutely identical, but as a sequence of Unicode code points, they are not (tone-mark and vowel are in different order).
>
> What puzzles me is that consonant + THAI CHARACTER MAI EK + THAI CHARACTER SARA UU gets normalised by NSURL to: consonant + THAI CHARACTER SARA UU + THAI CHARACTER MAI EK (note the different order), whereas consonant + THAI CHARACTER MAI EK + THAI CHARACTER SARA II is left unchanged.
Gerriet,
This is standard Unicode normalization behavior. Every Unicode character has a Canonical Combining Class (ccc), an integer property that determines the canonical ordering of combining marks.
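To make the reordering rule concrete, here is a minimal Python sketch of the canonical ordering step. It assumes the input is already canonically decomposed (true for the Thai characters in this thread, which have no canonical decompositions) and uses Python's `unicodedata.combining()` to look up each character's combining class. The helper name `canonical_order` is hypothetical, and this is only an illustration of the rule, not how NSURL or Foundation implement normalization.

```python
import unicodedata

def canonical_order(s: str) -> str:
    """Hypothetical helper: apply the Unicode canonical ordering step to an
    already-decomposed string. Within each run of characters whose canonical
    combining class (ccc) is nonzero, the marks are stably sorted by ccc.
    Characters with ccc == 0 ("starters") never move and end the current run."""
    result = []
    run = []  # pending combining marks with ccc > 0
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter closes the current run: emit the run sorted by ccc
            # (Python's sort is stable, so equal classes keep their order).
            result.extend(sorted(run, key=unicodedata.combining))
            run = []
            result.append(ch)
        else:
            run.append(ch)
    # Flush any trailing run of combining marks.
    result.extend(sorted(run, key=unicodedata.combining))
    return "".join(result)

# consonant + MAI EK (107) + SARA UU (103): the two marks are swapped.
print(canonical_order("\u0E01\u0E48\u0E39") == "\u0E01\u0E39\u0E48")  # True
# consonant + MAI EK (107) + SARA II (0): SARA II is a starter, nothing moves.
print(canonical_order("\u0E01\u0E48\u0E35") == "\u0E01\u0E48\u0E35")  # True
```

A stable sort is used because marks with equal combining class must keep their original relative order.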
The canonical combining class of THAI CHARACTER SARA UU (U+0E39) is 103, and that of THAI CHARACTER MAI EK (U+0E48) is 107. Because normalization sorts adjacent combining marks by ascending class, MAI EK always ends up after SARA UU in canonical order.
THAI CHARACTER SARA II (U+0E35), on the other hand, has combining class 0. A character with class 0 is a "starter": it begins a new reordering segment, and marks are never reordered across it. That's why SARA II is not reordered with respect to other Thai combining characters.
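Both cases from Gerriet's message can be checked with Python's standard `unicodedata.normalize()`. It is used here only as a stand-in for whatever normalization NSURL applies internally; that is an assumption, and the snippet does not describe NSURL itself. KO KAI (U+0E01) is an arbitrary choice of example consonant.

```python
import unicodedata

KO_KAI = "\u0E01"   # THAI CHARACTER KO KAI, an example consonant (ccc 0)
SARA_II = "\u0E35"  # THAI CHARACTER SARA II (ccc 0, a starter)
SARA_UU = "\u0E39"  # THAI CHARACTER SARA UU (ccc 103)
MAI_EK = "\u0E48"   # THAI CHARACTER MAI EK (ccc 107)

def code_points(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

for label, s in [
    ("MAI EK + SARA UU", KO_KAI + MAI_EK + SARA_UU),
    ("MAI EK + SARA II", KO_KAI + MAI_EK + SARA_II),
]:
    nfd = unicodedata.normalize("NFD", s)
    print(f"{label}: {code_points(s)} -> {code_points(nfd)}")

# Prints:
# MAI EK + SARA UU: U+0E01 U+0E48 U+0E39 -> U+0E01 U+0E39 U+0E48
# MAI EK + SARA II: U+0E01 U+0E48 U+0E35 -> U+0E01 U+0E48 U+0E35
```

With SARA UU, both orderings of the vowel and tone mark normalize to the same string, which is consistent with two distinct filenames yielding identical URL.path values. With SARA II, the two orderings are not canonically equivalent and stay distinct after normalization.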
Aki
>
>
>> (I assume you’ve been following the recent thread here about potential Unicode problems with the APFS filesystem in iOS 10.3? It sounds like things might become even more confusing.)
>
> Yes, indeed I have. That started me worrying and looking into these normalisation issues.
>
>
> Kind regards,
>
> Gerriet.
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden