Re: Normalisation of filenames

  • Subject: Re: Normalisation of filenames
  • From: "Gerriet M. Denkmann" <email@hidden>
  • Date: Sun, 02 Apr 2017 15:50:19 +0700

> On 2 Apr 2017, at 10:59, Aki Inoue <email@hidden> wrote:
>
>
>> On Apr 1, 2017, at 4:57 PM, Gerriet M. Denkmann <email@hidden> wrote:
>>
>>
>>> On 2 Apr 2017, at 06:33, Jens Alfke <email@hidden> wrote:
>>>
>>>
>>>> On Apr 1, 2017, at 11:58 AM, Gerriet M. Denkmann <email@hidden> wrote:
>>>>
>>>> I think the examples above show that NSURL does indeed normalise Unicode strings in some way.
>>>
>>> That makes sense; I’d expect that one of the RFCs covering URLs describes normalization. Otherwise constructing URLs (for a REST API, say) could become quite ambiguous because you wouldn’t know which way to encode various Unicode characters.
>>>
>>>> But my point is that NSURL gets the normalisation wrong in this case, or at least that it is not very consistent in how it normalises strings.
>>>
>>> Yes, it does seem wrong that you can have two filenames that are treated as distinct by the filesystem, but whose URL.path properties produce identical NSStrings.
>>
>> Sorry, my explanation was not quite clear: these two filenames look absolutely identical, but as sequences of Unicode code points they are not (the tone mark and the vowel are in a different order).
>>
>> What puzzles me is that consonant + THAI CHARACTER MAI EK + THAI CHARACTER SARA UU is normalised by NSURL to consonant + THAI CHARACTER SARA UU + THAI CHARACTER MAI EK (note the different order), whereas consonant + THAI CHARACTER MAI EK + THAI CHARACTER SARA II is left unchanged.
> Gerriet,
>
> This is the standard Unicode normalization behavior. Every Unicode character is assigned a Canonical_Combining_Class property, an integer value that defines the canonical ordering of combining marks.
>
> The Canonical_Combining_Class of THAI CHARACTER SARA UU is 103, and that of THAI CHARACTER MAI EK is 107. So MAI EK always comes after SARA UU in the canonical order.
>
> On the other hand, THAI CHARACTER SARA II has the value 0, which marks it as a starter, i.e. the start of a new reordering segment. That’s why the character is not reordered with respect to other Thai combining characters.
>
> Aki
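To see Aki's explanation in action, here is a minimal sketch in Python, using the standard `unicodedata` module as a stand-in for NSURL. Which normalisation form NSURL applies internally is an assumption here; canonical reordering happens in both NFC and NFD, so either would reorder these Thai marks the same way. KO KAI is an arbitrary consonant picked for illustration. The sketch checks whether the tone-first and vowel-first spellings become identical after normalisation.

```python
import unicodedata

# Code points from the thread; KO KAI is an arbitrary consonant chosen for illustration.
KO_KAI = "\u0E01"   # THAI CHARACTER KO KAI
MAI_EK = "\u0E48"   # THAI CHARACTER MAI EK, tone mark
SARA_UU = "\u0E39"  # THAI CHARACTER SARA UU, bottom vowel
SARA_II = "\u0E35"  # THAI CHARACTER SARA II, top vowel


def describe(s: str) -> str:
    """List each code point by name with its Canonical_Combining_Class in brackets."""
    return " + ".join(f"{unicodedata.name(c)} ({unicodedata.combining(c)})" for c in s)


def same_after_normalisation(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent, i.e. identical once in NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


for vowel in (SARA_UU, SARA_II):
    tone_first = KO_KAI + MAI_EK + vowel
    vowel_first = KO_KAI + vowel + MAI_EK
    print("tone first:", describe(tone_first))
    print("  as NFD  :", describe(unicodedata.normalize("NFD", tone_first)))
    print("  same as vowel-first order after normalisation:",
          same_after_normalisation(tone_first, vowel_first))
```

With SARA UU the last check prints True, because MAI EK (class 107) is moved after SARA UU (class 103). With SARA II it prints False: SARA II has class 0 and is never moved, so the two visually identical spellings stay distinct, just as with the filenames.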

Thanks a lot for this explanation.

I have just read about Canonical_Combining_Class in <http://unicode.org/reports/tr44/#Validation_of_CCC>.

What I did not find was an explanation of why all Thai top vowels (and THAI CHARACTER MAI HAN-AKAT) have Canonical_Combining_Class 0 (Not_Reordered), whereas the bottom vowels have 103.
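The classes themselves can be listed straight from the Unicode Character Database. This rough Python sketch prints every nonspacing mark (General_Category Mn) in the Thai block U+0E00..U+0E7F with its Canonical_Combining_Class, as recorded in the database bundled with the running Python. It shows the pattern, but not the reasoning behind it.

```python
import unicodedata


def thai_marks() -> list:
    """Return (code point, name, ccc) for every nonspacing mark in the Thai block."""
    rows = []
    for cp in range(0x0E00, 0x0E80):
        ch = chr(cp)
        # Unassigned code points have category "Cn", so they are skipped here.
        if unicodedata.category(ch) == "Mn":
            rows.append((f"U+{cp:04X}", unicodedata.name(ch), unicodedata.combining(ch)))
    return rows


for code, name, ccc in thai_marks():
    print(f"{code}  {ccc:>3}  {name}")
```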

Another strange thing: the tone marks have 107, but THAI CHARACTER THANTHAKHAT has 0. (THANTHAKHAT sometimes occurs together with ิ SARA I, e.g. เกียรติ์, or with ุ SARA U, e.g. บงสุ์.)
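A small Python sketch of what the THANTHAKHAT value means in practice, again assuming NFD via `unicodedata` as a stand-in for NSURL's normalisation: a tone mark and SARA U collapse to one canonical order, but THANTHAKHAT and SARA U do not. So บงสุ์ typed with the two marks in either order stays as two distinct strings, even though they would typically render the same. The constant names are illustrative only.

```python
import unicodedata

SO_SUEA = "\u0E2A"      # THAI CHARACTER SO SUEA
SARA_U = "\u0E38"       # THAI CHARACTER SARA U, bottom vowel, ccc 103
MAI_EK = "\u0E48"       # THAI CHARACTER MAI EK, tone mark, ccc 107
THANTHAKHAT = "\u0E4C"  # THAI CHARACTER THANTHAKHAT, ccc 0

# บงสุ์ built as BO BAIMAI + NGO NGU + SO SUEA + SARA U + THANTHAKHAT
BONGSU = "\u0E1A\u0E07" + SO_SUEA + SARA_U + THANTHAKHAT
# The same word with the last two marks typed the other way round.
BONGSU_SWAPPED = "\u0E1A\u0E07" + SO_SUEA + THANTHAKHAT + SARA_U


def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b are identical after NFD normalisation."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Tone mark (107) over SARA U (103): both orders normalise to SARA U + MAI EK.
print(canonically_equivalent(SO_SUEA + MAI_EK + SARA_U, SO_SUEA + SARA_U + MAI_EK))
# THANTHAKHAT (0) is a starter, so nothing is reordered across it.
print(canonically_equivalent(BONGSU, BONGSU_SWAPPED))
```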

If you have any links to an explanation of these (to me) rather strange decisions by the Unicode people, I would appreciate it very much.


Kind regards,

Gerriet.


_______________________________________________

Cocoa-dev mailing list (email@hidden)



References: 
 >Normalisation of filenames (From: "Gerriet M. Denkmann" <email@hidden>)
 >Re: Normalisation of filenames (From: Quincey Morris <email@hidden>)
 >Re: Normalisation of filenames (From: "Gerriet M. Denkmann" <email@hidden>)
 >Re: Normalisation of filenames (From: Jens Alfke <email@hidden>)
 >Re: Normalisation of filenames (From: "Gerriet M. Denkmann" <email@hidden>)
 >Re: Normalisation of filenames (From: Aki Inoue <email@hidden>)
