HPS+ non-decomposed character ranges
- Subject: HPS+ non-decomposed character ranges
- From: Hiroaki Nakamura <email@hidden>
- Date: Sat, 31 Mar 2012 08:44:53 +0900
Hi,
I found several documents about HFS+ character decomposition.
However, the ranges of characters that are not decomposed
differ among them :-(
Which one is correct?
Internationalization Programming Topics / File Encodings and Fonts /
File Systems and Unicode Support
https://developer.apple.com/library/mac/#documentation/MacOSX/Conceptual/BPInternational/Articles/FileEncodings.html
> (Characters in the ranges U2000-U2FFF, UF900-UFA6A, and U2F800-U2FA1D are not decomposed.)
Technical Note TN1150 HFS Plus Volume Format
http://developer.apple.com/legacy/mac/library/#technotes/tn/tn1150.html
> HFS Plus stores strings fully decomposed and in canonical order. HFS Plus compares strings in a case-insensitive fashion. Strings may contain Unicode characters that must be ignored by this comparison. For more details on these subtleties, see Unicode Subtleties.
>
> A variant of HFS Plus, called HFSX, allows volumes whose names are compared in a case-sensitive fashion. The names are fully decomposed and in canonical order, but no Unicode characters are ignored during the comparison.
Technical Q&A QA1173 Text Encodings in VFS
https://developer.apple.com/library/mac/#qa/qa1173/_index.html
> Important The terms used in this Q&A, precomposed and decomposed, roughly correspond to Unicode Normal Forms C and D, respectively. However, most volume formats do not follow the exact specification for these normal forms. For example, HFS Plus (Mac OS Extended) uses a variant of Normal Form D in which U+2000 through U+2FFF, U+F900 through U+FAFF, and U+2F800 through U+2FAFF are not decomposed (this avoids problems with round trip conversions from old Mac text encodings). It's likely that your volume format has similar oddities.
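To make the disagreement concrete, here is a rough Python sketch. It is not Apple's implementation: it uses the standard unicodedata NFD tables as a stand-in for the HFS Plus decomposition table, and it normalizes each run of non-excluded characters separately, so combining marks are not reordered across an excluded character. The names decompose_except and affected_by_disagreement are hypothetical helpers; the ranges are copied from the two quotes above.

```python
import unicodedata

# Inclusive code point ranges copied from the two quotes above.
BPINTERNATIONAL_RANGES = [(0x2000, 0x2FFF), (0xF900, 0xFA6A), (0x2F800, 0x2FA1D)]
QA1173_RANGES = [(0x2000, 0x2FFF), (0xF900, 0xFAFF), (0x2F800, 0x2FAFF)]


def in_ranges(cp, ranges):
    """Return True if code point cp falls inside any inclusive range."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def decompose_except(text, ranges):
    """Approximate the 'variant of NFD': decompose everything except
    characters in the given ranges, which are copied through unchanged.

    Assumption: Python's NFD stands in for the HFS Plus table, and each
    run between excluded characters is normalized on its own.
    """
    pieces, run = [], []
    for ch in text:
        if in_ranges(ord(ch), ranges):
            # Flush the pending run, then keep the excluded character as-is.
            pieces.append(unicodedata.normalize("NFD", "".join(run)))
            run = []
            pieces.append(ch)
        else:
            run.append(ch)
    pieces.append(unicodedata.normalize("NFD", "".join(run)))
    return "".join(pieces)


def affected_by_disagreement():
    """List code points that QA1173 excludes, the other document does not,
    and that actually change under NFD in this Python's Unicode data."""
    result = []
    for lo, hi in QA1173_RANGES:
        for cp in range(lo, hi + 1):
            if in_ranges(cp, BPINTERNATIONAL_RANGES):
                continue
            ch = chr(cp)
            if unicodedata.normalize("NFD", ch) != ch:
                result.append(cp)
    return result
```

For example, decompose_except("\ufa70", QA1173_RANGES) leaves U+FA70 untouched, while decompose_except("\ufa70", BPINTERNATIONAL_RANGES) decomposes it, so the two documents would give different stored names for such a character if both were taken literally.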
Unicode Utilities Reference
https://developer.apple.com/library/mac/#documentation/Carbon/Reference/Unicode_Utilities_Ref/Reference/reference.html
> kUCCollateTypeHFSExtended
> The kUCCollateTypeHFSExtended ordering scheme sorts maximally decomposed Unicode according to the rules used by the HFS Extended volume format for its catalog. When this order is used, other collation options are ignored; this order is always case-insensitive (for decomposed characters) and ignores the Unicode characters 200C-200F, 202A-202E, 206A-206F, FEFF.
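The ignored characters listed in this quote can be illustrated with a small equality check in Python. This is only a sketch under stated assumptions: plain NFD and str.lower() stand in for the HFS Plus decomposition and case-folding tables, it tests equality rather than sort order, and comparison_key and names_equal are hypothetical names. Only the ignored ranges come from the quote.

```python
import unicodedata

# Ignored code points as listed in the quote: 200C-200F, 202A-202E, 206A-206F, FEFF.
IGNORED_RANGES = [(0x200C, 0x200F), (0x202A, 0x202E), (0x206A, 0x206F), (0xFEFF, 0xFEFF)]


def is_ignored(ch):
    """Return True if ch is one of the characters skipped during comparison."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in IGNORED_RANGES)


def comparison_key(name):
    """Build a key for case-insensitive comparison.

    Assumption: NFD and str.lower() approximate the real HFS Plus tables.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not is_ignored(ch)).lower()


def names_equal(a, b):
    """Compare two names after decomposition, ignoring, and lowercasing."""
    return comparison_key(a) == comparison_key(b)
```

Under these assumptions, names_equal("Re\u200dadme", "README") is true, because U+200D falls in the ignored range 200C-200F and case is folded.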
Thanks in advance.
--
)Hiroaki Nakamura) email@hidden
_______________________________________________
Filesystem-dev mailing list (email@hidden)