> What coded example do you mean?
> Also, I know that Java doesn't store String objects in UTF-8.
I meant the literal string example you gave: "o\xcc\x88a\xcc\x88"
If you were to enter that as a String literal in Java code, it would
not represent the decomposed sequence small-o, umlaut, small-a, umlaut.
Your example layers two distinct encodings:
First, the characters are encoded as UTF-8 bytes. That's why I
questioned whether you realized that Java doesn't use UTF-8 for
Strings.
Second, the UTF-8 bytes are written as \x-escaped values in
what is essentially a C string literal. Java doesn't recognize \x
escapes. It recognizes \uXXXX escapes, each of which represents one
UTF-16 code unit, plus octal escapes and a handful of single-character
escapes like \t, \n, \r.
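For illustration, here is a minimal sketch of how the same decomposed
sequence could be written as a Java literal, and how encoding it as
UTF-8 gives back the bytes from the C-style literal. The class and
method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

public class DecomposedLiteral {
    // Decomposed form: 'o', U+0308 COMBINING DIAERESIS, 'a', U+0308.
    static final String DECOMPOSED = "o\u0308a\u0308";

    // Encode as UTF-8 to recover the bytes the C-style literal spelled out.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Four UTF-16 code units: two base letters, two combining marks.
        System.out.println(DECOMPOSED.length());
        // Print each UTF-8 byte in the backslash-x notation used in C.
        for (byte b : utf8Bytes(DECOMPOSED)) {
            System.out.printf("\\x%02x", b & 0xff);
        }
        System.out.println();
    }
}
```

The String holds four UTF-16 code units; the six UTF-8 bytes only
exist once you explicitly encode it.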
> I just need to find out the HFS+ decomposed String representation
> of a filename for a specified File.
Then maybe java.text.Normalizer still won't do exactly what you
need. The HFS+ decompositions are defined solely by the HFS+
specification: TN1150. That spec is mostly compatible with Unicode
NFD, but the spec itself says:
An implementation must not use the Unicode utilities implemented by
its native platform (for decomposition and comparison), unless those
algorithms are equivalent to the HFS Plus algorithms defined here,
and are guaranteed to be so forever. This is rarely the case.
Platform algorithms tend to evolve with the Unicode standard. The HFS
Plus algorithms cannot evolve because such evolution would invalidate
existing HFS Plus volumes.
So if you want 100% guaranteed HFS+ compatibility, you have to
implement a normalizer yourself, and it must cover the range of
characters that matter most to you.
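As a rough sketch of what "implement a normalizer yourself" could look
like, here is a table-driven decomposer. The three table entries are
only placeholders I picked for illustration; a real table would have
to be transcribed from TN1150, and a complete implementation would
also need to handle multi-step decompositions and the ordering of
combining marks, which this sketch does not attempt. The class and
method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class HfsDecomposeSketch {
    // Illustrative entries only; the authoritative table is in TN1150.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u00F6', "o\u0308"); // o with diaeresis
        TABLE.put('\u00E4', "a\u0308"); // a with diaeresis
        TABLE.put('\u00FC', "u\u0308"); // u with diaeresis
    }

    // Replace each character that has a table entry with its decomposition;
    // pass everything else through unchanged.
    static String decompose(String name) {
        StringBuilder out = new StringBuilder(name.length() * 2);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            String mapped = TABLE.get(c);
            if (mapped != null) {
                out.append(mapped);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String decomposed = decompose("\u00F6\u00E4.txt");
        // Dump the code units so the combining marks are visible.
        for (int i = 0; i < decomposed.length(); i++) {
            System.out.printf("U+%04X ", (int) decomposed.charAt(i));
        }
        System.out.println();
    }
}
```

Because the table is fixed in your own code, its output cannot drift
when the platform's Unicode tables are updated. You could still
compare its results against Normalizer.normalize(s,
Normalizer.Form.NFD) during testing to find the characters where the
two disagree.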
Java-dev mailing list (email@hidden)