Thomas Singer wrote:
> Is this a bug in Mac OS X's file system or a bug in Mac OS X's Java
> implementation? Does anyone know how to work around this problem
> (except not using file names with umlauts)? Thanks in advance.
It's not a bug. It's a Unicode feature: decomposed marks.
As previously mentioned, the HFS+ file-system stores (and returns)
accented characters in their decomposed form. It does this even when your
input string has characters in their composed form. This is a property of
the file-system. I suspect the behavior would differ on a FAT-32 or UFS
file-system, but I have not actually tried this. In any case, Java itself
is silent on the normalization form of Unicode filenames, so any mix of
compositions, decompositions, and partial compositions is possible.
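To make the composed/decomposed distinction concrete, here is a minimal sketch (the class and method names are made up for illustration). It only shows that the two spellings of a u-umlaut are different Java strings, which is why a name you wrote in composed form may not compare equal to the name the file-system hands back:

```java
class DecomposedDemo {
    // Precomposed form: U+00FC LATIN SMALL LETTER U WITH DIAERESIS
    static final String COMPOSED = "\u00FC";
    // Decomposed form: 'u' followed by U+0308 COMBINING DIAERESIS
    static final String DECOMPOSED = "u\u0308";

    // Render each UTF-16 code unit of s as U+XXXX, separated by spaces.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(COMPOSED));                   // U+00FC
        System.out.println(hex(DECOMPOSED));                 // U+0075 U+0308
        System.out.println(COMPOSED.equals(DECOMPOSED));     // false
    }
}
```

Both strings display as the same glyph, but String.equals() compares code units, so they differ.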
A simple way to recompose some accents is the AccentComposer class, part of
my open source MacBinary Toolkit for Java:
<http://www.amug.org/~glguerin/sw/#macbinary>
Extract just the class you need from the downloadable zip. It covers the
set of chars expressed in the original MacRoman charset, and may work for
you, depending on what you're doing. However, it is a very simple class,
using very simple algorithms, and it neither scales up well nor covers all
of Unicode.
If you need something bigger, try ICU4J (see URL below).
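As a rough sketch of what full recomposition looks like, the following assumes a Java 6 or later runtime, where java.text.Normalizer is part of the standard library; that is an assumption on my part, and on older JVMs you would need ICU4J for the equivalent. The class and method names are hypothetical. It normalizes both names to NFC (the composed form) before comparing them:

```java
import java.text.Normalizer;

class FileNameNormalizer {
    // Recompose a name to Unicode Normalization Form C (canonical composition).
    // Assumes java.text.Normalizer is available (Java 6+).
    static String toNFC(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    // Compare two file names by their NFC forms, so that composed and
    // decomposed spellings of the same name are treated as equal.
    static boolean sameName(String a, String b) {
        return toNFC(a).equals(toNFC(b));
    }

    public static void main(String[] args) {
        String fromDisk = "u\u0308ber.txt"; // decomposed, as HFS+ would return it
        String typed = "\u00FCber.txt";     // composed, as typically typed

        System.out.println(fromDisk.equals(typed));      // false
        System.out.println(sameName(fromDisk, typed));   // true
        System.out.println(toNFC(fromDisk).length());    // 8
    }
}
```

Normalizing at the comparison point, rather than trying to control what the file-system stores, keeps the code independent of which file-system you happen to be on.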
Here's my list of Frequently Pasted URLs regarding Unicode encodings. Some
may be dead URLs; I haven't confirmed their liveness in a while.
Canonical Equivalence in Applications:
<http://www.unicode.org/notes/tn5/>
UAX #15: Unicode Normalization:
<http://www.unicode.org/reports/tr15/>
International Components for Unicode (see: ICU4J) -- 1.4+
<http://icu.sourceforge.net/>
ICU's Normalization:
<http://icu.sourceforge.net/userguide/normalization.html>
QA1235: Converting to Precomposed Unicode -- native calls, not Java
<http://developer.apple.com/qa/qa2001/qa1235.html>
TN1150 HFS+/HFSX format -- find "Canonical Decomposition" on that page:
<http://developer.apple.com/technotes/tn/tn1150.html>
<http://developer.apple.com/technotes/tn/tn1150table.html> -- table
-- GG