Mailing Lists: Apple Mailing Lists


Re: Problem with umlauts in file names



Hi Greg,

Thank you very much. Your mappings in AccentComposer helped me a great deal.

--
Best regards,
Thomas Singer
_____________
smartcvs.com
smartsvn.com


Greg Guerin wrote:
Thomas Singer wrote:


Is this a bug in Mac OS X's file system or a bug in Mac OS X's Java
implementation? Does anyone know how to work around this problem
(other than not using file names with umlauts)? Thanks in advance.


It's not a bug.  It's a Unicode feature: decomposed marks.
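
To make the composed/decomposed distinction concrete, here is a minimal sketch. It assumes a runtime that provides java.text.Normalizer (Java 6 or later); the class name UmlautForms and the helper codePoints are made up for illustration. It shows that a u-umlaut can be one code point or two, that the two strings are not equal to String.equals, and that normalizing to NFC makes them equal.

```java
import java.text.Normalizer;

class UmlautForms {
    // Hypothetical helper: lists the code points of s as "U+XXXX" values.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String composed = "\u00FC";     // u-umlaut as one precomposed code point
        String decomposed = "u\u0308";  // plain u followed by COMBINING DIAERESIS

        System.out.println(codePoints(composed));         // U+00FC
        System.out.println(codePoints(decomposed));       // U+0075 U+0308
        System.out.println(composed.equals(decomposed));  // false

        // Recompose the decomposed form, then compare again.
        String nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(composed.equals(nfc));         // true
    }
}
```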

As previously mentioned, the HFS+ file-system stores (and returns)
accented characters in their decomposed form.  It does this even when your
input string has characters in their composed form.  This is a property of
the file-system.  I suspect the behavior differs on a FAT-32 or UFS
file-system, but I have not actually tried this.  In any case, Java itself
is silent on the normalization form of Unicode filenames, so any mix of
compositions, decompositions, and partial compositions is possible.
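
One possible workaround, sketched under the same assumption that java.text.Normalizer is available (Java 6 or later): since the names returned by a directory listing may be in any form, normalize both the listed name and the name you are looking for to NFC before comparing. The class NormalizedFileLookup, its methods, and the sample file name are hypothetical, not part of any library.

```java
import java.io.File;
import java.text.Normalizer;

class NormalizedFileLookup {
    // Bring a name into composed form (NFC) so comparisons are form-independent.
    static String nfc(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    // Returns the listed name that equals 'wanted' after NFC normalization,
    // exactly as the listing spelled it, or null if none matches.
    static String findByName(String[] names, String wanted) {
        if (names == null) return null;  // File.list() returns null for non-directories
        String target = nfc(wanted);
        for (String name : names) {
            if (nfc(name).equals(target)) return name;
        }
        return null;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        // Sample name with a precomposed u-umlaut; purely illustrative.
        String wanted = args.length > 1 ? args[1] : "\u00FCbung.txt";
        String match = findByName(dir.list(), wanted);
        System.out.println(match == null ? "no match" : "found: " + new File(dir, match));
    }
}
```

Returning the name as the listing spelled it, rather than the normalized form, means the path handed back to the file-system is one it has already reported.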

A simple way to recompose some accents is the AccentComposer class, part of
my open source MacBinary Toolkit for Java:
  <http://www.amug.org/~glguerin/sw/#macbinary>

Extract just the class you need from the downloadable zip.  It covers the
set of chars expressed in the original MacRoman charset, and may work for
you, depending on what you're doing.  However, it is a very simple class,
using very simple algorithms, and does not scale up well nor cover all of
Unicode.
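
For readers on a runtime without a built-in normalizer, the general idea of a simple table-driven recomposer can be sketched as follows. This is not the AccentComposer source and makes no claim to match its algorithm or its mappings; the class SimpleComposer and its small table are invented for illustration and cover only a handful of base-plus-mark pairs.

```java
import java.util.HashMap;
import java.util.Map;

class SimpleComposer {
    // Illustrative table only: base letter + combining mark -> precomposed char.
    private static final Map<String, Character> PAIRS = new HashMap<String, Character>();
    static {
        PAIRS.put("a\u0308", '\u00E4');  // a + diaeresis
        PAIRS.put("o\u0308", '\u00F6');  // o + diaeresis
        PAIRS.put("u\u0308", '\u00FC');  // u + diaeresis
        PAIRS.put("A\u0308", '\u00C4');  // A + diaeresis
        PAIRS.put("O\u0308", '\u00D6');  // O + diaeresis
        PAIRS.put("U\u0308", '\u00DC');  // U + diaeresis
        PAIRS.put("e\u0301", '\u00E9');  // e + acute
    }

    // Replaces each known two-char sequence with its precomposed char;
    // anything not in the table is copied through unchanged.
    static String compose(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (i + 1 < s.length()) {
                Character c = PAIRS.get(s.substring(i, i + 2));
                if (c != null) {
                    out.append(c.charValue());
                    i += 2;
                    continue;
                }
            }
            out.append(s.charAt(i));
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String composed = compose("Mu\u0308ller");
        System.out.println(composed.equals("M\u00FCller"));  // true
    }
}
```

A table like this handles only one mark per base letter and only the pairs it lists, which is the kind of limit described above; a full normalizer handles the rest of Unicode.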

If you need something bigger, try ICU4J (see URL below).

Here's my list of Frequently Pasted URLs regarding Unicode encodings.  Some
may be dead URLs; I haven't confirmed their liveness in a while.

Canonical Equivalence in Applications:
  <http://www.unicode.org/notes/tn5/>
UAX #15: Unicode Normalization:
  <http://www.unicode.org/reports/tr15/>

International Components for Unicode (see: ICU4J) -- 1.4+
  <http://icu.sourceforge.net/>
ICU's Normalization:
  <http://icu.sourceforge.net/userguide/normalization.html>

QA1235: Converting to Precomposed Unicode -- native calls, not Java
  <http://developer.apple.com/qa/qa2001/qa1235.html>

TN1150 HFS+/HFSX format -- find "Canonical Decomposition" on that page:
  <http://developer.apple.com/technotes/tn/tn1150.html>
  <http://developer.apple.com/technotes/tn/tn1150table.html> -- table

-- GG

_______________________________________________
Java-dev mailing list (email@hidden)
References: 
 >Re: Problem with umlauts in file names (From: Greg Guerin <email@hidden>)




Copyright © 2007 Apple Inc. All rights reserved.