
AWT FileDialog and Unicode



Hello,

I finally got down to testing the problems with Unicode filenames and
AWT FileDialogs I reported some while ago. Here are my findings so far:

- Using java.io.* routines or javax.swing.JFileChooser, composite
characters in filenames (like the German umlaut ä, which is \u00e4, or
&auml; in HTML, or \"a in TeX) are displayed as two characters, in this
case an "a" followed by the Unicode combining diaeresis (\u0308).
While this is rather ugly for display (e.g. in the JFileChooser), it
works fine for opening files with FileInputStream.

- Using java.awt.FileDialog, those composite characters are displayed
correctly by the peer (this is Carbon based, right?), but
FileDialog.getFile() and FileDialog.getDirectory() return strings where
those composite characters are encoded using UTF-8, so they come out as 3
characters, namely an "a" followed by the 2-byte UTF-8 encoding of the
diaeresis.
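
To make the character counts in the two cases above concrete, here is a minimal sketch. It assumes a decomposed name consisting of "a" followed by the combining diaeresis, and it uses ISO-8859-1 purely as a stand-in for whatever single-byte encoding is involved. The class name DecomposedNameDemo is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class DecomposedNameDemo {
    // Decomposed form of the umlaut: the letter "a" followed by the combining diaeresis.
    static final String DECOMPOSED = "a\u0308";

    // UTF-8 encoding of the decomposed name: one byte for "a", two for the diaeresis.
    static byte[] utf8Bytes() {
        return DECOMPOSED.getBytes(StandardCharsets.UTF_8);
    }

    // Misreading those bytes with a single-byte encoding yields one char per byte.
    // ISO-8859-1 is an assumption here, standing in for the platform encoding.
    static String misread() {
        return new String(utf8Bytes(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(DECOMPOSED.length());  // chars in the decomposed string
        System.out.println(utf8Bytes().length);   // bytes in its UTF-8 encoding
        System.out.println(misread().length());   // chars after the wrong decoding
    }
}
```

The first case corresponds to DECOMPOSED (two chars), the second to misread() (three chars).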

The problem is that File.exists() returns false and FileInputStream()
throws a FileNotFoundException when given those UTF-8 strings.

Now here comes the interesting part: I tried taking all those
characters, converting them to raw bytes, and then decoding them as
UTF-8, like this:

String decodeUTF(String s) {
    // Treat each char as one raw byte; this assumes every char is below 256.
    byte bytes[] = new byte[s.length()];

    for (int i=0; i<s.length(); i++) {
        char c = s.charAt(i);

        if (c >= 256)
            return null; // not UTF
        else if (c < 128)
            bytes[i] = (byte) c;
        else
            bytes[i] = (byte) (c-256);
    }

    try {
        return new String(bytes, "UTF8");
    }
    catch (UnsupportedEncodingException e) {
        return null;
    }
}

I was very surprised to find that the result was garbage. In fact a much
simpler version did work:

String decodeUTF(String s) {
    // getBytes() without arguments uses the platform default encoding.
    byte bytes[] = s.getBytes();

    try {
        return new String(bytes, "UTF8");
    }
    catch (UnsupportedEncodingException e) {
        return null;
    }
}

BUT: String.getBytes() uses the platform CharToByteConverter to convert
the characters to bytes, so the characters in the string are not raw
UTF-8 byte values but UTF-8 bytes that were decoded with the Mac
encoding. As this works reliably, I figured that the AWT FileDialog gets
the UTF-8 encoded filename from Carbon and, instead of creating a string
by decoding UTF-8, creates the string using the Mac ByteToCharConverter.
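
The hypothesis above can be tried out in isolation. The following is only a sketch under assumptions: it takes "x-MacRoman" as a stand-in for the Mac platform encoding (an optional charset that a given Java runtime may not ship), and the class and method names are hypothetical. It simulates the suspected mis-decoding and then applies both decoding attempts to the result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MacRomanMisdecodeDemo {
    // Assumption: "x-MacRoman" stands in for the Mac platform encoding of the time.
    // It is an optional charset, so callers should check Charset.isSupported first.
    static final String MAC = "x-MacRoman";

    // What the dialog is suspected to do: decode UTF-8 bytes with the Mac converter.
    static String misdecode(String name) {
        return new String(name.getBytes(StandardCharsets.UTF_8), Charset.forName(MAC));
    }

    // First attempt: treat every char below 256 as a raw byte value.
    static String naiveDecode(String s) {
        byte[] bytes = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 256) return null; // cannot be a raw byte
            bytes[i] = (byte) c;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Second attempt: re-encode with the Mac charset, then decode as UTF-8.
    static String repair(String s) {
        return new String(s.getBytes(Charset.forName(MAC)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        if (!Charset.isSupported(MAC)) {
            System.out.println(MAC + " is not available in this runtime");
            return;
        }
        String original = "a\u0308"; // "a" followed by the combining diaeresis
        String broken = misdecode(original);
        System.out.println(original.equals(naiveDecode(broken))); // first attempt
        System.out.println(original.equals(repair(broken)));      // second attempt
    }
}
```

Passing the charset explicitly, rather than relying on the no-argument getBytes(), keeps the repair independent of the default encoding of the machine it runs on.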

So there is one bug and one cosmetic problem:

First the bug: java.awt.FileDialog does not properly create a Unicode
string from the native UTF-8 representation but instead converts the
UTF-8 bytes using the Mac ByteToCharConverter. The workaround is to use
    new String(file.getPath().getBytes(), "UTF8")
but for compatibility with other platforms this should not be
necessary.

Second the cosmetic problem: When displaying those filenames it is
rather ugly to see all composite characters displayed as two characters.
As several components of the system (like the Finder) manage to display
those composite characters as one character, it would be nice if Java
could do that too, either by converting the composite form into one
character or by correctly drawing them one over the other. The MacOS X
Finder seems to do just this, as it is possible to type an "ä" (&auml;)
and then select and delete only half of it (leaving just the a or just
the ").
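
For the first of those two options (converting the composite form into one character), here is a minimal sketch of what an application could do itself before displaying a name. It relies on java.text.Normalizer, which was only added in Java 6 and so is not available in the Java versions discussed here. The class and method names are made up, and this is not a claim about how the Finder does it.

```java
import java.text.Normalizer;

class ComposeForDisplay {
    // Returns the canonically composed (NFC) form of a file name, for display only.
    // The original string should still be the one used to open the file.
    static String forDisplay(String fileName) {
        return Normalizer.normalize(fileName, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "a\u0308"; // "a" followed by the combining diaeresis
        String composed = forDisplay(decomposed);
        System.out.println(composed.length());          // chars after composition
        System.out.println(composed.equals("\u00e4"));  // same as the single-char form?
    }
}
```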

How does MRJ 2.2.4 do this? It works just fine with AWT FileDialog on
MacOS 9.1 even though it uses the native HFS+ APIs (and these are the
ones that give the UTF encoded names, right?!).


I already filed bug #2681902 on this but it's still Analyze/Unresolved.
Should I file another bug report with my additional findings (referring
to the first report) or can I somehow append the details to the existing
report?

Daniel

--
------------------------------------------------------------------------
Daniel Bobbert telefon: (0681) 3 90 81 98
Richard Wagner Str. 77 email: email@hidden
66111 Saarbruecken www: http://www.coli.uni-sb.de/~dabo/
------------------------------------------------------------------------



