Re: AWT FileDialog and Unicode



On 14 May 2001 Daniel Bobbert <email@hidden> wrote:

>I finally got down to testing the problems with Unicode filenames and
>AWT FileDialogs I reported some while ago. Here are my findings so far:
>[...]
>First the Bug: java.awt.FileDialog does not properly create a unicode
>string from the native UTF representation but instead converts the UTF
>bytes using the Mac ByteToCharConverter. The workaround is to use
> new String(file.getPath().getBytes(), "UTF8")
>but in order to be compatible with other platforms this should not be
>necessary.

I just finished my own tests, and this workaround works only some of the
time. Some filenames containing ISO Latin-1 characters (i.e. Unicode range
\u0080-\u00FF) cause it to fail. This appears to happen when something even
deeper in the bowels of FileDialog fails first.
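
For reference, here is a minimal sketch of the mangling and of Daniel's
workaround with the charsets spelled out. It is an illustration only, not
MRJ's actual code: the class and method names are hypothetical, and it
assumes the runtime provides the "x-MacRoman" charset (the name current
JDKs use for Mac OS Roman). Daniel's original relies on the platform
default encoding, which was MacRoman under MRJ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class MacRomanUtf8 {
    // Assumption: the JDK in use ships the "x-MacRoman" charset.
    static final Charset MAC_ROMAN = Charset.forName("x-MacRoman");

    // Simulates the reported bug: the UTF-8 bytes of the name are
    // decoded as if they were MacRoman bytes.
    static String mangle(String name) {
        return new String(name.getBytes(StandardCharsets.UTF_8), MAC_ROMAN);
    }

    // The workaround: turn the chars back into MacRoman bytes, then
    // decode those bytes as UTF-8.
    static String unmangle(String mangled) {
        return new String(mangled.getBytes(MAC_ROMAN), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9";
        String mangled = mangle(original);
        System.out.println(unmangle(mangled).equals(original));
    }
}
```

The round trip only works if FileDialog returned the complete mangled
name in the first place, which is exactly what fails in the cases below.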

The failure cases all appear to contain Unicode Latin-1 characters which
cannot be translated to MacRoman. Since you typically can't type these
characters on a Mac keyboard, my test cases were generated
programmatically. Basically, I wrote a loop that generated filenames
containing Unicode characters between \u00A0 and \u00FF, assembling 4
chars into one name at a time and prefixing them with a known-good filename
component. For example, the pattern produces the following filename for the
4 characters \u00BC - \u00BF:
U.00BC.xxxx

The x's are actually the 4 Unicode characters for the glyphs:
<1/4> <1/2> <3/4> <inverted-?>
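
The generating loop described above can be sketched roughly as follows.
This is a reconstruction from the description, not the actual test
program: the class and method names are hypothetical, and it only builds
the names (creating the files on disk is left out).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the name generator, for illustration.
final class TestNameGenerator {
    // Builds names of the form "U.<hex code of first char>.<chars>",
    // taking groupSize consecutive characters per name.
    static List<String> generate(char first, char last, int groupSize) {
        List<String> names = new ArrayList<>();
        for (int start = first; start <= last; start += groupSize) {
            // Known-good ASCII prefix, e.g. "U.00BC."
            StringBuilder name = new StringBuilder(String.format("U.%04X.", start));
            for (int c = start; c < start + groupSize && c <= last; c++) {
                name.append((char) c);
            }
            names.add(name.toString());
        }
        return names;
    }

    public static void main(String[] args) {
        // Latin-1 range used in the tests, 4 characters per name.
        for (String name : generate('\u00A0', '\u00FF', 4)) {
            System.out.println(name);
        }
    }
}
```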

Astute readers will immediately recognize that there is no MacRoman
character for <1/4>. This appears to be the root of the problem, though I
do not have a coherent explanation as to why this should be so. The
testing was done on an HFS+ volume, which is supposed to store Unicode on
the disk. Nonetheless, when you choose the file in a FileDialog, its name
is returned as:
U.00BC.#15056

Notice that there is no "MacRomanized UTF-8" at all in this string. This
is the actual literal string returned by FileDialog, without any attempt to
run it through Daniel's workaround above. (No, I don't know what the
"#15056" means. It's not anything Unicode-ish or UTF-8ish.) If there had
been some MacRoman-representable Latin-1 characters before the <1/4> char,
e.g. E-acute, they WOULD have suffered the "MacRomanized UTF-8" mangling.
In any case, everything from the first problematic character onward is
dropped from the returned filename. It looks like some translation
algorithm detects a failure condition and simply stops where it is
after putting some failure code into the string.
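
To predict which test names will hit this, one can look for the first
character that has no MacRoman equivalent. The sketch below does only
that; it does not claim to reproduce whatever FileDialog does internally.
The class and method names are hypothetical, and it again assumes the
runtime provides the "x-MacRoman" charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, for illustration only.
final class MacRomanCheck {
    // Returns the index of the first char that cannot be encoded in
    // MacRoman, or -1 if the whole name is representable.
    static int firstUnencodable(String name) {
        // Assumption: the JDK in use ships the "x-MacRoman" charset.
        CharsetEncoder encoder = Charset.forName("x-MacRoman").newEncoder();
        for (int i = 0; i < name.length(); i++) {
            if (!encoder.canEncode(name.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The failing test name from above: the <1/4> char sits right
        // after the 7-character ASCII prefix.
        System.out.println(firstUnencodable("U.00BC.\u00BC\u00BD\u00BE\u00BF"));
    }
}
```

For the failing name above, the reported index falls at the same position
where the returned string switches to "#15056".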

Based on the above, which I will file in a new bug report, I'd say that
FileDialog is hopelessly broken in MRJ 3.0. To some extent, one can take a
shot using Daniel's workaround, but it is not certain you will succeed.
Worse, you can't tell whether you've succeeded or failed, since the
FileDialog name could have been mangled before it ever underwent the second
mangling into "MacRomanized UTF-8".

-- GG