Mailing Lists: Apple Mailing Lists


Re: AWT FileDialog and Unicode



Hello Greg,

regarding the FileDialog bugs #2699436 filed by you and #2681902 filed
by me, I think it is rather obvious what is happening here:

- the java.awt.FileDialog peer returns the UTF8 filename from HFS+
- now here comes the bug: instead of decoding UTF8, MRJ on X pushes
those raw UTF8 bytes through MacRoman.

This has the following consequences:
- if the filename contains only characters that are representable in
MacRoman, then my workaround can be used to reverse the MacRoman
conversion and then correctly decode the raw UTF8 bytes.
- if the filename contains chars that are not representable in
MacRoman, TEC apparently produces a numeric representation whose meaning
we don't know (#15056 doesn't look like four characters to me,
though). In any case, there does not seem to be a way to reverse
the conversion when this happens.
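
The first case can be illustrated with a small sketch. It assumes the
JDK's "x-MacRoman" charset behaves like the converter MRJ uses, which I
have not verified, and the class and method names are made up for the
example. It names both charsets explicitly instead of relying on the
platform default, as getBytes() without an argument does. It does not
reproduce the second case (the "#15056" result), only the reversible
mangling:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MacRomanUtf8Demo {
    // Assumption: "x-MacRoman" stands in for the converter MRJ applies.
    // Charset.forName throws if this JVM does not ship that charset.
    static final Charset MAC_ROMAN = Charset.forName("x-MacRoman");

    /** Simulates the suspected bug: UTF-8 bytes decoded as if they were MacRoman. */
    static String mangle(String name) {
        return new String(name.getBytes(StandardCharsets.UTF_8), MAC_ROMAN);
    }

    /** Reverses it: re-encode as MacRoman to get the raw bytes back, then decode as UTF-8. */
    static String unmangle(String mangled) {
        return new String(mangled.getBytes(MAC_ROMAN), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9";            // "cafe" with e-acute
        String mangled = mangle(original);         // e-acute becomes two MacRoman chars
        System.out.println(mangled.length());      // one char longer than the original
        System.out.println(unmangle(mangled).equals(original));
    }
}
```

The round trip only works because every byte produced by the UTF-8
encoder maps to some MacRoman character and back again. Once the native
side has replaced part of the name, there are no raw bytes left to
recover.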

It looks to me as if the fix should be as simple as finding the line in
the FileDialog peer that sends the raw bytes through MacRoman and
replacing it with one that sends them through a UTF8 decoder.

Or am I missing something?

Daniel




Original message from Greg Guerin:
>
> On 14 May 2001 Daniel Bobbert <email@hidden> wrote:
>
> >I finally got down to testing the problems with Unicode filenames and
> >AWT FileDialogs I reported some while ago. Here are my findings so far:
> >[...]
> >First the Bug: java.awt.FileDialog does not properly create a unicode
> >string from the native UTF representation but instead converts the UTF
> >bytes using the Mac ByteToCharConverter. The workaround is to use
> > new String(file.getPath().getBytes(), "UTF8")
> >but in order to be compatible with other platforms this should not be
> >necessary.
>
> I just finished my own tests, and this workaround only works some of the
> time. There are filenames that contain ISO Latin-1 characters (i.e.
> UniCode range \u0080-\u00FF) that will cause this workaround to fail. This
> appears to happen when something even deeper in the bowels of FileDialog
> fails.
>
> The failure cases all appear to contain UniCode Latin-1 characters which
> cannot be translated to MacRoman. Since you typically can't type these
> characters on a Mac keyboard, my test-cases were generated
> programmatically. Basically, I wrote a loop that generated file-names
> containing UniCode characters between \u00A0 and \u00FF, assembling only 4
> chars into one name at a time and prefixing with a known good filename
> component. For example, the pattern produces the following filename for the
> 4 characters \u00BC - \u00BF:
> U.00BC.xxxx
>
> The x's are actually the 4 UniCode characters for the glyphs:
> <1/4> <1/2> <3/4> <inverted-?>
>
> Astute readers will immediately recognize that there is no MacRoman
> character for <1/4>. This appears to be the root of the problem, though I
> do not have a coherent explanation as to why this should be so. The
> testing was done on an HFS+ volume, which is supposed to store UniCode on
> the disk. Nonetheless, when you choose the file in a FileDialog, its name
> is returned as:
> U.00BC.#15056
>
> Notice that there is no "MacRomanized UTF-8" at all in this string. This
> is the actual literal string returned by FileDialog, without any attempt to
> run it through Daniel's workaround above. (No, I don't know what the
> "#15056" means. It's not anything UniCodey or UTF-8ish.) If there had
> been some MacRoman-representable Latin-1 characters before the <1/4> char,
> e.g. E-acute, they WOULD have suffered the "MacRomanized UTF-8" mangling.
> In any case, everything past the first problematic character will be
> truncated from the returned filename. It looks like some translation
> algorithm is detecting a failure condition and just stops where it's at
> after putting some failure-code into the string.
>
> Based on the above, which I will file in a new bug report, I'd say that
> FileDialog is hopelessly broken in MRJ 3.0. To some extent, one can take a
> shot using Daniel's workaround, but it is not certain you will succeed.
> Worse, you can't tell whether you've succeeded or failed, since the
> FileDialog name could have been mangled before it ever underwent the second
> mangling into "MacRomanized UTF-8".
>
> -- GG
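
For anyone who wants to repeat the test described above: Greg's actual
test code was not posted, so the following is only a guess at the naming
pattern ("U." plus the hex code of the first character, then the
characters themselves, four per name). The class and method names are
hypothetical, and the sketch only builds the names; creating the files
on an HFS+ volume is left out.

```java
import java.util.ArrayList;
import java.util.List;

class TestNameGenerator {
    /**
     * Builds names such as "U.00BC." followed by the characters
     * U+00BC..U+00BF, one name per group of groupSize characters.
     */
    static List<String> names(int first, int last, int groupSize) {
        List<String> result = new ArrayList<>();
        for (int start = first; start <= last; start += groupSize) {
            // Known-good ASCII prefix that identifies the group
            StringBuilder sb = new StringBuilder(String.format("U.%04X.", start));
            for (int c = start; c < start + groupSize && c <= last; c++) {
                sb.append((char) c);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // The range and group size mentioned in the message
        for (String name : names(0x00A0, 0x00FF, 4)) {
            System.out.println(name);
        }
    }
}
```

Keeping the prefix in plain ASCII means a truncated or mangled name can
still be matched to the group of characters that produced it.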

--
--------------------------------------------------------------------------
Daniel Bobbert telefon: (0681) 3 90 81 98
Richard Wagner Str. 77 email: email@hidden
66111 Saarbruecken www: http://www.coli.uni-sb.de/~dabo
--------------------------------------------------------------------------


References: 
 >Re: AWT FileDialog and Unicode (From: Greg Guerin <email@hidden>)


