Mailing Lists: Apple Mailing Lists


Re: file to uri to file madness



Michael Macaluso wrote:

>IMHO, this is a bug in the VM in that it forces you to decompose the
>unicode characters.

I'm sure this is a bug, but I'm not sure it's in the JVM.  It could be
anywhere along the underlying native code path: a native method, a library
function, the file-system, the kernel.

The result I saw is that Apache (via personal web sharing) fails with 3
composed e-acutes in a row, and it isn't using the JVM at all.  So assuming
that Apache is de-escaping %xx correctly, which is reasonable given that
Apache works on other platforms, my best guess is that the bug is actually
in an Apple library function, framework, or something below that, and not
in the JVM native libs at all.  Or I could be wrong.
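To make the %xx point concrete, here is a small Python sketch of how one
visible name can arrive as two different escaped spellings.  It is only an
illustration; the example strings are my own assumptions chosen to match
the "3 e-acutes" case, and it says nothing about what Apache does
internally.

```python
import unicodedata
from urllib.parse import quote, unquote

# Three e-acutes, spelled two ways in a URI path (illustrative values).
composed_uri = "%C3%A9%C3%A9%C3%A9"        # U+00E9 three times, UTF-8 escaped
decomposed_uri = "e%CC%81e%CC%81e%CC%81"   # 'e' + U+0301, three times

# unquote() decodes the escaped bytes as UTF-8 by default.
composed = unquote(composed_uri)
decomposed = unquote(decomposed_uri)


def code_points(s):
    """Render a string as a list of U+XXXX code points for inspection."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


print(code_points(composed))
print(code_points(decomposed))

# The two strings differ code point by code point, yet they are
# canonically equivalent: decomposing the first yields the second.
print(composed == decomposed)
print(unicodedata.normalize("NFD", composed) == decomposed)
```

A correct de-escaper hands the composed spelling to the file-system call
unchanged, so whatever fails after that point is below the de-escaping
step.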

As a test, write the equivalent fail-case code in C, C++, or Obj-C, calling
the BSD open(2) function or the stdio fopen(3) function.  Also write the
equivalent code in Python, Perl, or another language that goes through the
file-system frameworks.
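A rough Python version of that test might look like the sketch below.  The
function name and the file name are hypothetical, and I'm assuming that
Python's built-in open() ends up in the C library's open family without
having checked which exact call it makes.  It creates a file under the
composed name, then tries to open it under both spellings.

```python
import os
import tempfile
import unicodedata


def probe_name_forms(directory, stem="\u00e9\u00e9\u00e9.txt"):
    """Create a file under its composed (NFC) name, then try to open it
    under both the composed and the decomposed (NFD) spelling.

    Returns a dict mapping "NFC"/"NFD" to True if that spelling opened.
    """
    nfc = unicodedata.normalize("NFC", stem)
    nfd = unicodedata.normalize("NFD", stem)

    with open(os.path.join(directory, nfc), "w") as f:
        f.write("probe\n")

    results = {}
    for label, name in (("NFC", nfc), ("NFD", nfd)):
        try:
            with open(os.path.join(directory, name)) as f:
                results[label] = (f.read() == "probe\n")
        except OSError:
            # The file system did not resolve this spelling.
            results[label] = False
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(probe_name_forms(d))
```

The answer for the decomposed spelling depends on the file-system under
the directory, which is the point: run it on the volume that shows the
failure and compare with the Java result.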

Another test is to use my MacBinary Toolkit to open or test for file
existence, instead of File.
  <http://www.amug.org/~glguerin/sw/#macbinary>

The native code of MB Toolkit calls the FSRef C functions, so you can tell
exactly what lies along the entire path between the Java code and Apple's
framework (an advantage to having source: auditability).

If the MB Toolkit test works, then it indicates a problem in the JVM and
not in the FSRef lib or anything below it.  If it fails the same way, it
indicates a problem in the FSRef library functions, or somewhere below that
point.  Combined with tests written in C for open() or fopen(), one may be
able to narrow down where the problem is, and file an appropriate
bug-report.


>If Apple chooses to require the decomposed form in the C APIs, fine, ...

But they don't.  You can use the composed form in the C APIs, and it's
supposed to work.  The decomposed form is also supposed to work.  AFAIK
there's no rule that says you have to be consistent either: you can mix
composed and decomposed forms in the same name-string.  I'm almost certain
you can even apply a decomposed accent to a composed-accent character, if
the combination makes sense.
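In Unicode terms these spellings are canonically equivalent.  The Python
sketch below shows what that means using the standard unicodedata module;
the sample strings and the same_name() helper are mine, and this only
demonstrates string equivalence, not what any Apple API does with the
names.

```python
import unicodedata

composed = "caf\u00e9"          # precomposed e-acute (U+00E9)
decomposed = "cafe\u0301"       # 'e' followed by combining acute (U+0301)
mixed = "\u00e9" + "e\u0301"    # both forms in the same string


def same_name(a, b):
    """Compare two names by canonical equivalence (both taken to NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Different code points, same name once normalized.
print(composed == decomposed)
print(same_name(composed, decomposed))

# A mixed string is equivalent to the all-composed spelling.
print(same_name(mixed, "\u00e9\u00e9"))

# A decomposed accent applied to an already-composed character:
# e-circumflex (U+00EA) followed by combining acute (U+0301).
stacked = "\u00ea\u0301"
print(len(unicodedata.normalize("NFC", stacked)))   # composes further
print(len(unicodedata.normalize("NFD", stacked)))   # base + two accents
```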

The documented distinction between composed and decomposed is basically
there to tell programmers that a directory listing will return names in
decomposed form.

QA1235: Converting to Precomposed Unicode -- native calls, not Java
  <http://developer.apple.com/qa/qa2001/qa1235.html>

TN1150 HFS+/HFSX format -- find "Canonical Decomposition" on that page:
  <http://developer.apple.com/technotes/tn/tn1150.html>
  <http://developer.apple.com/technotes/tn/tn1150table.html> -- table
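Since a listing may hand back a different spelling than the one you
asked for, comparing names only works after normalizing both sides.  A
minimal Python sketch, with a hypothetical helper name, assuming NFC as
the common form for comparison:

```python
import os
import unicodedata


def find_in_listing(directory, wanted):
    """Return the on-disk spelling of `wanted`, or None if it is absent.

    Both the wanted name and each directory entry are normalized to NFC
    before comparing, so composed and decomposed spellings match.
    """
    target = unicodedata.normalize("NFC", wanted)
    for entry in os.listdir(directory):
        if unicodedata.normalize("NFC", entry) == target:
            return entry
    return None
```

The return value is the entry exactly as the listing reported it, so
printing its code points shows which form the file-system handed back.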

  -- GG


Java-dev mailing list      (email@hidden)