Mailing Lists: Apple Mailing Lists


Re: Problems with File.exists() and surrogate pairs in filename



Scott Kovatch wrote:

Why is File.exists() failing? The raw file object shows the split UTF-16 values for the second character in the filename (\ud85b, \udff6).


Seems like a bug to me. Exactly what kind of bug depends on which native API File.exists() is actually using. Still, since the names came from File itself, as a listing, the names should work as inputs to File.
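A minimal sketch of that consistency check, assuming nothing beyond java.io.File: every name returned by a directory listing is fed back into File.exists(). The class and method names (ListingProbe, missingNames) are made up for illustration. Note that a dangling symlink would also show up here, since exists() follows links.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class ListingProbe {
    /** Returns the names that dir.list() reports but File.exists() then rejects. */
    static List<String> missingNames(File dir) {
        List<String> missing = new ArrayList<>();
        String[] names = dir.list();   // null if dir is not a readable directory
        if (names == null) {
            return missing;
        }
        for (String name : names) {
            if (!new File(dir, name).exists()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (String name : missingNames(dir)) {
            // Print the raw UTF-16 code units so surrogate pairs are visible.
            StringBuilder units = new StringBuilder();
            for (char c : name.toCharArray()) {
                units.append(String.format("%04x ", (int) c));
            }
            System.out.println("listed but not found: " + units.toString().trim());
        }
    }
}
```

Run against the directory holding the problem file, an empty result would mean File is at least self-consistent; a non-empty one reproduces the report.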


This makes me think I need to do some kind of conversion on the filename so it gets turned into properly decomposed UTF-8, ...


I don't know why you think that. A String contains UTF-16 chars, not UTF-8. Depending on the native API used in File.exists(), it may or may not even use UTF-8.

If File.exists() does use UTF-8, it almost certainly uses one of the String-related JNI functions that returns a String as a UTF-8 C string. So your doing an encode/decode cycle in Java really isn't accomplishing anything. All it proves is that a round-trip works, and you don't need File.exists() for that.
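To make the point concrete, here is a small sketch using the two code units from the original report. It shows that the String holds two UTF-16 chars forming one code point, and that a Java-level round trip through UTF-8 just hands back the same String. SurrogateDemo and roundTripsThroughUtf8 are illustrative names, not anything from the original code.

```java
import java.nio.charset.StandardCharsets;

final class SurrogateDemo {
    /** Encodes to UTF-8 and decodes again; true if the String survives unchanged. */
    static boolean roundTripsThroughUtf8(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).equals(s);
    }

    public static void main(String[] args) {
        String name = "\ud85b\udff6";                                // the pair from the report
        System.out.println(name.length());                          // 2 UTF-16 code units
        System.out.println(name.codePointCount(0, name.length()));  // 1 code point
        System.out.printf("U+%X%n", name.codePointAt(0));           // U+26FF6
        System.out.println(roundTripsThroughUtf8(name));            // true
    }
}
```

None of this touches whatever the native side of File.exists() does with the name.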

Another possibility is that the native API uses UTF-16, e.g. one of the FSRef functions. If so, then UTF-8 is never used at all, so encode/decode cycles are even less relevant.

It's conceivable the JNI function to get UTF-8 isn't working right with the surrogate pair. E.g. it could be encoding to CESU-8, not real UTF-8. To test any of this you'll have to dip into JNI, though. Fiddling around at the Java level is at least one step removed from what's really happening.
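For reference, the byte-level difference can be seen from pure Java. The sketch below compares standard UTF-8 with the "modified UTF-8" that DataOutputStream.writeUTF() emits, which encodes each surrogate separately just as CESU-8 does. The JNI specification documents GetStringUTFChars as returning modified UTF-8 too, but whether File.exists() goes through that function is an assumption here, and Cesu8Demo and its helpers are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Cesu8Demo {
    /** Standard UTF-8: one 4-byte sequence per supplementary code point. */
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Modified UTF-8 as written by writeUTF(), minus its 2-byte length prefix. */
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\ud85b\udff6";
        System.out.println("UTF-8:          " + hex(standardUtf8(s)));  // F0 A6 BF B6
        System.out.println("modified UTF-8: " + hex(modifiedUtf8(s)));  // ED A1 9B ED BF B6
    }
}
```

A filesystem call expecting real UTF-8 would see the six-byte form as a different (and invalid) name, which is the failure mode suggested above.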

I'd write a JNI function that takes a String pathname and calls the lstat(2) function, passing it the UTF-8 pathname, and returning the node-type (file, dir, symlink, etc.), or -1 on error, to the Java level. I'd write another JNI function that takes a String and uses the FSRef functions to return some integer-representable value to the Java level. Many FSRef functions take names as UTF-16, not UTF-8.

http://en.wikipedia.org/wiki/CESU-8

  -- GG

Java-dev mailing list      (email@hidden)

Copyright © 2007 Apple Inc. All rights reserved.