Mailing Lists: Apple Mailing Lists


Re: Problems with File.exists() and surrogate pairs in filename




On Nov 4, 2009, at 4:16 PM, Greg Guerin wrote:

Scott Kovatch wrote:

Why is File.exists() failing? The raw file object shows the split  
UTF-16 values for the second character in the filename (\ud85b,  
\udff6).


Seems like a bug to me.  Exactly what kind of bug depends on which  
native API File.exists() is actually using.  Still, since the names  
came from File itself, as a listing, the names should work as inputs  
to File.


This makes me think I need to do some kind of conversion on the  
filename so it gets turned into properly decomposed UTF-8, ...

I don't know why you think that.  A String contains UTF-16 chars, not  
UTF-8.  Depending on the native API used in File.exists(), it may or  
may not even use UTF-8.
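
To make the UTF-16 point concrete, here is a minimal sketch that inspects the two code units quoted above (\ud85b, \udff6). The class name SurrogatePairDemo is made up for illustration; only standard java.lang calls are used.

```java
final class SurrogatePairDemo {
    // The two UTF-16 code units reported for the second character of the filename.
    static final String NAME_CHAR = "\ud85b\udff6";

    public static void main(String[] args) {
        String s = NAME_CHAR;
        // length() counts UTF-16 code units, not characters.
        System.out.println("UTF-16 code units: " + s.length());
        // codePointCount() counts whole Unicode code points.
        System.out.println("code points:       " + s.codePointCount(0, s.length()));
        // A high surrogate followed by a low surrogate forms one supplementary code point.
        System.out.println("valid pair:        "
                + Character.isSurrogatePair(s.charAt(0), s.charAt(1)));
        System.out.printf("code point:        U+%X%n", s.codePointAt(0));
    }
}
```

So the "split" values in the raw File object are just how a Java String stores one character outside the Basic Multilingual Plane, and do not by themselves indicate a corrupted name.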

While I can't see the full implementation of exists(), I can see that it eventually passes the File object into native code in UnixFileSystem. I also know that the file system represents file names in UTF-8. NSFileManager's fileSystemRepresentation effectively does this for you, but calling that via JNI (or, in my case, Eclipse SWT's OS layer) gave me the same results as calling String.getBytes("UTF-8"), so something else is going on.
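
For anyone who wants to repeat the comparison at the Java level, this is roughly the kind of check I mean: encode the name as UTF-8 as is, and again after canonical decomposition. It is only a sketch. The class and method names are made up, java.text.Normalizer's NFD is assumed to be a reasonable stand-in for the file system's decomposition (it may not match it exactly), and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

final class DecomposedNameDemo {
    // UTF-8 bytes of the string exactly as given.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // UTF-8 bytes after canonical decomposition (NFD).
    static byte[] decomposedUtf8(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String accented = "\u00e9";            // precomposed e-acute
        String supplementary = "\ud85b\udff6"; // the character from the filename

        // The accented letter grows when decomposed into 'e' plus a combining accent.
        System.out.println("e-acute raw/NFD bytes: "
                + utf8(accented).length + " / " + decomposedUtf8(accented).length);
        // Compare the two lengths for the supplementary character on your own system.
        System.out.println("supplementary raw/NFD bytes: "
                + utf8(supplementary).length + " / " + decomposedUtf8(supplementary).length);
    }
}
```

If the two byte sequences for the problem character come out identical, decomposition is probably not what is tripping up exists().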

Another possibility is that the native API uses UTF-16, e.g. one of the  
FSRef functions.  If so, then UTF-8 is never used at all, so  
encode/decode cycles are even less relevant.

It's conceivable the JNI function to get UTF-8 isn't working right  
with the surrogate pair.  E.g. it could be encoding to CESU-8, not  
real UTF-8.  To test any of this you'll have to dip into JNI,  
though.  Fiddling around at the Java level is at least one step  
removed from what's really happening.
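
The encoding difference can at least be seen from pure Java. The JNI string functions such as GetStringUTFChars are documented to return "modified UTF-8", which encodes each surrogate separately (as CESU-8 does) instead of producing one four-byte sequence. DataOutputStream.writeUTF uses the same modified encoding, so the sketch below uses it as a stand-in; it does not show what File.exists() actually passes to the OS. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ModifiedUtf8Demo {
    // Standard UTF-8: one four-byte sequence per supplementary code point.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8 as written by DataOutputStream.writeUTF:
    // each surrogate is encoded on its own as three bytes.
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        byte[] all = buf.toByteArray();
        // writeUTF prefixes the data with a two-byte length; drop it.
        return Arrays.copyOfRange(all, 2, all.length);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String s = "\ud85b\udff6";
        System.out.println("standard UTF-8: " + hex(standardUtf8(s))); // F0 A6 BF B6
        System.out.println("modified UTF-8: " + hex(modifiedUtf8(s))); // ED A1 9B ED BF B6
    }
}
```

A path encoded the second way is not valid UTF-8, so a native call that expects real UTF-8 could plausibly fail to find the file. Whether that is what happens here still has to be confirmed in native code.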

<rant>
That's what makes this so aggravating. The fact that the path is a Java String, complete with UTF-16 encoding, certainly complicates matters. If I want to display the filename to the user or do something other than file I/O, yes, I probably need to convert the file name to some other encoding. But for ACTUALLY WORKING WITH A FILE I should need neither to know nor to care what the encoding is. Like Python does...
</rant>

I'd write a JNI function that takes a String pathname and calls the  
lstat(2) function, passing it the UTF-8 pathname, and returning the  
node-type (file, dir, symlink, etc.), or -1 on error, to the Java  
level.  I'd write another JNI function that takes a String and uses  
the FSRef functions to return some integer-representable value to the  
Java level.  Many FSRef functions take names as UTF-16, not UTF-8.

FSRefs may be the way to go. There's other code in SWT that boils file names down to FSRefs, so I should be able to adapt that code for my needs. Figuring out how to make this file name work with exists() may turn out to be an interesting side effect, because what I really need to do is read the file. If I have to use native code to accomplish that, so be it.

I'll post the results here, because this seems like a common enough problem that someone else is likely to run into it.

-- Scott

----------
Scott Kovatch
Flex Engineering

I am Scott Kovatch, and I approve this message.

Java-dev mailing list      (email@hidden)

References: 
 >Re: Problems with File.exists() and surrogate pairs in filename (From: Greg Guerin <email@hidden>)


