Mailing Lists: Apple Mailing Lists


Re: Problems with File.exists() and surrogate pairs in filename



Without having any rationale for why this would change/fix it, try printing f.getCanonicalFile().exists().  Does it still fail?
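
A minimal sketch of that comparison, assuming the directory to inspect is passed as the first command-line argument. The class name CanonicalExistsCheck and the helper check() are made up for illustration and are not from the thread. It only prints both results side by side; whether the canonical form behaves any differently for this filename is exactly what is being tested.

```java
import java.io.File;
import java.io.IOException;

public class CanonicalExistsCheck {
    // Hypothetical helper: returns { f.exists(), f.getCanonicalFile().exists() }.
    static boolean[] check(File f) throws IOException {
        return new boolean[] { f.exists(), f.getCanonicalFile().exists() };
    }

    public static void main(String[] args) throws IOException {
        // Assumption: directory comes from args[0], defaulting to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] children = dir.listFiles();
        if (children == null) {
            System.out.println("Not a readable directory: " + dir);
            return;
        }
        for (File f : children) {
            boolean[] r = check(f);
            System.out.println(f.getName() + ": exists=" + r[0]
                    + ", canonical exists=" + r[1]);
        }
    }
}
```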

Sam

On Wed, Nov 4, 2009 at 1:23 PM, Scott Kovatch <email@hidden> wrote:
Hi,

We're having some problems doing file operations in Java with filenames that have surrogate pairs.

When I run this program:

import java.io.*;
import java.nio.*;
import java.nio.charset.*;

public class SurrogateTesting {
    /**
     * Lists the directory chosen in a file dialog and reports whether each entry exists.
     *
     * @param args unused
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        System.out.println("Default charset = " + Charset.defaultCharset().name());
        java.awt.FileDialog fd = new java.awt.FileDialog(new java.awt.Frame());
        fd.setVisible(true);
        String directory = fd.getDirectory();

        File directoryObj = new File(directory);
        File[] children = directoryObj.listFiles();
        for (File f : children) {
            // UTF-8 bytes of the file name, used for the hex dump
            ByteBuffer bb1 = ByteBuffer.wrap(f.getName().getBytes("UTF-8"));
            System.out.println("Filename:");
            // printHexOn( System.out, bb1 );

            System.out.println("file = " + f + ", " + f.exists());
        }
    }
}

(I removed the printHexOn method, as it's not relevant to the problem.)

and select a file from a directory with filenames containing surrogate pairs, I get this output:

----------
Default charset = MacRoman

Filename:

0000  e8 8d 89 f0 a6 bf b6 e9  b7 97 e5 a4 96 2e 67 69  |..............gi|
0010  66                                                |f|               

file = /Users/skovatch/Downloads/surrogates/????.gif, false
----------
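
As a sanity check on the dump above, the bytes can be decoded back into a String and walked by code point. This is only a sketch; the class name DecodeDumpedName is hypothetical, and the byte values are copied from the hex dump.

```java
import java.nio.charset.Charset;

public class DecodeDumpedName {
    // The 17 bytes shown in the hex dump of the file name.
    static final int[] DUMP = {
        0xe8, 0x8d, 0x89, 0xf0, 0xa6, 0xbf, 0xb6, 0xe9,
        0xb7, 0x97, 0xe5, 0xa4, 0x96, 0x2e, 0x67, 0x69,
        0x66
    };

    // Decodes the dumped bytes as UTF-8.
    static String decode() {
        byte[] bytes = new byte[DUMP.length];
        for (int i = 0; i < DUMP.length; i++) {
            bytes[i] = (byte) DUMP[i];
        }
        return new String(bytes, Charset.forName("UTF-8"));
    }

    public static void main(String[] args) {
        String name = decode();
        // Step by code point, not by char, so a surrogate pair is reported once.
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            int units = Character.charCount(cp);
            System.out.printf("U+%04X (%d UTF-16 unit%s)%n",
                    cp, units, units == 1 ? "" : "s");
            i += units;
        }
    }
}
```

The four bytes f0 a6 bf b6 decode to the single code point U+26FF6, which Java stores as two UTF-16 code units, 0xD85B and 0xDFF6.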

The byte output looks correct. I did a test in Python and saw that I was getting the same UTF-8 values for the filename. Why is File.exists() failing? The raw file object shows the split UTF-16 values for the second character in the filename (\ud85b, \udff6). This makes me think I need to do some kind of conversion on the filename so it gets turned into properly decomposed UTF-8, but if I make a new string from the UTF-8 interpretation of the bytes using

String UTF8Filename = new String(fileName.getBytes("UTF-8"), "UTF-8");

File.exists() still fails.  I also tried calling fileSystemRepresentation and bringing the characters back into Java, but the result didn't look any different.
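
For what it's worth, encoding a well-formed Java String to UTF-8 and decoding it again yields an equal String, so that round trip cannot change the path handed to File. Whether the name is already in decomposed form (NFD) is a separate question, which java.text.Normalizer (Java 6 and later) can answer. A rough sketch, with the class and method names (RoundTripCheck, roundTrip, isDecomposed) invented for illustration; it is a diagnostic, not a claimed fix:

```java
import java.io.UnsupportedEncodingException;
import java.text.Normalizer;

public class RoundTripCheck {
    // Same conversion as in the message: String -> UTF-8 bytes -> String.
    static String roundTrip(String s) throws UnsupportedEncodingException {
        return new String(s.getBytes("UTF-8"), "UTF-8");
    }

    // True if the string is already in canonical decomposed form (NFD).
    static boolean isDecomposed(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Illustrative name: the supplementary character U+26FF6 plus an extension.
        String name = new String(Character.toChars(0x26FF6)) + ".gif";
        System.out.println("round trip equal: " + name.equals(roundTrip(name)));
        System.out.println("already NFD: " + isDecomposed(name));
        // If a name were not NFD, this would produce the decomposed form.
        String nfd = Normalizer.normalize(name, Normalizer.Form.NFD);
        System.out.println("NFD differs: " + !nfd.equals(name));
    }
}
```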

I'm hoping the collective wisdom of this list has dealt with this problem before, as it seems like a common thing to do. Am I off on my assumption that exists() should be working in this case?  I haven't yet tried reading the file with an InputStream or SWT Image - that's probably my next try.
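
The InputStream test mentioned above could look something like the following sketch. It assumes the directory is given as the first command-line argument; OpenProbe and canOpen are hypothetical names. Comparing exists() with an actual open attempt would show whether only the existence check is affected or whether the path fails for every file operation.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class OpenProbe {
    // Returns true if the file can actually be opened for reading.
    static boolean canOpen(File f) {
        try {
            FileInputStream in = new FileInputStream(f);
            in.close();
            return true;
        } catch (IOException e) {
            // Covers FileNotFoundException (missing file, directory, no permission).
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: directory comes from args[0], defaulting to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] children = dir.listFiles();
        if (children == null) {
            System.out.println("Not a readable directory: " + dir);
            return;
        }
        for (File f : children) {
            System.out.println(f.getName() + ": exists=" + f.exists()
                    + ", canOpen=" + canOpen(f));
        }
    }
}
```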

Thanks,
Scott K.

----------
Scott Kovatch
Flex Engineering

I am Scott Kovatch, and I approve this message.


 _______________________________________________
Do not post admin requests to the list. They will be ignored.
Java-dev mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/java-dev/email@hidden

This email sent to email@hidden


References: 
 >Problems with File.exists() and surrogate pairs in filename (From: Scott Kovatch <email@hidden>)


