We're having some problems doing file operations in Java with filenames that have surrogate pairs.
import java.io.*;
import java.nio.*;
import java.nio.charset.*;

public class SurrogateTesting {
    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        System.out.println("Default charset = " + Charset.defaultCharset().name());
        // Let the user pick a file; only the directory it lives in is used.
        java.awt.FileDialog fd = new java.awt.FileDialog(new java.awt.Frame());
        fd.setVisible(true);
        String directory = fd.getDirectory();
        File directoryObj = new File(directory);
        File[] children = directoryObj.listFiles();
        for (File f : children) {
            // UTF-8 bytes of the filename, for the hex dump below.
            ByteBuffer bb1 = ByteBuffer.wrap(f.getName().getBytes("UTF-8"));
            System.out.println("Filename:");
            // printHexOn( System.out, bb1 );
            System.out.println("file = " + f + ", " + f.exists());
        }
    }
}
(I removed the printHexOn method, as it's not relevant to the problem.)
When I run this and select a file from a directory with filenames containing surrogate pairs, I get this output:
----------
Default charset = MacRoman
Filename:
0000 e8 8d 89 f0 a6 bf b6 e9 b7 97 e5 a4 96 2e 67 69 |..............gi|
0010 66 |f|
file = /Users/skovatch/Downloads/surrogates/????.gif, false
----------
The byte output looks correct. I did a test in Python and saw that I was getting the same UTF-8 values for the filename. Why is File.exists() failing? The raw File object shows the split UTF-16 values for the second character in the filename (\ud85b, \udff6). This makes me think I need to do some kind of conversion on the filename so it gets turned into properly decomposed UTF-8, but if I make a new string from the UTF-8 interpretation of the bytes, File.exists() still fails. I also tried calling fileSystemRepresentation and bringing the characters back into Java, but it didn't look any different.
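
For reference, here is a minimal sketch of the round trip described above. It only checks the string side of things: the surrogate pair \ud85b \udff6 and the bytes f0 a6 bf b6 from the hex dump are the same code point (U+26FF6), so rebuilding the string from its UTF-8 bytes should give back an identical string, which would be consistent with the conversion making no difference. The class and method names (SurrogateRoundTrip, roundTrip, hexUtf8) are made up for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

class SurrogateRoundTrip {
    // Encode to UTF-8 and decode again; a well-formed string comes back unchanged.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Space-separated lowercase hex of the string's UTF-8 encoding.
    static String hexUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Second character of the filename, as the two UTF-16 code units Java stores.
        String second = "\ud85b\udff6";
        System.out.println(Integer.toHexString(second.codePointAt(0))); // 26ff6
        System.out.println(hexUtf8(second));                            // f0 a6 bf b6
        System.out.println(roundTrip(second).equals(second));           // true
    }
}
```

This says nothing about how the JVM converts the name when it hands the path to the operating system; it only shows that the String itself is already well-formed.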
I'm hoping the collective wisdom of this list has dealt with this problem before, as it seems like a common thing to do. Am I off on my assumption that exists() should be working in this case? I haven't yet tried reading the file with an InputStream or SWT Image - that's probably my next try.
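
A rough sketch of that next test, assuming nothing beyond java.io: for each entry returned by listFiles(), compare what exists() reports with whether a FileInputStream can actually be opened. The class name OpenVersusExists and the helper canOpen are hypothetical, and the directory is taken from the command line instead of a FileDialog to keep the example short.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

class OpenVersusExists {
    // Returns true if the file can actually be opened for reading.
    static boolean canOpen(File f) {
        try (FileInputStream in = new FileInputStream(f)) {
            return true;
        } catch (IOException e) {
            // Covers FileNotFoundException, which is also thrown for directories.
            return false;
        }
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] children = dir.listFiles();
        if (children == null) {
            System.out.println("Not a readable directory: " + dir);
            return;
        }
        for (File f : children) {
            System.out.println(f.getName()
                    + ": exists=" + f.exists()
                    + ", canOpen=" + canOpen(f));
        }
    }
}
```

If exists() is false and the open also fails for the same entries, the problem is likely in how the name is passed down to the file system and not in exists() itself; if the open succeeds, exists() is the odd one out.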
Scott K.