I thought I knew the right way to compute the URI of a file, in a
form usable directly in an href attribute, and that, given such a
URI, I could get back to the file.
That was until I tried with a file called "ééé.html" (3 e acute).
Rather a disappointment, as I had no problem with "éé.html" (2 e acute).
1) Here is a quick way to get a feel for the problem:
Put 2 files in a directory, one called "éé.html" and the other
"ééé.html". (Ideally, put them somewhere they can be served by
Apache.) On my disk, they are in
"/Users/x/Sites/x/2006/08/diacritics in filenames/".
So anyone would think (I did, at least): the e acute is represented
by "%C3%A9", so just put it three times before the ".html" to get
access to "ééé.html". Well, no:
While Safari correctly displays
file:///Users/x/Sites/x/2006/08/diacritics%20in%20filenames/%C3%A9%C3%A9%C3%A9.html
Firefox doesn't ("Firefox cannot find file at address
/Users/x/Sites/x/2006/08/diacritics in filenames/ééé.html").
I am using the latest version of the system (Java 1.5) on a MacBook Pro.
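As background, an "é" can be escaped in two different ways, depending on whether it is stored as one precomposed character (U+00E9) or as a plain "e" followed by a combining acute accent (U+0301). The sketch below only illustrates that difference. The class name AccentEscapes and the helper escapeUtf8 are made up for this example, it relies on java.text.Normalizer (Java 6 or later, so not available on Java 1.5), and which of the two forms is actually stored on disk is an assumption here, not something established above.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
class AccentEscapes {

    // Percent-escapes every UTF-8 byte that is not an unreserved ASCII character.
    static String escapeUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean unreserved = v < 0x80
                    && (Character.isLetterOrDigit(v) || v == '.' || v == '-' || v == '_' || v == '~');
            if (unreserved) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three precomposed e-acute characters (U+00E9).
        String composed = "\u00E9\u00E9\u00E9.html";
        // The same name with each accent split off as U+0301 (NFD).
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        System.out.println(escapeUtf8(composed));   // %C3%A9%C3%A9%C3%A9.html
        System.out.println(escapeUtf8(decomposed)); // e%CC%81e%CC%81e%CC%81.html
    }
}
```

If the name on disk is in the second form, a hand-written "%C3%A9" does not match it byte for byte, and whether it still resolves depends on what the browser and the file system do with it.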
2) How did I compute the URI?
file.toURI().toString() doesn't do it. It returns something like:
file:/Users/x/Sites/x/2006/08/diacritics%20in%20filenames/ok-ko/e?e?.html
(where the "?" are not literal question marks).
This form doesn't work at all. You have to call
file.toURI().toASCIIString() to get a form without non-ASCII chars
(as explained in the javadoc). I prefer the following to get the
file-protocol URI, because it ensures that the result begins with
"file:///" (with 3 "/"), which I need when using Jena:
// needs java.io.File, java.net.URI, java.net.URISyntaxException
static public String fileToUri(File file) throws URISyntaxException {
    String s = file.getAbsolutePath();
    // This is for Windows: you get "java.net.URISyntaxException:
    // Relative path in absolute URI" if s doesn't start with a "/".
    s = s.replace('\\', '/');
    if (!s.startsWith("/")) s = "/" + s;
    // The empty (non-null) host is what yields "file:///" with 3 "/".
    URI uri = new URI("file", "", s, null);
    return uri.toASCIIString();
}
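To compare the three forms discussed here side by side, a minimal sketch follows. The class name FileUriSketch, the helper isAscii and the path under /tmp are hypothetical (the file does not need to exist), and the method wraps the checked URISyntaxException only to keep the example short.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical wrapper class, for illustration only.
class FileUriSketch {

    // Same approach as fileToUri above, with the checked exception wrapped.
    static String fileToUri(File file) {
        String s = file.getAbsolutePath().replace('\\', '/');
        if (!s.startsWith("/")) s = "/" + s;
        try {
            return new URI("file", "", s, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // True if every char of s is in the ASCII range.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up location; "\u00E9" is a precomposed e acute.
        File f = new File("/tmp/diacritics in filenames/\u00E9\u00E9.html");

        String plain = f.toURI().toString();
        String ascii = f.toURI().toASCIIString();
        String mine = fileToUri(f);

        System.out.println(plain + "  ascii only: " + isAscii(plain));
        System.out.println(ascii + "  ascii only: " + isAscii(ascii));
        System.out.println(mine + "  ascii only: " + isAscii(mine));
    }
}
```

The point of the comparison is only the shape of the strings: the number of slashes after "file:" and whether any non-ASCII char is left.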
Note: I take care to pass an existing file to fileToUri (!).
If you write:
File f = new File(dir, "éé.html");
f.exists() returns false.
I get the short filename using dir.list(). I am aware of the question
of "composed accents", and I have been using Greg Guerin's
"AccentComposer" class. An interesting (?) remark:
File dir = new File(dirPath);
String[] names = dir.list();
for (int i = 0; i < names.length; i++) {
    File file;
    file = new File(dir, names[i]);
    System.out.println("file exists: " + file.exists()); // TRUE
    file = new File(dir, AccentComposer.composeAccents(names[i]));
    // TRUE in case of "éé.html", FALSE in case of "ééé.html"
    System.out.println("file exists: " + file.exists());
}
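One way to see which form the names returned by dir.list() are in is to ask java.text.Normalizer directly. This is only a sketch: Normalizer needs Java 6 or later (so it does not run on Java 1.5), it is not the AccentComposer class mentioned above, and the class name NameFormCheck and the method formOf are made up for the example.

```java
import java.io.File;
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
class NameFormCheck {

    // Reports whether a name is in composed (NFC) and/or decomposed (NFD) form.
    static String formOf(String name) {
        boolean nfc = Normalizer.isNormalized(name, Normalizer.Form.NFC);
        boolean nfd = Normalizer.isNormalized(name, Normalizer.Form.NFD);
        if (nfc && nfd) return "NFC+NFD"; // e.g. plain ASCII names
        if (nfc) return "NFC";
        if (nfd) return "NFD";
        return "mixed";
    }

    public static void main(String[] args) {
        // Directory to inspect: first argument, or the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        String[] names = dir.list();
        if (names == null) {
            System.out.println("not a readable directory: " + dir);
            return;
        }
        for (String name : names) {
            String composed = Normalizer.normalize(name, Normalizer.Form.NFC);
            System.out.println(name + " -> " + formOf(name)
                    + ", composed name exists: " + new File(dir, composed).exists());
        }
    }
}
```

Run against the test directory, this prints the form of each name next to the result of the same exists() check as in the loop above.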
3) Until I found this strange problem, I was very happy with the form
of URI you get when calling toASCIIString(): I can use it directly
in href attributes (only ASCII chars). And I had no problem getting
back to the file from it:
new File(uri) returns the file.
Well, it does for "éé.html", but not for "ééé.html".
new File(f.toURI().toString()) seems to always return an existing
file (if f exists).
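The round trip itself can be checked on path strings alone, without touching the disk. The sketch below makes no assumption about which accent form the escaped URI carries; it just prints whether the path comes back unchanged. The class name UriRoundTrip, its two helpers and the /tmp paths are hypothetical, and the comparison is meant for a Unix-style path.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, for illustration only.
class UriRoundTrip {

    // File -> ASCII-only "file:///..." URI string.
    static String toAsciiUri(File file) {
        String s = file.getAbsolutePath().replace('\\', '/');
        if (!s.startsWith("/")) s = "/" + s;
        try {
            return new URI("file", "", s, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // ASCII-only URI string -> File, via the File(URI) constructor.
    static File fromUri(String uri) {
        return new File(URI.create(uri));
    }

    // True if the path comes back char for char identical.
    static boolean survives(File f) {
        return fromUri(toAsciiUri(f)).getPath().equals(f.getAbsolutePath());
    }

    public static void main(String[] args) {
        // Precomposed accents (U+00E9) versus "e" + combining acute (U+0301).
        File composed = new File("/tmp/\u00E9\u00E9\u00E9.html");
        File decomposed = new File("/tmp/e\u0301e\u0301e\u0301.html");

        System.out.println(toAsciiUri(composed) + "  unchanged: " + survives(composed));
        System.out.println(toAsciiUri(decomposed) + "  unchanged: " + survives(decomposed));
    }
}
```

If the second line reports a changed path, the name obtained from the URI is no longer the exact string returned by dir.list(), which would be one place to look for the difference between "éé.html" and "ééé.html".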
As the URIs I am trying to produce are intended to be used in RDF
data (using Jena, as I mentioned), I am very reluctant to use
file.toURI().toString(): I am not sure how non-ASCII chars would be
handled. Furthermore, this would require modifying the programs that
display the data in HTML (and make them more complex), and I already
have a lot of data produced using the ASCII-only format for my URIs.
It seems that the problem shows up only when 3 "strange" chars in a
row are included in the filename. For instance, "éé éé.html" is no
problem.
Any advice and/or comment?
Thanks in advance
fps