Mailing Lists: Apple Mailing Lists


file to uri to file madness



I thought I knew the right way to compute the URI of a file in a form usable directly in an href attribute, and that, given such a URI, I could get back to the file.

Until I tried with a file called "ééé.html" (three e-acutes). Quite a disappointment, as I had no problem with "éé.html" (two e-acutes).

1) Here is a quick way to get a feel for the problem:
Put two files in a directory, one called "éé.html" and the other "ééé.html". (Ideally, put them somewhere Apache can serve them.) On my disk, they are in "/Users/x/Sites/x/2006/08/diacritics in filenames/".


When I paste either of these two URLs into Safari or Firefox, the file "éé.html" is displayed:
file:///Users/x/Sites/x/2006/08/diacritics%20in%20filenames/%C3%A9%C3%A9.html
http://127.0.0.1/~x/x/2006/08/diacritics%20in%20filenames/%C3%A9%C3%A9.html


So anyone would think (I did, at least) that since the e-acute is represented by "%C3%A9", you just add one more before the ".html" to get to "ééé.html". Well, no:

While Safari correctly displays
file:///Users/x/Sites/x/2006/08/diacritics%20in%20filenames/%C3%A9%C3%A9%C3%A9.html
Firefox doesn't ("Firefox cannot find file at address /Users/x/Sites/x/2006/08/diacritics in filenames/ééé.html.")


Trying the http form returns Apache's error page:
Not Found
The requested URL /~x/x/2006/08/diacritics in filenames/ééé.html was not found on this server.


I am using the latest version of the system (Java 1.5) on a MacBook Pro.
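
As a side note on where "%C3%A9" comes from: it is the percent-encoded form of the two UTF-8 bytes of the precomposed character U+00E9. The same visible "é" can also be written as a plain "e" followed by the combining acute accent U+0301, and that form percent-encodes differently. The sketch below only illustrates this. The class and method names are made up for the example, and it uses StandardCharsets, which needs a newer Java than the 1.5 mentioned above.

```java
import java.nio.charset.StandardCharsets;

class AcuteEncoding {
    // Hypothetical helper: percent-encode every UTF-8 byte that is not an
    // unreserved ASCII character (letters, digits, '-', '.', '_', '~').
    static String percentEncodeUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean unreserved = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9')
                    || v == '-' || v == '.' || v == '_' || v == '~';
            if (unreserved) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Precomposed e-acute (one code point, U+00E9)
        System.out.println(percentEncodeUtf8("\u00E9"));   // %C3%A9
        // Decomposed e-acute ("e" + combining acute U+0301)
        System.out.println(percentEncodeUtf8("e\u0301"));  // e%CC%81
    }
}
```

So two URLs that look identical once decoded can carry different bytes, depending on which form of "é" the name was in when it was encoded.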

2) How did I compute the URI?

file.toURI().toString() doesn't do it. It returns something like:
file:/Users/x/Sites/x/2006/08/diacritics%20in%20filenames/ok-ko/e?e?.html
(where the "?" are not actual question marks).
This form doesn't work at all. You have to call file.toURI().toASCIIString() to get a form without non-ASCII chars (as explained in the javadoc). I prefer using the following to get the file protocol URI, because it ensures that the result begins with "file:///" (with 3 "/"), which I need when using Jena:


static public String fileToUri(File file) throws URISyntaxException {
    String s = file.getAbsolutePath();
    // This is for Windows: the path must use "/" and start with a "/", otherwise
    // new URI(...) throws "java.net.URISyntaxException: Relative path in absolute URI".
    s = s.replace('\\', '/');
    if (!s.startsWith("/")) s = "/" + s;
    // The empty (non-null) host is what gives the "file:///" prefix.
    URI uri = new URI("file", "", s, null);
    return uri.toASCIIString();
}
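
To make the effect of the empty host and of the Windows fix-ups easier to see, here is a small sketch that applies the same steps to a plain path string, so it can be tried without any real file. pathToUri is a hypothetical name and the sample paths are invented for the example.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

class FileUriForms {
    // Hypothetical helper: the same steps as fileToUri, on a path string.
    static String pathToUri(String absolutePath) throws URISyntaxException {
        String s = absolutePath.replace('\\', '/');
        if (!s.startsWith("/")) s = "/" + s;
        // Scheme "file", empty host, the path, no fragment.
        return new URI("file", "", s, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(pathToUri("/tmp/a b.html"));       // file:///tmp/a%20b.html
        System.out.println(pathToUri("C:\\dir\\page.html"));  // file:///C:/dir/page.html
        // For comparison: File.toURI() gives the single-slash "file:/..." form.
        System.out.println(new File("/tmp/a b.html").toURI().toASCIIString());
    }
}
```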


Note: I take care to pass an existing file to fileToUri (!).
If you write:
File f = new File(dir, "éé.html");
f.exists() returns false.
I get the short filename using dir.list(). I am aware of the issue of "composed accents", and I have been using Greg Guerin's "AccentComposer" class. An interesting (?) observation:


File dir = new File(dirPath);
String[] names = dir.list();
for (int i = 0; i < names.length; i++) {
    File file;
    // name exactly as returned by dir.list()
    file = new File(dir, names[i]);
    System.out.println("file exists: " + file.exists()); // TRUE
    // same name after composing the accents
    file = new File(dir, AccentComposer.composeAccents(names[i]));
    System.out.println("file exists: " + file.exists()); // TRUE in case of "éé.html", FALSE in case of "ééé.html"
}
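
For readers without AccentComposer, the composed/decomposed distinction can be sketched with java.text.Normalizer. This is an assumption-laden stand-in: I don't know how AccentComposer is implemented, Normalizer only exists from Java 6 on, and the decomposition used by the Mac filesystem is a variant that may not match NFD in every case. The class name is made up.

```java
import java.text.Normalizer;

class AccentForms {
    // NFC: "e" + combining acute becomes the single character U+00E9.
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // NFD: U+00E9 becomes "e" + combining acute U+0301.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // A decomposed name, as a directory listing might return it (assumption).
        String decomposed = "e\u0301e\u0301.html";
        String composed = compose(decomposed);

        System.out.println(decomposed.length());                    // 9
        System.out.println(composed.length());                      // 7
        System.out.println(composed.equals(decomposed));            // false
        System.out.println(decompose(composed).equals(decomposed)); // true
    }
}
```

The two strings print the same on screen but are different Java strings, so whether new File(dir, name).exists() succeeds for both depends on how the filesystem layer treats them.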


3) Until I found this strange problem, I was very happy with the form of URI you get from toASCIIString(): I can use it directly in href attributes (ASCII chars only), and I had no problem getting back to the file from it:
new File(uri) returns the file.
Well, it does for "éé.html", but not for "ééé.html".


new File(f.toURI().toString()) seems to always return an existing file (if f exists).
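
A possible reading of the difference, offered as a sketch rather than a diagnosis: in the OpenJDK sources I am aware of, toASCIIString() normalizes non-ASCII characters to composed form (NFC) before escaping them, so a decomposed name goes out as "%C3%A9" and comes back composed. I have not checked this on Apple's Java 1.5. The class name and the /tmp paths below are invented for the example.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

class UriRoundTrip {
    // File name obtained by decoding an ASCII-only file URI.
    static String nameFromUri(String uri) throws URISyntaxException {
        return new File(new URI(uri)).getName();
    }

    // ASCII-only URI for a path, built the same way as fileToUri.
    static String asciiUri(String path) throws URISyntaxException {
        return new URI("file", "", path, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Decomposed name in, composed escapes out (on the JDKs I know of).
        System.out.println(asciiUri("/tmp/e\u0301.html"));

        // Decoding the escapes yields the composed name, not the decomposed one.
        String name = nameFromUri("file:///tmp/%C3%A9%C3%A9%C3%A9.html");
        System.out.println(name.length());                              // 8
        System.out.println(name.equals("\u00E9\u00E9\u00E9.html"));     // true
        System.out.println(name.equals("e\u0301e\u0301e\u0301.html"));  // false
    }
}
```

If that holds, the File rebuilt from the ASCII URI carries a composed name even when dir.list() returned a decomposed one, and finding the file then depends on the filesystem accepting both forms.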

As the URIs I am trying to produce are intended to be used in RDF data (using the Jena software, as I mentioned), I am very reluctant to use file.toURI().toString(): I am not sure how non-ASCII chars would be handled. Furthermore, this would require modifying the programs that display the data in HTML (and make them more complex), and I already have a lot of data produced using the ASCII-only format for my URIs.

It seems that the problem shows up only when 3 "strange" chars in a row are included in the filename. For instance, "éé éé.html" is no problem.

Any advice and/or comment?

Thanks in advance

fps



_______________________________________________
Do not post admin requests to the list. They will be ignored.
Java-dev mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/java-dev/email@hidden

This email sent to email@hidden



Copyright © 2007 Apple Inc. All rights reserved.