Mailing Lists: Apple Mailing Lists


Re: Weird problems with Charset of Files



werner,

thanks a lot for your help + effort! that really solved my problem! i came up with this (using the unicode website's source code, as i need to be compatible with java 1.4):

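// let the user pick a file; on Mac OS X the name comes back decomposed (NFD)
java.awt.FileDialog fd = new java.awt.FileDialog( new java.awt.Frame() );
fd.show();
final String fileName = fd.getFile();
if( fileName == null ) System.exit( 0 ); // dialog was cancelled
// dump the bytes of the name as returned by the dialog
final java.nio.ByteBuffer bb3 = java.nio.ByteBuffer.wrap( fileName.getBytes() ); // or getBytes( "ISO-8859-1" )
System.out.println( "Source:\n" );
printHexOn( System.out, bb3 ); // printHexOn is a hex dump helper, not shown here
// Normalizer is the class from the unicode website; Normalizer.C selects NFC
final Normalizer n = new Normalizer( Normalizer.C, false );
final StringBuffer sb = new StringBuffer();
n.normalize( fileName, sb );
// dump the bytes again after composing the name
final java.nio.ByteBuffer bb4 = java.nio.ByteBuffer.wrap( sb.toString().getBytes() ); // or getBytes( "ISO-8859-1" )
System.out.println( "Normalized to NFC:\n" );
printHexOn( System.out, bb4 );
System.exit( 0 );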


so i assume that the normalizer will also work on windows and linux, or on filesystems other than HFS+, so i really needn't do more?
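one way to check that assumption: NFC normalization depends only on the string, not on the platform or filesystem, so a name that is already composed should pass through unchanged. the sketch below is an illustration, not the code from this thread. it assumes Java 6 or later (java.text.Normalizer and String.getBytes(Charset)) instead of the unicode.org class, and the class and method names (NfcBeforeEncoding, hex, encodeLatin1) are made up for the example.

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

public class NfcBeforeEncoding {
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    /** Formats bytes as space-separated two-digit hex values. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Composes the name to NFC first, then encodes it as ISO-8859-1. */
    static byte[] encodeLatin1(String name) {
        String nfc = Normalizer.normalize(name, Normalizer.Form.NFC);
        return nfc.getBytes(LATIN1);
    }

    public static void main(String[] args) {
        String decomposed = "a\u0308"; // as a name might arrive from HFS+
        String composed = "\u00e4";    // as typed on most other systems

        // Without normalization the combining diaeresis has no Latin-1
        // mapping and is replaced by '?' (0x3F).
        System.out.println(hex(decomposed.getBytes(LATIN1))); // 61 3F

        // After NFC both inputs encode to the same single byte.
        System.out.println(hex(encodeLatin1(decomposed)));    // E4
        System.out.println(hex(encodeLatin1(composed)));      // E4
    }
}
```

since the composed input comes out unchanged, running the normalization unconditionally on every platform should be harmless for names like these.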

for completeness of the links, here is the one for the IBM classes needed by Normalizer:

http://icu.sourceforge.net/download/3.4.html#ICU4J

kind of painful to blow up my app by more than 3 MB just to do this simple conversion. maybe one can strip down the ICU4J package to include only those classes really needed by Normalizer. ...

best,  -sciss-


On 17.09.2006, at 17:42, Werner Randelshofer wrote:

Hi Sciss,

Mac OS X stores file names as canonically decomposed character sequences (Unicode Normalization Form D).
In a decomposed character sequence, the character "a umlaut" (ä) is stored as two Unicode code points (U+0061 followed by the combining diaeresis U+0308).
Java usually works with composed Unicode character sequences, which store "a umlaut" in a single code point (U+00E4).
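The difference between the two forms can be made visible with a few lines of Java. This is only a minimal sketch: the class name UmlautForms and the helper codePoints are invented for the illustration, and it assumes Java 5 or later (for String.codePointAt and String.format).

```java
public class UmlautForms {
    /** Returns the code points of s as space-separated "U+XXXX" values. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String composed = "\u00e4";    // "a umlaut" as one code point (NFC)
        String decomposed = "a\u0308"; // 'a' plus combining diaeresis (NFD)

        System.out.println(codePoints(composed));        // U+00E4
        System.out.println(codePoints(decomposed));      // U+0061 U+0308
        // Both render as the same glyph, but they are different strings.
        System.out.println(composed.equals(decomposed)); // false
    }
}
```

Because String.equals compares code units, a decomposed file name never matches its composed counterpart until one of them is normalized.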


For details see:
http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html

For a lot more details see:
http://developer.apple.com/technotes/tn2002/tn2078.html#HowEncoded
http://developer.apple.com/documentation/MacOSX/Conceptual/BPInternational/Articles/FileEncodings.html
http://www.unicode.org/reports/tr15/index.html


With Java SE 6 you can convert between normalization forms using the class java.text.Normalizer.
To do the normalization with earlier Java versions, you can use the Normalizer classes available from the Unicode Consortium:
http://www.unicode.org/reports/tr15/Normalizer.html
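For the Java SE 6 route, a minimal sketch could look like the following. The class name FileNameNfc, the helper toNfc and the sample file name are assumptions made up for the example; only Normalizer.normalize and Normalizer.Form.NFC come from the java.text API.

```java
import java.text.Normalizer;

public class FileNameNfc {
    /** Converts a (possibly decomposed) file name to NFC. */
    static String toNfc(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Hypothetical name as Mac OS X might return it: 'a' + U+0308.
        String fromHfs = "Ba\u0308r.txt";
        String nfc = toNfc(fromHfs);

        System.out.println(fromHfs.length() + " -> " + nfc.length()); // 8 -> 7
        System.out.println(nfc.equals("B\u00e4r.txt"));               // true
    }
}
```

Passing Normalizer.Form.NFD instead converts in the other direction.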


With best regards,
Werner



On 17.09.2006, at 16:40, Sciss wrote:
i have a problem with pathname strings from File objects. they appear to be in a weird encoding, which leaves me unable to transcode them to other charsets and transfer them using OpenSoundControl. the problem arises with characters outside the 7-bit ascii range, for example umlauts. like the following:


_______________________________________________
Java-dev mailing list



