Weird problems with Charset of Files



hello,

i have a problem with pathname strings from File objects. they appear to be in a weird encoding, so i am unable to transcode them to other charsets and transmit them using OpenSoundControl. the problem arises with characters outside the 7-bit ascii range, for example umlauts, as in the following:

java.awt.FileDialog fd = new java.awt.FileDialog( new java.awt.Frame () );
fd.setVisible( true );	// show() is deprecated as of Java 1.5
final String fileName = fd.getFile();
final java.nio.ByteBuffer bb3 = java.nio.ByteBuffer.wrap ( fileName.getBytes() );
printHexOn( System.out, bb3 );


... running this and choosing a file 'ä.aif' from the finder results in:

0000  61 3f 2e 61 69 66                                 |a?.aif|

even the length of the string is 6, not 5. the system default charset is MacRoman, so the a-umlaut should come out as 0x8A. if i use a regular java string "\u00E4.aif", it gets converted correctly:

final java.nio.ByteBuffer bb4 = java.nio.ByteBuffer.wrap ( "\u00E4.aif".getBytes() );
printHexOn( System.out, bb4 );


... outputs:

0000  8a 2e 61 69 66                                    |..aif|
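For comparison, the two strings can be inspected at the level of UTF-16 code units instead of encoded bytes. The following is only a minimal sketch: the class name `CodeUnitDump` and the helper `codeUnits` are made up for illustration, and the literal "a\u0308.aif" is an assumption standing in for the name returned by the file dialog (based on the 0x61 + 0x308 pair reported further down).

```java
public class CodeUnitDump {
    // Returns the UTF-16 code units of s as space-separated 4-digit hex values.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: this mirrors the string returned by the file dialog,
        // i.e. 'a' followed by the combining diaeresis U+0308.
        String decomposed = "a\u0308.aif";
        // The literal from the post: a single precomposed a-umlaut.
        String composed = "\u00e4.aif";

        System.out.println(decomposed.length() + ": " + codeUnits(decomposed));
        System.out.println(composed.length() + ": " + codeUnits(composed));
    }
}
```

Under that assumption the first line shows six code units (0061 0308 ...) and the second shows five (00e4 ...), which would account for the length difference before any charset is involved.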

strangely, i can store the files in my preferences (using java.util.prefs.Preferences, with the values retrieved using file.getAbsolutePath()) and recall them without problems. i can even create File objects again from these preferences and open the files (only i cannot transmit the filename over the network, as i cannot convert it to one of the 8-bit charsets). also, if i display the file names in a JTextField, the umlauts are shown correctly.

in fact, if i dump my plist file, the strings are either stored with one byte per character if no umlauts are contained, or in the "weird" unknown encoding with two bytes per character - probably UTF-16, but with a wrong encoding of a-umlaut which is represented as four bytes 61 03 08 00 :

00000ba0 2f 00 55 00 73 00 65 00 72 00 73 00 2f 00 72 00 |/.U.s.e.r.s./.r.|
00000bb0 75 00 74 00 7a 00 2f 00 44 00 65 00 73 00 6b 00 |u.t.z./.D.e.s.k.|
00000bc0 74 00 6f 00 70 00 2f 00 61 03 08 00 2e 00 61 00 |t.o.p./.a.....a.|
00000bd0 69 00 66 00 3a 00 2f 00 55 00 73 00 65 00 72 00 |i.f.:./.U.s.e.r.|


i tried to use specific charsets for encoding (like fileName.getBytes( "ISO-8859-1" ), fileName.getBytes( "UTF-8" ), ...), no luck, i always end up with umlauts appearing as two bytes a-?, o-?, u-? etc.
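To make the effect of the charset choice concrete, here is a small sketch that encodes a decomposed a-umlaut with a few explicit charsets. It assumes the string really consists of 'a' followed by the combining diaeresis U+0308; the class `EncodeDecomposed` and its helpers are hypothetical names used only for this illustration.

```java
import java.io.UnsupportedEncodingException;

public class EncodeDecomposed {
    // Formats a byte array as space-separated 2-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    // Encodes s with the named charset and returns the bytes as hex.
    static String encode(String s, String charsetName)
            throws UnsupportedEncodingException {
        return hex(s.getBytes(charsetName));
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumption: 'a' + combining diaeresis, as described in the post.
        String decomposed = "a\u0308";
        String[] charsets = { "ISO-8859-1", "UTF-8", "UTF-16BE" };
        for (String cs : charsets) {
            System.out.println(cs + ": " + encode(decomposed, cs));
        }
    }
}
```

ISO-8859-1 has no slot for the combining character, so getBytes substitutes '?' (0x3f) for it, while UTF-8 and UTF-16BE keep both code points as separate byte sequences (cc 88 and 03 08 respectively).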

is this a bug in the apple implementation of File? what is that weird character set that represents a-umlaut as (char) 0x61 + (char) 0x308, and why can't i transcode it? is there a workaround to eventually get iso-latin-1 or utf-8 or whatever strings which preserve my umlauts from File objects?
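The pair 0x61 + 0x308 is the canonically decomposed Unicode form of a-umlaut (the form HFS+ uses for file names), so one possible approach is to recompose the string before encoding it. The sketch below relies on java.text.Normalizer and String.getBytes(Charset), both of which only exist from Java 6 onward, so it would not run as-is on the Java 1.5.0_06 mentioned in this post. The class `ComposeFileName` and the method `compose` are illustrative names, not part of any existing API.

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

public class ComposeFileName {
    // Recomposes base letter + combining mark sequences into single
    // precomposed characters where Unicode defines one (NFC).
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Assumption: stands in for the name returned by the file dialog.
        String fromDialog = "a\u0308.aif";
        String composed = compose(fromDialog);

        // After composition the name fits into an 8-bit charset.
        byte[] latin1 = composed.getBytes(Charset.forName("ISO-8859-1"));
        System.out.println(composed.length() + " chars, first byte 0x"
                + Integer.toHexString(latin1[0] & 0xff));
    }
}
```

When sending such a name back to the file system no conversion in the other direction should normally be needed, but that depends on the platform and is not verified here.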

system is mac os x 10.4.7 /   Java 1.5.0_06

thanks, -sciss-


p.s.: here is the method for hexdump output:


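// requires: import java.io.PrintStream; import java.nio.ByteBuffer;
// ascii codes of the hex digits '0'..'9', 'a'..'f'
private static final byte[] hex = { 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66 };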


	public static void printHexOn( PrintStream stream, ByteBuffer b )
	{
		final int		lim	= b.limit(); // len = b.limit() - off;
		final byte[]	txt	= new byte[ 74 ];
		int				i, j, k, n, m;

		for( i = 4; i < 56; i++ ) {
			txt[ i ] = (byte) 0x20;
		}
		txt[ 56 ] = (byte) 0x7C;
		
		stream.println();
		for( i = (int) b.position(); i < lim; ) {
			j = 0;
			txt[ j++ ]	= hex[ (i >> 12) & 0xF ];
			txt[ j++ ]	= hex[ (i >> 8) & 0xF ];
			txt[ j++ ]	= hex[ (i >> 4) & 0xF ];
			txt[ j++ ]	= hex[ i & 0xF ];
			m = 57;
			for( k = 0; k < 16 && i < lim; k++, i++ ) {
				if( (k & 7) == 0 ) j += 2; else j++;
				n			= b.get();
				txt[ j++ ]	= hex[ (n >> 4) & 0xF ];
				txt[ j++ ]	= hex[ n & 0xF ];
				txt[ m++ ]	= (n > 0x1F) && (n < 0x7F) ? (byte) n : (byte) 0x2E;
			}
			txt[ m++ ] = (byte) 0x7C;
			while( j < 54 ) {
				txt[ j++ ] = (byte) 0x20;
			}
			while( m < 74 ) {
				txt[ m++ ] = (byte) 0x20;
			}
			stream.write( txt, 0, 74 );
			stream.println();
		}
		stream.println();
	}

_______________________________________________
Java-dev mailing list      (email@hidden)