Mailing Lists: Apple Mailing Lists


Re: Bug with Text Encoding?




On Apr 22, 2007, at 11:55 AM, Greg Guerin wrote:



I don't know what you expect this to do, but at least some of what it's doing
seems like nonsense to me.


Ah, well, I hadn't looked at anything like this for a couple of years, so I was off on a couple of important points, as you mention.

First, that changing the System Preference altered the file.encoding property. I assumed the text GUI encoding would be separate. It is probably better to have a global setting you can control from Java. -Dfile.encoding=MacCyrillic is probably what the OP wants (+normalized).
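To see what a given JVM actually ended up with, a minimal sketch like the following can be run with and without the flag. The class name DefaultEncodingCheck is made up for illustration, and whether -Dfile.encoding is honored at startup depends on the JVM version, so treat the output as something to observe rather than a guarantee.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of any library.
class DefaultEncodingCheck {
    // Returns a one-line summary of the encoding settings this JVM is using.
    static String describe() {
        String prop = System.getProperty("file.encoding"); // the system property
        Charset dflt = Charset.defaultCharset();            // what the JVM actually uses by default
        return "file.encoding=" + prop + ", defaultCharset=" + dflt.name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

For example: java -Dfile.encoding=MacCyrillic DefaultEncodingCheck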

Second, I was thinking each String getBytes() would be a direct hex translation of the Unicode value, which I have verified it does not seem to be. I'm still not completely sure that makes what I was suggesting complete nonsense.

Does Java's "\u0439".getBytes() always produce the same hex value, whatever it might be?
If it doesn't, then what I assumed was in fact complete nonsense, and every Unicode string has to correspond to one and only one character encoding that knows how to handle it correctly.
However, if it always produces the same hex bytes, then you could vary the encoding and it _might_ make sense.
You could have
String mac_cyrillic = new String("\u0439".getBytes(), "MacCyrillic"); // and it would work
or
String other_cyrillic_charset = new String("\u0439".getBytes(), "OtherCyrillicCharSet"); // and it would also work
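The question of whether getBytes() always yields the same bytes can be checked with a small sketch. It encodes the same one-character String with three charsets that every Java platform is required to support, then with the platform default. The class and method names (GetBytesDemo, hex, encode) are invented for this example.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of any library.
class GetBytesDemo {
    // Formats bytes as space-separated uppercase hex, e.g. "D0 B9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the text with the named charset and returns the bytes as hex.
    static String encode(String text, String charsetName) {
        return hex(text.getBytes(Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        String s = "\u0439"; // CYRILLIC SMALL LETTER SHORT I
        System.out.println("UTF-8      -> " + encode(s, "UTF-8"));      // D0 B9
        System.out.println("UTF-16BE   -> " + encode(s, "UTF-16BE"));   // 04 39
        // ISO-8859-1 has no Cyrillic, so the character becomes '?' (3F).
        System.out.println("ISO-8859-1 -> " + encode(s, "ISO-8859-1")); // 3F
        // The no-argument form uses the platform default, so this line can vary.
        System.out.println("default    -> " + hex(s.getBytes()));
    }
}
```

The bytes differ per charset, and the no-argument getBytes() follows whatever the default encoding happens to be, so the result is not a fixed translation of the Unicode value.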


Bad assumption on the direct mapping. However, not complete nonsense, in that then...
new String("cyrillic".getBytes(), "SuperHybridLatinCyrillicCharSet"); // might be possible and not complete nonsense.


Basically, rightly or wrongly, you are claiming the encoding handles the string. As I understood it, specifying the encoding does not itself do any byte conversions in the String constructor.
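The claim about the String constructor is easy to check. A minimal sketch, decoding the same two bytes with two different charset names (DecodeDemo and decode are names made up here), suggests the constructor does convert the bytes into chars according to the charset it is given:

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of any library.
class DecodeDemo {
    // Decodes the bytes with the named charset via the String constructor.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xD0, (byte) 0xB9}; // the UTF-8 encoding of U+0439

        String asUtf8 = decode(bytes, "UTF-8");        // one char: U+0439
        String asLatin1 = decode(bytes, "ISO-8859-1"); // two chars: U+00D0 U+00B9

        System.out.println(asUtf8.length());                   // 1
        System.out.println(asLatin1.length());                 // 2
        System.out.println(asUtf8.equals("\u0439"));           // true
        System.out.println(asLatin1.equals("\u00D0\u00B9"));   // true
    }
}
```

So naming the wrong charset does not just label the bytes; it produces different chars, which is where the garbled text comes from.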

I guess I was thinking maybe MacRoman was sort of filling this SuperHybrid function on the OS X platform. Again, I missed the switch to MacCyrillic, so I was in fact wrong. But being mistaken does not make the idea nonsense, although there may very well be other considerations that make composite Latin+Cyrillic character sets _nonsense_.

Second, System.out is a PrintStream. A PrintStream converts Unicode chars
to bytes using the default encoding, which is MacCyrillic.

Not sure on the PrintStream stuff. You're probably right that it all depends on file.encoding. However, I was thinking the String constructor encoding might set a field on the String instance that OutputStreams or GUI components could eventually use in displaying a glyph.
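One way to test the idea of a remembered encoding is to build the same text two ways and print both through PrintStreams with explicit encodings. This is only a sketch with invented names (PrintStreamDemo, printWith); it writes to a ByteArrayOutputStream instead of System.out so the bytes can be inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class PrintStreamDemo {
    // Prints the text through a PrintStream that encodes with the named charset
    // and returns the bytes that reached the underlying stream.
    static byte[] printWith(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String literal = "\u0439";
        // Same character, but built by decoding UTF-8 bytes.
        String decoded = new String(new byte[] {(byte) 0xD0, (byte) 0xB9}, Charset.forName("UTF-8"));

        System.out.println(literal.equals(decoded)); // true
        // Both Strings produce identical bytes for a given PrintStream encoding.
        System.out.println(Arrays.equals(printWith(literal, "UTF-16BE"), printWith(decoded, "UTF-16BE"))); // true
        // The output bytes depend on the PrintStream's encoding.
        System.out.println(Arrays.toString(printWith(literal, "UTF-8")));    // [-48, -71]
        System.out.println(Arrays.toString(printWith(literal, "UTF-16BE"))); // [4, 57]
    }
}
```

As far as this shows, the String keeps only the chars, not the charset used to construct it, and the PrintStream's own encoding decides the output bytes.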



Mike Hall mikehall at spacestar dot net http://www.spacestar.net/users/mikehall http://sourceforge.net/projects/macnative



Java-dev mailing list      (email@hidden)




Copyright © 2007 Apple Inc. All rights reserved.