Mailing Lists: Apple Mailing Lists


Re: Bug with Text Encoding?




On Apr 22, 2007, at 4:58 PM, Greg Guerin wrote:

Michael Hall wrote:

Second I was thinking each String getBytes() would be a direct hex
translation of the Unicode value, which I have verified it does not
seem to be. I'm still not completely sure that makes what I was
suggesting complete nonsense.

I don't know what you mean by "a direct hex translation" of a Unicode value.


If the Unicode value is \u0439, I might expect the bytes to hold 0x0439. Or, if the escaped value were taken as decimal, a value of 439 = 0x01B7. But I tested and neither was the case.


Does java String.getBytes("\u0439") always produce the same hex value
- whatever it might be?

It doesn't: what it returns depends on what the default encoding is. Read
The Fine Manual section describing String.getBytes().


Or on the specified encoding. I was thinking that providing an encoding to the constructor didn't do any actual conversion of the bytes. But I did check the javadoc, and it sounds like it does.

And I suspect you meant this:
  "\u0439".getBytes()

Yeah.
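
For anyone following along, here is a minimal sketch of the point about getBytes(), assuming Java 7 or later for StandardCharsets (it was not available when this thread was written). The class name GetBytesDemo and the helper hex() are made up for illustration. It encodes "\u0439" under a few charsets that every JVM must support and prints the resulting bytes in hex.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the code discussed in this thread.
class GetBytesDemo {
    // Format bytes as space-separated two-digit uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u0439"; // CYRILLIC SMALL LETTER SHORT I
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            StandardCharsets.ISO_8859_1
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + hex(s.getBytes(cs)));
        }
        // The no-argument form uses the platform default charset,
        // so this line can differ from machine to machine.
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + hex(s.getBytes()));
    }
}
```

Under these assumptions UTF-8 gives D0 B9, UTF-16BE gives 04 39 (the "direct hex translation"), and ISO-8859-1 gives 3F, a '?' substituted because Latin-1 has no such character.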



True. But you'd have to define SuperHybridLatinCyrillicCharSet as the context.

There is nothing magical about charsets or encodings.  Unicode is an
encoding.  Even binary is an encoding.  Contrast 2's-complement,
1's-complement, sign-magnitude, and BCD; they are all "binary", but
arithmetic is different in each one.

A bit-pattern only has meaning given an interpretation, aka an encoding
that tells you what the patterns mean.
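
The contrast between binary representations can be sketched in a few lines. This is only an illustration, limited to 8-bit patterns; the class and method names are hypothetical. The same bit pattern 0x85 is read under each interpretation named above.

```java
// Hypothetical demo class: one 8-bit pattern, several interpretations.
class BitPatternDemo {
    // Plain unsigned reading of the low 8 bits.
    static int asUnsigned(int bits) {
        return bits & 0xFF;
    }

    // Two's complement: the cast to byte sign-extends bit 7.
    static int asTwosComplement(int bits) {
        return (byte) bits;
    }

    // Ones' complement: a negative value is the bitwise inverse of its magnitude.
    static int asOnesComplement(int bits) {
        int b = bits & 0xFF;
        return (b & 0x80) == 0 ? b : -(~b & 0xFF);
    }

    // Sign-magnitude: bit 7 is the sign, bits 0..6 are the magnitude.
    static int asSignMagnitude(int bits) {
        int b = bits & 0xFF;
        int magnitude = b & 0x7F;
        return (b & 0x80) == 0 ? magnitude : -magnitude;
    }

    // Packed BCD: each nibble is one decimal digit.
    // Assumes both nibbles are in the range 0..9.
    static int asPackedBcd(int bits) {
        int b = bits & 0xFF;
        return (b >> 4) * 10 + (b & 0x0F);
    }

    public static void main(String[] args) {
        int bits = 0x85; // 1000 0101
        System.out.println("unsigned:         " + asUnsigned(bits));
        System.out.println("two's complement: " + asTwosComplement(bits));
        System.out.println("ones' complement: " + asOnesComplement(bits));
        System.out.println("sign-magnitude:   " + asSignMagnitude(bits));
        System.out.println("packed BCD:       " + asPackedBcd(bits));
    }
}
```

For 0x85 these readings are 133, -123, -122, -5, and 85 respectively: one pattern, five meanings.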


Remembering back to when I did a little of this, I think it may be more difficult than indicated, or you would also need a SuperHybrid font. I think the byte converters sometimes subtract a fixed offset to get back to a font mapping. From my understanding, you are not going to want a single font for every possible Unicode glyph.
So composites are probably mostly nonsense.



However if it always produces the same hex bytes then you could vary
the encoding and it _might_ make sense.

I try not to write code that only _might_ make sense.

Sometimes that's why you test, which is what I was doing.


Basically rightly or wrongly you are claiming the encoding handles
the string.

No I'm not, because it doesn't. Read The Fine Source and see for yourself.


Actually, I did check the javadoc, and it went further than I had thought: it did mean that some byte conversion, at least, might happen. Sorry about the _might_ word again. But I'm already conceding there are definitely encodings that make no sense for some Unicode strings, like Latin for Cyrillic.
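
Whether an encoding makes sense for a given string can be checked rather than guessed. A small sketch, assuming Java 7 or later for StandardCharsets; the class name and helper are hypothetical. It relies on CharsetEncoder.canEncode(), which reports whether every character of the string is representable in the charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
class CanEncodeDemo {
    // True if every character of s has a representation in cs.
    static boolean canEncode(Charset cs, String s) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String cyrillic = "\u0439";
        System.out.println("ISO-8859-1: " + canEncode(StandardCharsets.ISO_8859_1, cyrillic));
        System.out.println("UTF-8:      " + canEncode(StandardCharsets.UTF_8, cyrillic));
    }
}
```

Here Latin-1 reports false for the Cyrillic letter and UTF-8 reports true.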

Specifying the encoding does not itself do any byte
conversions in the String constructor.

False. The encoding, either a named one or the default, leads to a
converter object. That object converts the bytes you give it into Java
chars, which are a primitive type with a 16-bit size and Unicode encoding.
Those 16-bit Java chars are then arranged in an array or sequence, with
which it creates a String.


OK, that is what the javadoc said.
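
A minimal sketch of that conversion, assuming Java 7 or later for StandardCharsets; the class name and the describe() helper are made up for illustration. The same two bytes are handed to the String constructor under two encodings, and the resulting Java chars are listed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
class DecodeDemo {
    // List each 16-bit Java char of s as U+XXXX.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0xD0, (byte) 0xB9 };
        // The charset selects the converter that turns bytes into chars.
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        String asLatin1 = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8:      " + describe(asUtf8));
        System.out.println("ISO-8859-1: " + describe(asLatin1));
    }
}
```

Decoded as UTF-8 the two bytes become the single char U+0439; decoded as ISO-8859-1 they become two chars, U+00D0 and U+00B9. The bytes are identical in both cases, so the constructor is clearly doing a conversion.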


I couldn't comment on the IDEA of what your code was trying to do, because
I couldn't tell what that idea was. All I could comment on was whether it
was doing the correct things, given the context. And the answer to that
was "No."


Uh, OK, I guess I should comment. I thought it would be sort of clear that the code was trying some of the discussed string values against a set of the possibly involved character sets.


Not sure on the PrintStream stuff. You're probably right that it all
depends on file.encoding.

Again, RTFM, or even RTFS included with Apple's Java Developer downloads.
Or decompile it using 'javap' on PrintStream.


Actually, I had second thoughts on this one, and it might rate additional testing if I were curious enough. For Terminal, what you say regarding PrintStream would be pertinent. My original test never appeared to work there, even when System Preferences should have set things to MacCyrillic. It might be that I had Terminal running before changing the settings and it doesn't refresh, or something like that. When going to the Text component I am, of course, redirecting I/O, so the PrintStream.println() is probably actually getting switched to a widget.appendText(). That application I was stopping and restarting after changing preferences.
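
One way to take Terminal and the default encoding out of the picture when testing is to give a PrintStream an explicit encoding and capture the bytes it emits. A rough sketch; the class name and the printWith() helper are hypothetical, and it says nothing about how System.out itself is configured on any particular platform.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class.
class PrintStreamDemo {
    // Print text through a PrintStream that uses the named encoding
    // and return the raw bytes that reached the underlying stream.
    static byte[] printWith(String text, String encodingName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, encodingName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encodingName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        for (String name : new String[] { "UTF-8", "ISO-8859-1" }) {
            StringBuilder hex = new StringBuilder();
            for (byte b : printWith("\u0439", name)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println(name + ": " + hex.toString().trim());
        }
    }
}
```

With UTF-8 the stream receives D0 B9; with ISO-8859-1 it receives 3F, a '?', because the character cannot be represented. What a terminal then shows depends on how it interprets those bytes.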


Mike Hall mikehall at spacestar dot net http://www.spacestar.net/users/mikehall http://sourceforge.net/projects/macnative



Java-dev mailing list
References: 
 >Re: Bug with Text Encoding? (From: Greg Guerin <email@hidden>)


