Second, I was thinking each String's getBytes() would be a direct hex
translation of the Unicode value, which, as I have verified, it does not
seem to be. I'm still not completely sure that makes what I was
suggesting complete nonsense.
I don't know what you mean by "a direct hex translation" of a
Unicode value.
If the Unicode value is \u0439, I might expect getBytes() to give
0x0439. Or, if the escaped value is taken as decimal, 439 = 0x01b7.
But I tested, and neither was the case.
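For what it's worth, the \u escape in Java source is always read as hexadecimal, so "\u0439" is the single character U+0439, whose numeric value is 0x0439 (1081 decimal), not 439. The no-argument getBytes() then encodes that number with the JVM's default charset, so its output is not the code point at all. A rough sketch to check this on your own machine (the class name EscapeDemo is just made up for illustration):

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class EscapeDemo {
    // Returns the numeric code point of the first character in s.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        String s = "\u0439"; // the backslash-u escape is hex, so this is U+0439
        int cp = firstCodePoint(s);
        System.out.println(cp);                      // 1081
        System.out.println(Integer.toHexString(cp)); // 439
        // getBytes() with no argument uses the platform default charset,
        // so the bytes printed here depend on how the JVM is configured.
        System.out.println(Charset.defaultCharset());
        System.out.println(Arrays.toString(s.getBytes()));
    }
}
```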
There is no such thing as a "Unicode value" that can be directly
represented as a sequence of bytes.
There is a "Unicode code point", which maps a character to a number.
The important thing to remember is that a number IS NOT a sequence of
bytes. A sequence of bytes can _represent_ a number, but you need to
apply rules (even basic hardware-level ones like big- vs little-
endianness) to perform that conversion.
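To make the endianness point concrete: the same 16-bit number 0x0439 comes out as two different byte sequences depending only on the byte order you pick. A minimal sketch using java.nio.ByteBuffer (EndianDemo and its helpers are hypothetical names, not anything from the original discussion):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    // Writes a 16-bit value as two bytes in the given byte order.
    static byte[] toBytes(short value, ByteOrder order) {
        return ByteBuffer.allocate(2).order(order).putShort(value).array();
    }

    // Formats bytes as space-separated hex pairs, e.g. "04 39".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        short n = 0x0439; // one number (1081 decimal)...
        System.out.println(hex(toBytes(n, ByteOrder.BIG_ENDIAN)));    // 04 39
        System.out.println(hex(toBytes(n, ByteOrder.LITTLE_ENDIAN))); // 39 04
    }
}
```

Same number, two byte arrays; neither one *is* the number.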
Character encodings are rules for converting sequences of Unicode
code points into sequences of bytes. AFAIK, there are only two character
encodings that map U+0439 to the byte array [ 0x04, 0x39 ]: big-endian
UCS-2 and big-endian UTF-16 (UTF-16BE).
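In Java terms, the charset that gives exactly [ 0x04, 0x39 ] is StandardCharsets.UTF_16BE. One wrinkle worth knowing: Java's plain "UTF-16" charset also encodes big-endian, but it prepends a byte-order mark (FE FF), so you get four bytes. A small sketch, with Utf16Demo and encodeHex as made-up names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Demo {
    // Encodes s with the given charset and returns the bytes as hex pairs.
    static String encodeHex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u0439";
        // Big-endian UTF-16 with no byte-order mark.
        System.out.println(encodeHex(s, StandardCharsets.UTF_16BE)); // 04 39
        // "UTF-16" writes a big-endian byte-order mark first.
        System.out.println(encodeHex(s, StandardCharsets.UTF_16));   // FE FF 04 39
    }
}
```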
Even in the Unicode spec, however, there are many other ways to
represent U+0439 as a byte array, depending on the encoding chosen.
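A quick way to see how much the byte array depends on the encoding is to run the same one-character string through a few of the charsets every JVM is required to support. This is only a sketch (EncodingsDemo is a hypothetical name); note that ISO-8859-1 has no slot for U+0439 at all, and String.getBytes(Charset) silently substitutes that charset's replacement byte, '?' (0x3F), which is one common way a "wrong bytes" surprise shows up:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingsDemo {
    // Encodes s with the given charset and returns the bytes as hex pairs.
    static String encodeHex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u0439";
        Charset[] charsets = {
            StandardCharsets.UTF_8,      // D0 B9
            StandardCharsets.UTF_16LE,   // 39 04
            StandardCharsets.ISO_8859_1  // 3F, because the character is unmappable
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + encodeHex(s, cs));
        }
    }
}
```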
Joel Spolsky has a good rundown in the incredibly accurately titled
article "The Absolute Minimum Every Software Developer Absolutely,
Positively Must Know About Unicode and Character Sets (No Excuses!)"
C
_______________________________________________
Java-dev mailing list (email@hidden)