

Re: Bug with Text Encoding?



On 23/04/2007, at 9:31 AM, Michael Hall wrote:


On Apr 22, 2007, at 4:58 PM, Greg Guerin wrote:

Michael Hall wrote:

Second, I was thinking each String's getBytes() result would be a direct hex
translation of the Unicode value, which I have verified it does not
seem to be. I'm still not completely sure that makes what I was
suggesting complete nonsense.

I don't know what you mean by "a direct hex translation" of a Unicode value.


If the Unicode value is \u0439, I might expect the bytes to be 0x0439. Or, if the escaped value is taken as decimal, 439 = 0x01B7. But I tested, and neither was the case.

There is no such thing as a "Unicode value" that can be directly represented as a sequence of bytes.


There is a "Unicode code point", which maps a character to a number. The important thing to remember is that a number IS NOT a sequence of bytes. A sequence of bytes can _represent_ a number, but you need to apply rules (even basic hardware-level ones like big- vs. little-endianness) to perform that conversion.
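To make the number-versus-bytes point concrete, here is a minimal Java sketch (the class and method names are hypothetical, chosen for illustration). It takes the code point of U+0439 as a plain int and writes it into a 4-byte buffer twice with java.nio.ByteBuffer, once per byte order. The number never changes; only the rule used to serialize it does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class CodePointBytes {
    // Serializes an int into 4 bytes using the given byte order.
    static byte[] toBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        // The Java escape is hexadecimal, so this is the number 0x0439 (1081 decimal).
        int codePoint = "\u0439".codePointAt(0);
        System.out.println(codePoint);
        // Same number, two different byte sequences (printed as signed decimals).
        System.out.println(Arrays.toString(toBytes(codePoint, ByteOrder.BIG_ENDIAN)));
        System.out.println(Arrays.toString(toBytes(codePoint, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

This prints 1081, then [0, 0, 4, 57], then [57, 4, 0, 0] (0x39 is 57). Note that neither is a character encoding yet; it is just the integer laid out in memory.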

Character encodings are rules for converting sequences of Unicode code points into sequences of bytes. AFAIK, there are only two character encodings that map U+0439 to the byte array [ 0x04, 0x39 ]: big-endian UCS-2 and big-endian UTF-16.
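A minimal Java sketch of that claim, assuming Java 7 or later for java.nio.charset.StandardCharsets (on older JDKs the equivalent is getBytes("UTF-16BE"), which declares UnsupportedEncodingException). The class name and the hex() helper are hypothetical. It shows that only the big-endian, BOM-free form yields exactly [ 0x04, 0x39 ]; Java's plain "UTF-16" encoder prepends a big-endian byte-order mark, and little-endian UTF-16 swaps the bytes.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Bytes {
    // Formats bytes as two-digit uppercase hex values, e.g. [04, 39].
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        String s = "\u0439";
        // Big-endian UTF-16 without a byte-order mark.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));
        // Java's "UTF-16" encoder writes a big-endian BOM (FE FF) first.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));
        // Little-endian UTF-16 swaps the two bytes.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

Expected output: [04, 39], then [FE, FF, 04, 39], then [39, 04].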

Even within the encodings defined by the Unicode standard, however, there are many other ways to represent U+0439 as a byte array, depending on the encoding chosen. UTF-8, for example, produces [ 0xD0, 0xB9 ].
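Since the original question was about String.getBytes(), here is a hedged sketch (hypothetical class name, Java 7+ for StandardCharsets) comparing a few charsets for the same one-character string. It also calls the no-argument getBytes(), which uses the platform default charset; that default varies by system and JDK version, so no fixed output is claimed for that line. ISO-8859-1 cannot represent U+0439, and String.getBytes(Charset) substitutes the charset's replacement byte, '?' (0x3F).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ManyEncodings {
    // Formats bytes as two-digit uppercase hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u0439";
        Charset[] charsets = {
            StandardCharsets.UTF_8,      // two bytes: D0 B9
            StandardCharsets.ISO_8859_1  // unmappable, replaced with 3F ('?')
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + hex(s.getBytes(cs)));
        }
        // No-argument getBytes() depends on the platform default charset,
        // so this line can print different bytes on different machines.
        System.out.println("default (" + Charset.defaultCharset().name() + "): "
                + hex(s.getBytes()));
    }
}
```

Passing an explicit charset, rather than relying on the default, is what makes the resulting bytes predictable.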

Joel Spolsky has a good rundown in his incredibly accurately titled article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)":

http://www.joelonsoftware.com/articles/Unicode.html

C
_______________________________________________
Java-dev mailing list      (email@hidden)
References: 
 >Re: Bug with Text Encoding? (From: Greg Guerin <email@hidden>)
 >Re: Bug with Text Encoding? (From: Michael Hall <email@hidden>)




Copyright © 2007 Apple Inc. All rights reserved.