Mailing Lists: Apple Mailing Lists


Re: Bug with Text Encoding?



Michael Hall wrote:

>Specifying -Dfile.encoding=MacCyrillic still did not show correct
>glyphs but did give correct hex? I'm assuming correct.
>"A??C" as hex = 41e83f43
>"A?C" as hex = 41e843
>"A?C" as hex = 41e943

Why assume?  Google is your friend:
  <http://www.google.com/search?q=macCyrillic>


>Tweaking the invocation to
>exec java -Dfile.encoding=MacCyrillic -cp /Users/mjh/devstuff/java/
>Combined_App/ext/test.jar TestEnc2
>oddly now shows incorrect glyphs?

Incorrect in which context, the encoding of the exec'ed Java process, or
the display of the data by the parent process?

If the Cyrillic glyphs aren't showing correctly, then that's because the
parent process doesn't know the bytes are encoded in MacCyrillic.  It
doesn't know the bytes are Unicode or Windows-1251 either.  The bytes are
just bytes until you tell the process how it should interpret them.  The
code can't read your mind, and it can't infer things you don't tell it.
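
To make that concrete, here is a minimal sketch that decodes the same three bytes from your output (41 e8 43) under different charsets.  The class and method names are made up for illustration, and the MacCyrillic branch assumes your runtime ships the "x-MacCyrillic" charset, so it is guarded:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // Turn raw bytes into chars using the charset the caller names.
    // For a piped stream, the analogous step would be wrapping it as
    // new InputStreamReader(process.getInputStream(), cs).
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        byte[] raw = {0x41, (byte) 0xE8, 0x43};

        // Same bytes, different chars, depending only on what we claim they are.
        System.out.println("ISO-8859-1:  " + decode(raw, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:       " + decode(raw, StandardCharsets.UTF_8));

        // Assumption: the extended charset may be absent on some runtimes.
        if (Charset.isSupported("x-MacCyrillic")) {
            System.out.println("MacCyrillic: " + decode(raw, Charset.forName("x-MacCyrillic")));
        }
    }
}
```

Whether the printed glyphs look right still depends on how your terminal interprets the bytes Java writes, which is the same problem one level up.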


>41e83f43
>"Aˆã?C" as hex = 41e83f43
>041e843
>"AˆãC" as hex = 41e843
>041e943
>"AˆàC" as hex = 41e943
>
>So possibly something does get jumbled in the Runtime piped output
>stream?

Why do you think something is getting jumbled?

It is inevitable and unavoidable that you'll get a 0x3F ('?'), because
MacCyrillic has no code that represents the combining accent \u0306.  That
was how this whole exercise started.  The data is getting mangled entirely
in the exec'ed java process.  It isn't getting mangled flowing between
processes.
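
You can see the mangling happen inside a single process, with no pipe involved.  This is only a sketch: the class and helper names are hypothetical, and it falls back to ISO-8859-1 (which also has no code for U+0306) if the runtime doesn't provide "x-MacCyrillic":

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LossyEncodeDemo {
    // Render bytes as lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Assumption: prefer MacCyrillic, else a charset that is always
    // present and likewise cannot encode the combining breve.
    static Charset lossyCharset() {
        return Charset.isSupported("x-MacCyrillic")
                ? Charset.forName("x-MacCyrillic")
                : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        Charset cs = lossyCharset();
        String decomposed = "A\u0438\u0306C"; // base letter + combining breve
        String precomposed = "A\u0439C";      // single precomposed letter

        System.out.println(cs + " can encode U+0306: "
                + cs.newEncoder().canEncode('\u0306'));
        // String.getBytes substitutes the charset's replacement byte
        // for any char it cannot map.
        System.out.println("decomposed:  " + hex(decomposed.getBytes(cs)));
        System.out.println("precomposed: " + hex(precomposed.getBytes(cs)));
    }
}
```

The substitution happens in getBytes(), before a single byte leaves the process.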

Use -Dfile.encoding=UnicodeBig or -Dfile.encoding=UTF8 and you should get
back an unmangled encoded byte-stream.  This happens, of course, because
both of those encodings can represent every Unicode character, including
the combining accent mark.
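
A quick round-trip check shows the difference.  This sketch uses StandardCharsets.UTF_8 and StandardCharsets.UTF_16BE as stand-ins for the UTF8 and UnicodeBig names (as far as I know UnicodeBig also writes a byte-order mark, which UTF_16BE does not), and the class name is invented for the example:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {
    // True if encoding to bytes and decoding again gives back the same chars.
    static boolean roundTrips(String s, Charset cs) {
        return new String(s.getBytes(cs), cs).equals(s);
    }

    public static void main(String[] args) {
        String decomposed = "A\u0438\u0306C"; // includes the combining breve

        Charset[] candidates = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            StandardCharsets.US_ASCII // lossy here, for contrast
        };
        for (Charset cs : candidates) {
            System.out.println(cs + " round-trips: " + roundTrips(decomposed, cs));
        }
    }
}
```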


>Anyhow, it at least appears that on it's own Swing text components
>work independently of the file.encoding setting and without a i/o
>stream involved manage to magically work quite well for glyph
>display.

It isn't magic.  The Swing text components aren't converting the chars to
bytes.  They render the actual 16-bit Java chars.

This should be more than obvious, and I can't see how this discussion keeps
running into confusion over the difference between bytes in some
byte-encoding and 16-bit chars in Unicode.  They are completely different
representations.  They are as fundamentally distinct as a text string
"14.75" and the floating-point value 14.75 stored in a 'double' type.
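
If it helps, here is a small sketch of both distinctions side by side.  The class and method names are hypothetical.  The chars in the String never change; only the byte representation you ask for does:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsVsBytesDemo {
    // Number of 16-bit Java chars; independent of any encoding.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes after encoding; depends entirely on the charset.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "A\u0438\u0306C";
        System.out.println("chars:        " + charCount(s));
        System.out.println("UTF-8 bytes:  " + byteCount(s, StandardCharsets.UTF_8));
        System.out.println("UTF-16 bytes: " + byteCount(s, StandardCharsets.UTF_16BE));

        // The same kind of gap: five chars of text versus one numeric value.
        String text = "14.75";
        double value = Double.parseDouble(text);
        System.out.println(text.length() + " chars of text, value + 1 = " + (value + 1));
    }
}
```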

  -- GG


Java-dev mailing list      (email@hidden)

Copyright © 2007 Apple Inc. All rights reserved.