On Dec 31, 2007, at 2:20 AM, Andrew Thompson wrote:
On Dec 30, 2007, at 7:20 AM, Michael Hall wrote:
Why is this successful? I would think the original Unicode you had
wouldn't convert in any straightforward way to UTF-8.
Why do you say that? UTF-8 can represent every Unicode code point,
including everything in the Basic Multilingual Plane, which covers
most common Japanese. The higher planes, which are typically only
needed for dead languages or relatively infrequently used
Chinese/Japanese/Korean characters, just take four bytes per
character in UTF-8, so you don't need to go to UTF-16 or UTF-32 (aka
UCS-4) for those either.
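As a quick sanity check of the point above, here is a minimal sketch
that encodes a short Japanese string from the BMP and one character
from a higher plane as UTF-8, then decodes them again. The class and
method names are made up for illustration, and it assumes Java 7 or
later for StandardCharsets (on older JVMs, getBytes("UTF-8") should
behave the same way).

```java
import java.nio.charset.StandardCharsets;

public class Utf8Range {
    // Number of bytes UTF-8 needs to encode the given text.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the text survives an encode/decode round trip through UTF-8.
    public static boolean roundTrips(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(s);
    }

    public static void main(String[] args) {
        // Three BMP characters: U+65E5 U+672C U+8A9E ("Japanese language").
        String japanese = "\u65E5\u672C\u8A9E";
        // U+20000, a CJK ideograph outside the BMP (a surrogate pair in Java).
        String supplementary = new String(Character.toChars(0x20000));

        System.out.println(utf8Length(japanese));      // 9: three bytes each
        System.out.println(utf8Length(supplementary)); // 4: one four-byte sequence
        System.out.println(roundTrips(japanese));      // true
        System.out.println(roundTrips(supplementary)); // true
    }
}
```

So both kinds of character come back intact; the higher-plane one
just costs an extra byte.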
OK, possibly the conversion is reasonably straightforward. But since
it didn't work initially and did once modified, something here has to
be doing some sort of conversion, and I still can't figure out what.
bos.write shouldn't do any character set conversion, nor should
path.getCanonicalPath(), and I'm pretty sure getBytes(encoding)
shouldn't either. That last one might be where I'm remembering wrong:
the javadoc doesn't spell out whether it actually does any converting.
I thought, from some post some time back, that the data already had
to conform to the indicated encoding.
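On the getBytes(encoding) question: a Java String is a sequence of
UTF-16 code units, so getBytes has to encode into whichever charset
is named; the data does not need to be in that encoding beforehand.
A minimal sketch that makes this visible by encoding the same
one-character string two ways (the class and helper names are
hypothetical, and StandardCharsets assumes Java 7 or later):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GetBytesConverts {
    // Encode the string with the given charset and show the bytes as hex.
    public static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "\u3042"; // HIRAGANA LETTER A, a single char in Java

        System.out.println(hex(a, StandardCharsets.UTF_16BE)); // 30 42
        System.out.println(hex(a, StandardCharsets.UTF_8));    // E3 81 82
    }
}
```

The same char produces different bytes depending on the charset
passed in, which suggests getBytes is where the conversion happens.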
But I guess what you indicated does explain it, since you say the
characters probably do conform to UTF-8. Maybe UTF-8 succeeds where
the default encoding, probably roman, doesn't, because the characters
don't actually conform to that one. So as long as no conversion is
required, this does make sense.
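A rough way to see why a roman-style default could fail where UTF-8
works: getBytes(Charset) replaces any character the charset cannot
represent with that charset's replacement byte. The sketch below uses
ISO-8859-1 purely as a stand-in for the default encoding, which is an
assumption, since the actual default isn't stated in this thread. The
class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyEncoding {
    // True if every character in the text can be represented in the charset.
    public static boolean canEncode(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    // Encode and decode with the same charset; unmappable characters
    // are replaced during encoding and cannot be recovered.
    public static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String name = "\u65E5\u672C\u8A9E.txt"; // Japanese file name

        // ISO-8859-1 here is only a stand-in for the platform default.
        System.out.println(canEncode(name, StandardCharsets.ISO_8859_1)); // false
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1)); // ???.txt

        System.out.println(canEncode(name, StandardCharsets.UTF_8));      // true
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // true
    }
}
```

If the default encoding behaves like the stand-in, the Japanese
characters would have been turned into question marks before the
bytes were ever written, while the UTF-8 path keeps them.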