Re: Truncating UTF-8 Strings
- Subject: Re: Truncating UTF-8 Strings
- From: Klaus Berkling <email@hidden>
- Date: Mon, 8 Jun 2009 16:49:22 -0700
On Jun 8, 2009, at 2:40 PM, Andrew Lindesay wrote:
If you render the original string, I presume that it does not contain the corrupted UTF-8 sequence and renders the glyphs correctly?
Right. If I change the number of bytes I truncate to, I get different results. Truncating to 12 bytes yields two Japanese characters; truncating to 6 yields one.
returnValue = new String(textBlock.toString().getBytes("UTF-8"), 0, lengthTruncated, "UTF-8");
^^^ I know you tried it using sub-strings, but the line above would definitely cause trouble, as it could cut inside a multi-byte sequence.
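For illustration, here is a minimal sketch of one way to truncate to a byte budget without cutting inside a multi-byte sequence. The class and method names (Utf8Truncate, truncate) are made up for this example and are not part of WebObjects or the JDK. It assumes the input is a correctly decoded Java String and that the limit is measured in UTF-8 bytes.

```java
// Hypothetical helper for illustration only; not from WebObjects or the JDK.
final class Utf8Truncate {

    // Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes,
    // cutting only between complete code points.
    static String truncate(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            // Number of bytes UTF-8 uses for this code point.
            int size = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + size > maxBytes) {
                break; // the next character would not fit completely
            }
            bytes += size;
            i += Character.charCount(cp); // 2 chars for a surrogate pair
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes; each takes 3 bytes in UTF-8.
        String text = "\u65e5\u672c\u8a9e";
        // A 7-byte budget has room for two whole characters (6 bytes), not three.
        System.out.println(truncate(text, 7).length());
    }
}
```

The point of counting per code point is that the cut can only ever land on a character boundary, so the result never ends in a partial sequence, at the cost of sometimes returning fewer bytes than the limit.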
I still get 'fractional' multi-byte characters but the results are different:
Previously:
Note the length is different, so it does make an attempt to count the glyphs. This could mean that it's a different type of encoding, and so my data is corrupted or at least not what I think it is.
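One way to test that suspicion is to decode the raw bytes strictly, so that malformed input is reported instead of being silently replaced. The following is only a sketch: the class and method names (Utf8Check, isValidUtf8) are invented for this example, and it assumes the raw bytes are available before any String conversion has happened.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical diagnostic helper for illustration only.
final class Utf8Check {

    // Returns true if data is well-formed UTF-8, false if the decoder
    // hits a malformed or truncated sequence.
    static boolean isValidUtf8(byte[] data) {
        try {
            Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A passing check only says the bytes are well-formed UTF-8. It does not prove they represent the intended text, for example if the data was encoded twice somewhere upstream.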
Thanks
kib
"Success is not final, failure is not fatal: it is the courage to continue that counts." Winston Churchill
Klaus Berkling Systems Administrator DynEd International, Inc.
Webobjects-dev mailing list (email@hidden)