Re: Truncating UTF-8 Strings

Subject: Re: Truncating UTF-8 Strings
From: Klaus Berkling <email@hidden>
Date: Mon, 8 Jun 2009 16:49:22 -0700

On Jun 8, 2009, at 2:40 PM, Andrew Lindesay wrote:

If you render the original string, I presume that it does not contain the corrupted UTF-8 sequence and renders the glyphs correctly?

Right. If I change the number of bytes I get different results. Truncating to 12 bytes yields two Japanese characters; truncating to 6 yields one.

returnValue = new String(textBlock.toString().getBytes("UTF-8"), 0, lengthTruncated, "UTF-8");

^^^ I know you tried it using sub-strings, but the line above would definitely cause trouble, because the byte offset at which it cuts can fall inside a multi-byte sequence.
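For illustration, here is a minimal sketch of one way to cut at a byte limit without splitting a character. It is not code from this thread: the class name Utf8Truncate and the method truncateUtf8 are hypothetical, and it assumes the input is a well-formed Java String (no unpaired surrogates). It walks the string by code point, adds up how many bytes each one takes in UTF-8, and stops before the limit would be exceeded.

```java
final class Utf8Truncate {

    // Hypothetical helper (not from the original thread): returns the longest
    // prefix of s whose UTF-8 encoding fits in maxBytes, cutting only on
    // code point boundaries. Assumes s contains no unpaired surrogates.
    static String truncateUtf8(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            // Number of bytes this code point occupies in UTF-8.
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                break; // the next character would not fit whole
            }
            bytes += len;
            i += Character.charCount(cp); // 2 chars for supplementary code points
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        // Three Japanese characters, 3 bytes each in UTF-8 (9 bytes total).
        String text = "\u65E5\u672C\u8A9E";
        // A 7-byte limit leaves room for two whole characters only.
        System.out.println(truncateUtf8(text, 7).length());
    }
}
```

Because the cut is made on the String rather than on the encoded byte array, the result never ends in a partial sequence. This only avoids splitting code points; it does not try to keep combining sequences together.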

I still get 'fractional' multi-byte characters but the results are different:

Previously:

Note the length is different, so it does make an attempt to count the glyphs. This could mean that the data is in a different encoding, and so my data is corrupted, or at least not what I think it is.
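One way to check what the data really is would be to dump the bytes in hex. The sketch below is an assumption on my part, not a diagnosis: the class EncodingProbe and its methods are hypothetical names. It shows what UTF-8 text looks like after being decoded once as ISO-8859-1 and encoded again as UTF-8, which is one way a 3-byte character can end up occupying 6 bytes. Whether that is what happened to this data is unknown.

```java
import java.nio.charset.StandardCharsets;

final class EncodingProbe {

    // Hypothetical helper: the UTF-8 bytes of s as space-separated hex.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Simulates the suspected corruption: UTF-8 bytes wrongly read as ISO-8859-1.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses misdecode; only meaningful if the data really was mangled this way.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u65E5";                // one Japanese character
        System.out.println(hex(original));            // clean UTF-8: 3 bytes
        System.out.println(hex(misdecode(original))); // double-encoded: 6 bytes
    }
}
```

If a hex dump of the stored text shows runs of C2/C3 lead bytes where E3 to E9 lead bytes would be expected for Japanese, double encoding is worth considering; if it shows something else, the data is in some other state and this sketch does not apply.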

Thanks

kib

"Success is not final, failure is not fatal: it is the courage to continue that counts."

Winston Churchill

Klaus Berkling

Systems Administrator

DynEd International, Inc.

www.dyned.com | www.eskimo.com/~kiberkli

