Re: iso-8859-1 over UTF8 (was: Re: cString deprecated!)
- Subject: Re: iso-8859-1 over UTF8 (was: Re: cString deprecated!)
- From: Chris Hanson <email@hidden>
- Date: Wed, 4 Sep 2002 12:21:34 -0500
At 3:03 PM +0200 9/4/02, Allan Odgaard wrote:
> As I said in the original letter, these codes may appear within
> multi-byte encoded characters. I.e., a character that needs more
> than 7 bits is encoded as 2-3 bytes; is there any guarantee that
> byte 2 or 3 of this sequence won't appear (to a program that is
> only 8-bit aware) as a control code (as defined in my previous letter)?
Yes. In UTF-8, bytes 2 and 3 (and 4, 5, and 6, if used) of a
multi-byte sequence ALSO have their high bit set, just like the
first byte. So you will *never* see a % or a \ or a " or a ' (or any
other 7-bit ASCII character) in the middle of a UTF-8 multi-byte sequence.
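A quick way to check this for yourself is sketched below in Python. This is not from the original exchange: the function names and the sample string are made up for illustration, and the sketch relies only on Python's built-in UTF-8 encoder. It encodes each non-ASCII character on its own and looks for any byte with the high bit clear.

```python
def multibyte_sequences(text):
    """Yield the UTF-8 encoding of each non-ASCII character in text."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1:
            yield encoded


def has_ascii_inside_multibyte(text):
    """Return True if any byte of a multi-byte sequence has its high bit clear."""
    return any(b < 0x80 for seq in multibyte_sequences(text) for b in seq)


# Hypothetical sample mixing ASCII specials with 2- and 3-byte characters.
sample = "\u00d8dgaard says 100% \"na\u00efve\" \u2603"

for seq in multibyte_sequences(sample):
    # Show each multi-byte sequence in binary so the high bits are visible.
    print(" ".join(format(b, "08b") for b in seq))

print(has_ascii_inside_multibyte(sample))  # False
```

Every byte printed starts with a 1 bit, so a program scanning the raw bytes for %, \, " or ' will only ever match the real ASCII characters in the string.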
-- Chris
--
Chris Hanson | Email: email@hidden
bDistributed.com, Inc. | Phone: +1-847-372-3955
Making Business Distributed | Fax: +1-847-589-3738
http://bdistributed.com/ | Personal Email: email@hidden