On Mon, 28 Mar 2005 03:51:50 -0500, Michael B Allen <mba2000@ioplex.com> wrote:
On Mon, 28 Mar 2005 11:36:10 +0400 Alexey Proskuryakov <ap-carbon@rambler.ru> wrote:
Each character may occupy between 1 and 6 bytes [1].
More precisely, between 1 and 4: <http://www.unicode.org/faq/utf_bom.html#30>.
At the risk of being pedantic, this is just talking about how to convert a UTF-16 character into a UTF-8 one. Because UTF-16 with a surrogate pair can only represent 21 bits of the Unicode code space, only 4 bytes are necessary to encode any character in UTF-8.
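[Illustration, not part of the original message: a rough C sketch of the surrogate arithmetic. Decoding a high/low surrogate pair can never yield a code point above U+10FFFF, i.e. 21 bits, which is why four UTF-8 bytes always suffice.]

#include <stdint.h>
#include <stdio.h>

/* Combine a UTF-16 surrogate pair into a single code point.
 * high must be in 0xD800..0xDBFF and low in 0xDC00..0xDFFF
 * (caller is assumed to have checked this already). */
static uint32_t decode_surrogate_pair(uint16_t high, uint16_t low)
{
    return 0x10000 + (((uint32_t)(high - 0xD800) << 10) | (uint32_t)(low - 0xDC00));
}

int main(void)
{
    /* U+1D11E MUSICAL SYMBOL G CLEF is D834 DD1E in UTF-16. */
    uint32_t cp = decode_surrogate_pair(0xD834, 0xDD1E);
    printf("U+%04X\n", (unsigned)cp);  /* prints U+1D11E, well under 0x110000 */
    return 0;
}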
Unicode *only has* 21 bits of code space; even UTF-32 only uses 21 bits.
But UTF-8 can encode the full 31-bit code space, which needs at most 6 bytes. But unless you're doing Klingon, you'll never actually see more than 4.
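[Illustration, not from the thread: a rough C sketch of UTF-8 sequence length by code point range. The 5- and 6-byte rows exist only in the original 31-bit scheme (RFC 2279); everything Unicode actually assigns fits in the first four.]

#include <stdint.h>

/* Number of octets needed to encode cp in UTF-8, per the original
 * 31-bit table.  Code points up to U+10FFFF never need more than 4. */
static int utf8_sequence_length(uint32_t cp)
{
    if (cp < 0x80)       return 1;  /*  7 bits */
    if (cp < 0x800)      return 2;  /* 11 bits */
    if (cp < 0x10000)    return 3;  /* 16 bits */
    if (cp < 0x200000)   return 4;  /* 21 bits -- covers all of Unicode */
    if (cp < 0x4000000)  return 5;  /* 26 bits, legacy forms only */
    return 6;                       /* 31 bits, legacy forms only */
}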
Actually, you will *never* see UTF-8 with more than 4 octets per codepoint. Period. That is the way that UTF-8 is defined. If you see a 5 or 6 octet character, then you are not reading UTF-8 data.

--
Clark S. Cox III
clarkcox3@gmail.com
http://www.livejournal.com/users/clarkcox3/
http://homepage.mac.com/clarkcox3/
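[Illustration, not Clark's code: a minimal C check in the spirit of the statement above. In UTF-8 as currently defined (RFC 3629), no sequence is longer than four octets, so any byte in the range 0xF8..0xFF means the data is not UTF-8.]

#include <stddef.h>

/* Return 0 if buf contains a byte that could only start a 5- or
 * 6-octet sequence (or is 0xFE/0xFF); such bytes never occur in
 * well-formed UTF-8. */
static int utf8_lead_bytes_ok(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= 0xF8)  /* 111110xx, 1111110x, 0xFE, 0xFF */
            return 0;        /* would need 5+ octets: not UTF-8 */
    }
    return 1;
}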