Re: printing Utf8
- Subject: Re: printing Utf8
- From: Ken Thomases <email@hidden>
- Date: Wed, 31 Oct 2012 15:21:11 -0500
On Oct 31, 2012, at 2:42 PM, Gerriet M. Denkmann wrote:
> When I run this in Xcode for a few times, I get sometimes good output, but sometimes not.
> Bad output looks like:
> 2012-11-01 01:56:29.971 Writing[76838:303] strlen 1027
> หหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหหห\340\270\253
>
> Did run 10 times, got the bad output 3 times.
>
> Did run 20 times in Terminal - never got bad output.
It seems it's a bug in Xcode. Whereas I earlier wrote in terms of the write buffer, it occurs to me that that's probably not it. Rather, it's probably just a matter of chance: how the kernel happens to satisfy Xcode's read of its pseudo-terminal device, and then how Xcode handles a read that ends in an incomplete UTF-8 sequence.
That is, there's no particular guarantee that Xcode reads data in the same size chunks as your program (or stdio) is writing it. The size of the chunks that Xcode reads is somewhat arbitrary, although presumably there's an upper limit based on the buffer size that Xcode is passing to the kernel as well as the kernel's internal buffer for the pty device.
Apparently, Xcode treats each buffer as though it were a standalone UTF-8-encoded string. If the read happens to split a multi-byte UTF-8 sequence, then the partial sequence is judged to be invalid UTF-8 and is output as octal escapes. Then, the next buffer looks like it starts with an invalid sequence, too. What it should do is keep the tail end of the buffer, the part which looked like an invalid sequence (up to 3 bytes), and prepend that to the next buffer it reads before trying to interpret that as UTF-8.
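To make the "keep the tail and prepend it" idea concrete, here is a minimal sketch in Python. It is only an illustration of the technique, not Xcode's actual code, which I haven't seen; the names incomplete_tail_length and ChunkDecoder are made up for this example. Note that the character in your output, ห (U+0E2B), encodes as the bytes E0 B8 AB, which is exactly \340\270\253 in octal.

```python
def incomplete_tail_length(buf: bytes) -> int:
    """Return how many trailing bytes of buf (0 to 3) begin a UTF-8
    sequence that has not been completely received yet."""
    # A lead byte of an unfinished sequence can be at most 3 bytes from the end.
    for back in range(1, min(3, len(buf)) + 1):
        byte = buf[-back]
        if byte & 0xC0 == 0x80:
            continue  # continuation byte: keep looking backwards for the lead byte
        if byte & 0xE0 == 0xC0:
            needed = 2  # lead byte of a 2-byte sequence
        elif byte & 0xF0 == 0xE0:
            needed = 3  # lead byte of a 3-byte sequence
        elif byte & 0xF8 == 0xF0:
            needed = 4  # lead byte of a 4-byte sequence
        else:
            return 0  # ASCII or an invalid lead byte: nothing to hold back
        # Hold the bytes back only if the sequence is still short.
        return back if back < needed else 0
    return 0


class ChunkDecoder:
    """Hypothetical helper: decodes arbitrarily split chunks of UTF-8."""

    def __init__(self) -> None:
        self.pending = b""  # unfinished tail carried over from the last chunk

    def feed(self, chunk: bytes) -> str:
        data = self.pending + chunk
        keep = incomplete_tail_length(data)
        split = len(data) - keep
        self.pending = data[split:]
        # Genuinely invalid bytes are still replaced rather than raising.
        return data[:split].decode("utf-8", errors="replace")


if __name__ == "__main__":
    encoded = ("ห" * 5).encode("utf-8")
    decoder = ChunkDecoder()
    # Split in the middle of the second character, as an unlucky read might.
    print(decoder.feed(encoded[:4]) + decoder.feed(encoded[4:]))
```

In real Python code one would more likely use the standard library's incremental decoder, codecs.getincrementaldecoder("utf-8")(), which does the same bookkeeping; the hand-written version is here only to show what has to be carried over between reads.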
Anyway, it is not a bug in your code nor Cocoa's UTF-8 encoding.
Regards,
Ken
_______________________________________________
Cocoa-dev mailing list (email@hidden)