Re: Terminal and UTF-8
- Subject: Re: Terminal and UTF-8
- From: Max Horn <email@hidden>
- Date: Tue, 26 Mar 2002 13:41:05 +0100
At 12:46 +0100 on 26.03.2002, Manfred Lippert wrote:
>> It's quite simple: an application has to support UTF-8 for this to
>> work. Most standard Unix applications simply don't. That it
>> "works" in cat is only because cat simply echoes anything raw to the
>> terminal, and also takes your raw input.
> But the question is: why does _entering_ data work well in "cat" and my
> own simple "getchar" program and NOT in tcsh, mysql etc.?
As I tried to explain: cat just echoes back what you type. So you hit
an "ä" on your keyboard, and Terminal.app sends two bytes of input data,
both of which cat echoes back; Terminal.app then displays stdout,
interpreting it as UTF-8, so of course it displays correctly: it just
made a round trip. The same goes for your getchar program, I'd assume.
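A minimal sketch of why the round trip through cat works, assuming the terminal sends UTF-8. `echo_bytes` is a hypothetical stand-in for cat, not its real implementation: the point is only that the program copies bytes without interpreting them, so the terminal gets back exactly what it sent.

```python
def echo_bytes(data: bytes) -> bytes:
    """Hypothetical stand-in for cat: copy input to output uninterpreted."""
    out = bytearray()
    for b in data:  # iterating over bytes yields ints 0..255
        out.append(b)
    return bytes(out)


# Typing "ä" in a UTF-8 terminal produces two bytes, not one.
typed = "ä".encode("utf-8")
print(typed)       # b'\xc3\xa4'
print(len(typed))  # 2

# The bytes come back untouched, so the terminal decodes them as "ä" again.
echoed = echo_bytes(typed)
print(echoed.decode("utf-8"))  # ä
```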
> For input, these programs aren't affected, are they? I think this should
> all be handled in some "input buffer" of the Terminal before the data is
> really sent to the program (when the user hits the "return" key).
Well, if you type "ä" in, for example, tcsh, it will also get two
bytes. But one of them (or sometimes both; or, for chars encoded
with even more bytes, several) will look like garbage to it, and tcsh
will strip them. So you end up with only one of the two bytes being
sent back to stdout, or maybe none at all, so it won't look right at
all. Even if it is passed back correctly, once you press backspace,
weird things will happen: only the last of the two bytes that make up
your char will be deleted, so weird results are bound to
happen.
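A minimal sketch of the backspace problem, assuming the line editor keeps the input as raw bytes; `naive_backspace` and `utf8_backspace` are hypothetical helpers, not tcsh code. In UTF-8 every byte after the first in a multi-byte character has the bit pattern 10xxxxxx, so a UTF-8-aware editor can step back over those continuation bytes and remove the whole character.

```python
def naive_backspace(buf: bytes) -> bytes:
    """Byte-oriented editor: backspace removes exactly one byte."""
    return buf[:-1]


def utf8_backspace(buf: bytes) -> bytes:
    """UTF-8-aware editor: backspace removes the whole last character."""
    if not buf:
        return buf
    i = len(buf) - 1
    # Skip back over continuation bytes (10xxxxxx) to the lead byte.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return buf[:i]  # drop the lead byte and its continuation bytes


line = "Bä".encode("utf-8")    # b'B\xc3\xa4'
print(naive_backspace(line))   # b'B\xc3': half a character is left behind
print(utf8_backspace(line))    # b'B'
# The leftover lead byte cannot be decoded; "replace" turns it into U+FFFD.
print(naive_backspace(line).decode("utf-8", errors="replace"))
```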
To sum it up: anything that tries to process your input data and doesn't
know it's meant to be UTF-8 will break as soon as you enter anything
outside the ASCII (0-127) range. That's not a bug in Terminal.app,
though; there is nothing it could do about it.
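To illustrate the 0-127 boundary, a small Python sketch that decodes UTF-8 input as if each byte were one character. Latin-1 is used here only as an example of such a byte-per-character assumption; it is not a claim about what tcsh or mysql do internally.

```python
text = "Käse"
data = text.encode("utf-8")  # b'K\xc3\xa4se'

# A program assuming one byte per character sees five characters, not four.
misread = data.decode("latin-1")
print(len(text), len(data), len(misread))  # 4 5 5
print(misread)                             # KÃ¤se

# Pure ASCII input (0-127) means the same thing under either assumption.
ascii_text = "cheese"
print(ascii_text.encode("utf-8").decode("latin-1") == ascii_text)  # True
```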
> I don't think that mysql has to support UTF-8 for now, if I can do without
> correct sorting of text data. But I am not able to enter it! That's the
> main problem, and if I see it right, this must be a problem of the Terminal
> itself. If I import the UTF-8 data from a file into mysql, all works quite
> well.
No, it need not be a Terminal problem. Maybe you have a
misunderstanding of how UTF-8 works, though?
>> UTF-8 encodes single letters into 1 up to 6 (or was it even
>> more?) bytes.
> I know that. In the current Unicode standard, this is reduced to 4 bytes,
> IIRC.
Not quite correct. Unicode can be expressed in various encodings,
like UTF-8, UTF-16, etc., and also e.g. UCS-4. The canonical encoding is
exactly 4 bytes/char, but since this would mean text size is
quadrupled, one usually uses an encoding like UTF-8 or UTF-16. At least
for users from the western hemisphere, this usually means you get
away with fewer bytes (they normally use 1 or 2 bytes/char for the
"common" chars), but when "rare" chars are used, they
sometimes need more bytes (up to 4) to be able to express the
char. So if you type in loads of Chinese text, it's probably a bad
idea to use UTF-8, but if most of your stuff is English/German, UTF-8
incurs little to no overhead compared to classical encodings like
ISO Latin 1.
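The size trade-off can be checked with a short Python sketch; the helper names `utf8_length` and `sizes` are hypothetical. The length table follows the UTF-8 ranges of RFC 3629, which caps UTF-8 at 4 bytes for code points up to U+10FFFF (the original design allowed up to 6 bytes for a 31-bit range). UTF-32 stands in here for the fixed 4-byte form; the `-le` codec variants are used so Python does not prepend a byte order mark.

```python
def utf8_length(cp: int) -> int:
    """Bytes UTF-8 needs for one code point (RFC 3629; surrogates aside)."""
    if cp < 0:
        raise ValueError("negative code point")
    if cp < 0x80:
        return 1  # ASCII
    if cp < 0x800:
        return 2  # e.g. Latin-1 letters such as "ä"
    if cp < 0x10000:
        return 3  # rest of the BMP, including most CJK ideographs
    if cp <= 0x10FFFF:
        return 4  # supplementary planes
    raise ValueError("beyond U+10FFFF, not a Unicode code point")


# Cross-check the table against Python's own encoder.
for ch in ["A", "ä", "€", "中", "\U0001F600"]:
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))


def sizes(text: str) -> dict:
    """Encoded size of text in bytes for three Unicode encodings."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


print(sizes("Grüße"))                  # {'utf-8': 7, 'utf-16-le': 10, 'utf-32-le': 20}
print(sizes("中文"))                    # {'utf-8': 6, 'utf-16-le': 4, 'utf-32-le': 8}
print(len("Grüße".encode("latin-1")))  # 5
```

For the German word, UTF-8 costs two bytes more than Latin-1; for the Chinese text it needs 3 bytes per character against 2 in UTF-16, which is the trade-off described above.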
Max
--
-----------------------------------------------
Max Horn
Software Developer
email: <mailto:email@hidden>
phone: (+49) 6151-494890
_______________________________________________
cocoa-dev mailing list | email@hidden