Re: Terminal and UTF-8

  • Subject: Re: Terminal and UTF-8
  • From: Max Horn <email@hidden>
  • Date: Tue, 26 Mar 2002 13:41:05 +0100

At 12:46 Uhr +0100 26.03.2002, Manfred Lippert wrote:
> > It's quite simple: an application has to support UTF-8 for this to
> > work. Most standard unix applications simply don't do this! That it
> > "works" in cat is only because cat simply echoes anything raw to the
> > terminal, and also takes your raw input.

> But the question is: Why does _entering_ data work well in "cat" and my
> own simple "getchar" program and NOT in tcsh, mysql etc.?

As I tried to explain: cat just echoes back what you type. So you hit an "ä" on your keyboard, and Terminal.app sends two bytes of input data, both of which cat echoes back. Terminal.app then displays stdout, interpreting it as UTF-8, so of course it displays correctly: the data just made a round trip. The same goes for your getchar program, I'd assume.
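
To make the round trip concrete, here is a minimal Python sketch. The function `echo_raw` is a hypothetical stand-in for cat, and the terminal is assumed to encode keystrokes as UTF-8 and decode program output the same way:

```python
def echo_raw(data: bytes) -> bytes:
    """Hypothetical stand-in for cat: hand the bytes back untouched."""
    return data

typed = "\u00e4"                        # the character "ä"
sent = typed.encode("utf-8")            # what a UTF-8 terminal would send
shown = echo_raw(sent).decode("utf-8")  # what the terminal would display

print(list(sent))       # [195, 164]: two bytes for one character
print(shown == typed)   # True: nothing touched the bytes in between
```

Because nothing in between inspects or alters the bytes, the program never needs to know that they form a single character.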


> For the input these programs aren't affected - are they? I think this should
> all be handled in some "input buffer" of the Terminal before the data is
> really sent to the program (if the user hits the "return" key).

Well, if you type "ä" in, for example, tcsh, it will also get two bytes. But one of them (or sometimes both; or, for chars that take even more bytes, several) will look like really weird stuff to it, and tcsh will strip them. So you end up with only one of the two bytes being sent back to stdout, or maybe none at all, so it won't look right. Even if the bytes are passed back correctly, weird things will happen once you press backspace: only the last byte (of the two that make up your char) will be deleted, so weird results *must* follow.

To sum up, anything that tries to process your input data and doesn't know it's meant to be UTF-8 will break as soon as you enter anything outside the ASCII (0-127) range. That's not a bug in Terminal.app, though; there is nothing it could do about it.
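
The two failure modes can be sketched in a few lines of Python. Both functions are hypothetical models, not what tcsh actually does internally: one discards every byte outside the ASCII range, the other implements backspace as "remove one byte".

```python
def drop_high_bytes(data: bytes) -> bytes:
    """Hypothetical non-8-bit-clean filter: discard every byte >= 0x80."""
    return bytes(b for b in data if b < 0x80)

def backspace_one_byte(buf: bytes) -> bytes:
    """Hypothetical line editor that treats one byte as one character."""
    return buf[:-1]

typed = "a\u00e4".encode("utf-8")      # "aä" arrives as b'a\xc3\xa4'
filtered = drop_high_bytes(typed)      # both bytes of "ä" vanish
after_bs = backspace_one_byte(typed)   # only the trailing 0xA4 is removed

print(filtered)    # b'a'
print(after_bs)    # b'a\xc3'
# The leftover 0xC3 is an incomplete UTF-8 sequence, so a UTF-8
# terminal can only show a replacement character for it.
print(after_bs.decode("utf-8", errors="replace") == "a\ufffd")   # True
```

A UTF-8-aware editor would instead remove the whole multi-byte sequence on backspace, which is exactly the knowledge these programs lack.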


> I don't think that mysql has to support UTF-8 for now, if I can do without
> correct ordering of text data. But I am not able to enter it! That's the
> main problem, and if I see it right, this must be a problem of the Terminal
> itself. If I import the UTF-8 data from a file into mysql, all works quite
> well.

No, it need not be a Terminal problem. Maybe you have a misunderstanding of how UTF-8 works, though?



> > UTF-8 encoding encodes single letters into 1 up to 6 (or was it even
> > more) bytes.

> I know that. In the current Unicode standard, this is reduced to 4 bytes,
> IIRC.

Not quite: the number depends on the encoding. Unicode can be expressed in various encodings, like UTF-8, UTF-16, etc., and also e.g. UCS-4. UCS-4 uses exactly 4 bytes/char, but since that would quadruple the size of plain ASCII text, one usually uses an encoding like UTF-8 or UTF-16. At least for users of Western languages, this usually means you get away with fewer bytes (1 or 2 bytes/char for the "common" chars), while "rarer" chars need more: up to 4 bytes in UTF-8 for anything Unicode defines (the original ISO 10646 scheme allowed sequences of up to 6 bytes), and 2 or 4 bytes in UTF-16. So if you type in loads of Chinese text, UTF-8 is probably a poor fit, since most Chinese chars take 3 bytes there versus 2 in UTF-16. But if most of your stuff is English/German, UTF-8 incurs little to no overhead compared to classical encodings like ISO Latin 1.
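
To put numbers on the size comparison, the following Python sketch measures how many bytes a few sample characters take in each encoding. The helper name `encoded_sizes` and the choice of samples are illustrative assumptions; the `-be` codec variants are used so that no byte order mark is counted.

```python
def encoded_sizes(ch: str) -> dict:
    """Bytes needed for one character in UTF-8, UTF-16 and UTF-32 (UCS-4)."""
    return {enc: len(ch.encode(enc))
            for enc in ("utf-8", "utf-16-be", "utf-32-be")}

samples = {
    "ASCII letter a":        "a",
    "German a-umlaut":       "\u00e4",
    "CJK ideograph U+4E2D":  "\u4e2d",
    "emoji U+1F600":         "\U0001F600",
}

for label, ch in samples.items():
    print(label, encoded_sizes(ch))
# ASCII letter a        -> utf-8: 1, utf-16-be: 2, utf-32-be: 4
# German a-umlaut       -> utf-8: 2, utf-16-be: 2, utf-32-be: 4
# CJK ideograph U+4E2D  -> utf-8: 3, utf-16-be: 2, utf-32-be: 4
# emoji U+1F600         -> utf-8: 4, utf-16-be: 4, utf-32-be: 4
```

For mostly English or German text UTF-8 is the most compact of the three, while for mostly CJK text UTF-16 comes out smaller.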


Max
--
-----------------------------------------------
Max Horn
Software Developer

email: <mailto:email@hidden>
phone: (+49) 6151-494890
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.

  • Follow-Ups:
    • Re: Terminal and UTF-8
      • From: Manfred Lippert <email@hidden>
References: 
 >Re: Terminal and UTF-8 (From: Manfred Lippert <email@hidden>)
