Re: non-ASCII characters in directory names
- Subject: Re: non-ASCII characters in directory names
- From: Harald Hanche-Olsen <email@hidden>
- Date: Wed, 09 Jan 2008 20:26:41 +0100 (CET)
+ Peter Hilgers <email@hidden>:
> I am rather new to all of this, so I hope my question is not too stupid.
No, it's not at all stupid. But as they say, here be dragons.
> I have a problem with directory or file names which consist not only
> of ASCII-characters (German umlaute ä ö ü for example). In xterm the
> characters are displayed wrong and auto-completion with the tab-key
> produces some backslashes with numbers instead (\314\210 instead of
> ä), but still I can access them.
Part of the problem lies with the character encoding UTF-8 used on the
Mac these days, or not so much UTF-8 as the way Unicode deals with
characters. There are two ways to write a character like ü in
Unicode: Either as a single character known in Unicode as U+00FC LATIN
SMALL LETTER U WITH DIAERESIS or as the character u followed by the
character known in Unicode as U+0308 COMBINING DIAERESIS. To make a
long story short, if you type a ü on your keyboard the xterm receives
the UTF-8 encoding of the former, but Apple has decided to use the
latter form in filenames. So you get this (run in Terminal – notice
that my shell prompt is a semicolon):
; echo fünf | od -t ax1
0000000 f ? ? n f nl
66 c3 bc 6e 66 0a
; touch fünf
; echo f*nf | od -t ax1
0000000 f u ? 88 n f nl
66 75 cc 88 6e 66 0a
Notice that UTF-8 encodes each of these non-ASCII characters in two
octets (aka bytes): the single ü is encoded as c3 bc, and the
combining diaeresis as cc 88. The \314\210 you see after tab
completion is just cc 88 written in octal.
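If you want to reproduce the two encodings without touching the
filesystem, here is a minimal Python sketch. It assumes the two forms
correspond to what Unicode calls NFC (precomposed) and NFD (decomposed);
the helper name utf8_hex is made up for this illustration.

```python
import unicodedata

# Two spellings of the same word, written with escapes so the form is explicit.
precomposed = "f\u00fcnf"   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "fu\u0308nf"   # u followed by U+0308 COMBINING DIAERESIS


def utf8_hex(text):
    """Return the UTF-8 encoding of text as space-separated hex octets."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


print(utf8_hex(precomposed))   # 66 c3 bc 6e 66
print(utf8_hex(decomposed))    # 66 75 cc 88 6e 66

# Compared code point by code point, the two strings differ ...
print(precomposed == decomposed)   # False

# ... but normalizing either one to the other form makes them equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
```

The hex output matches the od dumps above, apart from the trailing
newline (0a) that echo adds.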
Unfortunately, xterm does not seem able to deal with combining
characters, so after doing the above experiment I get this in an
xterm:
; echo fünf
fünf
; echo f*nf
fü nf
while it works fine in Terminal.
Completion seems problematic because GNU readline, which is
responsible for completion in bash and many other shells, is unaware
(it seems) that the two forms of ü are the same. So if you type fü
and follow up with a tab, readline doesn't realize that this might
complete to the filename fünf, because the latter is in the long form
and the fü you have typed is in the short form.
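To illustrate the mismatch, here is a rough Python sketch of prefix
matching with and without normalization. It is not how readline is
implemented; the function complete and the directory listing are
hypothetical, and normalizing both sides to NFC is just one way a
completer could treat the two forms as equal.

```python
import unicodedata


def nfc(text):
    """Normalize text to the precomposed (NFC) form."""
    return unicodedata.normalize("NFC", text)


def complete(prefix, names):
    """Return the names that start with prefix, ignoring normalization form."""
    wanted = nfc(prefix)
    return [name for name in names if nfc(name).startswith(wanted)]


typed = "f\u00fc"                  # short form, as sent by the keyboard
on_disk = ["fu\u0308nf", "foo"]    # hypothetical listing; long form on disk

# A plain code point comparison finds no candidate at all.
naive = [name for name in on_disk if name.startswith(typed)]
print(naive)   # []

# Comparing normalized copies finds the file and returns its on-disk name.
matches = complete(typed, on_disk)
print(matches == ["fu\u0308nf"])   # True
```

Note that the sketch returns the name exactly as stored, so whatever
is passed on to the filesystem is still in the long form.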
> Matlab, which requires X11 to run, is totally unable to work with any
> of these files or directories, however. It simply says that the
> directory does not exist.
That is probably just a Matlab bug.
> Is there any solution to this?
I'm afraid my best answer is to give the world a few more years to
come to grips with Unicode and all its complexities, and use ASCII
filenames while you wait.
- Harald
_______________________________________________
X11-users mailing list (email@hidden)