Re: Posix path and High Ascii Characters
- Subject: Re: Posix path and High Ascii Characters
- From: Christopher Nebel <email@hidden>
- Date: Mon, 9 Sep 2002 11:57:18 -0700
On Monday, September 9, 2002, at 09:42 AM, John Delacour wrote:
At 2:26 pm +0200 9/9/02, Alain Content wrote:
POSIX path does not convert high-ASCII characters (i.e., accented letters, etc.),
which sometimes makes Unix paths unusable in shell scripts or the Terminal.
Provided you have named the file with a Mac keyboard layout rather
than a Unicode keyboard layout -- as you almost certainly have -- then
you should have no problem using the original Mac characters in shell
scripts, BUT you must either escape the space in the pathname or
enclose the pathname in DOUBLE quotes.
Nope. Won't make a difference unless you created the file using the
Terminal. In fact, such a filename is technically illegal (see below),
but most places in the system can cope with it. Also, DOUBLE quotes
allow interpretation of some shell meta-characters, notably $. To get
everything literally, use SINGLE quotes, or use the "quoted form"
property, which does it for you. (Of course, "quoted form" only works
right in AppleScript 1.8.3 and later.)
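To make the single- versus double-quote difference concrete, here is a minimal sketch in Python. It uses the standard library's shlex.quote as a stand-in for AppleScript's "quoted form" (not the same implementation, though both wrap the text in single quotes), assumes a POSIX /bin/sh is available, and uses a made-up file name.

```python
import shlex
import subprocess

def sh_echo(arg_text: str) -> str:
    """Run `echo <arg_text>` through /bin/sh, pasting arg_text in unmodified,
    and return what the shell printed (minus the trailing newline)."""
    result = subprocess.run(["/bin/sh", "-c", "echo " + arg_text],
                            capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")

name = "it's $5"  # hypothetical file name containing ' and $

double_quoted = '"' + name + '"'   # the shell still expands $5 inside "..."
single_quoted = shlex.quote(name)  # 'it'"'"'s $5' keeps every character literal

print(repr(sh_echo(double_quoted)))  # "it's " ($5 expanded to an empty string)
print(repr(sh_echo(single_quoted)))  # "it's $5"
```

The double-quoted version silently loses part of the name; the single-quoted one survives intact, embedded apostrophe and all.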
or simply don't be a Frenchman :-)
This is what I like to call the "doctor" solution, as in the old joke
"Doctor, it hurts when I do this." "Well, then, don't do that." It's
a workable, if irritating, solution for small problems.
It looks to me as if OS X is still very confused about encodings and
no matter how you change the window settings in Terminal, it's hard to
see what's meant to be happening.
Perhaps Chris can clarify this whole issue.
I'll try. Part of the difficulty is that some BSD tools (e.g., ls) try
to be "nice" about non-printable-ASCII characters by turning them into
something else, e.g. "?". Bear in mind that most of these tools are
open source and are not directly under Apple's control, so it's not
entirely Mac OS X's problem. For best results, set the Terminal
encoding to UTF-8 and use commands that do as little interpretation as
possible, e.g., "ls -v" (which prints non-graphic characters unedited) or "echo". Terminal in Jaguar is very good at
displaying UTF-8.
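Here is a simplified model of that "nice" behavior, sketched in Python. The function name ls_style is hypothetical, and real tools differ in detail (some work per character rather than per byte, and the result depends on locale settings); the point is only that each non-ASCII byte of a UTF-8 name gets masked, so the accents vanish from the listing.

```python
def ls_style(name: str) -> str:
    """Hypothetical model: encode the name as UTF-8, then replace every byte
    outside printable ASCII (0x20-0x7E) with '?'."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "?"
                   for b in name.encode("utf-8"))

name = "e\u0301te\u0301"  # "été" in decomposed form

print(ls_style(name))      # e??te??
print(ls_style("photos"))  # photos (plain ASCII passes through unchanged)
```

A command that does no such interpretation hands the raw UTF-8 bytes to the Terminal, which can then decode and display them properly.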
Strictly speaking, file names are fully decomposed Unicode. (No jokes,
please; that's the technical term. It means that accented characters
are stored as the base character plus combining accent mark
"characters", not pre-composed characters, so "e-acute" is stored as
0065 (e) + 0301 (combining acute accent), not 00E9 (e with acute).)
It's possible to dodge this using sufficiently low-level interfaces such as
the BSD layer, but it's a bad idea, because it means you can get two files whose
names differ only by composition -- it's very difficult to tell them
apart.
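A short Python sketch of the composed/decomposed distinction, using the standard unicodedata module. Python's NFD is only an approximation here: HFS+ applies its own variant of Unicode decomposition, so the two can differ for some characters.

```python
import unicodedata

composed = "\u00e9"     # é as one precomposed code point (U+00E9)
decomposed = "e\u0301"  # e (U+0065) + COMBINING ACUTE ACCENT (U+0301)

# Both render as "é", yet they are different strings.
print(composed == decomposed)  # False
print([hex(ord(c)) for c in unicodedata.normalize("NFD", composed)])
# ['0x65', '0x301']

# Two names that differ only by composition compare unequal
# until both are normalized to the same form.
a = "photos \u00e9t\u00e9"
b = "photos e\u0301te\u0301"
print(a == b)  # False
print(unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b))  # True
```

This is exactly why such pairs are hard to tell apart: they look identical on screen, but a byte-for-byte comparison says they are different files.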
How a file name looks at the API level depends on the API. Current
Carbon APIs handle file names as arrays of UTF-16 code units; POSIX
ones handle them as sequences of UTF-8 bytes, which is why UTF-8 works well in
Terminal. How it's stored on disk depends on the disk format; HFS+
uses UTF-16, but that's not important in most cases.
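Here is the same decomposed name viewed both ways, as a Python sketch. The UTF-16 list stands in for what a Carbon call would see and the byte string for what a POSIX call would see; it illustrates the two encodings only, not any actual Carbon or POSIX API.

```python
name = "e\u0301te\u0301"  # decomposed "été"

# UTF-16 view: one 16-bit code unit per character here (all are in the BMP).
utf16 = name.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
print([hex(u) for u in units])  # ['0x65', '0x301', '0x74', '0x65', '0x301']

# UTF-8 view: ASCII letters take one byte each, U+0301 takes two.
utf8 = name.encode("utf-8")
print(len(units), len(utf8))    # 5 7
```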
Did that make any sense, or is everyone thoroughly confused now?
Your "/Users/ac/Desktop/photos\ e\314\201te\314\201/" makes no sense
to me in UTF-8.
e'te' should be encoded in UTF-8 as ascii characters
{#195, #169, #116, #195, #169}
His file name is encoded using decomposed characters, as above. Your
solution uses composed characters, which as I said is not a good idea
-- the proper UTF-8 sequence for an e-acute is {101, 204, 129}.
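The byte values can be checked in Python. The octal_escape helper below is hypothetical; it just mimics the backslash-octal notation of the quoted path. Running it on the decomposed name reproduces that path exactly, while the composed form yields John's sequence.

```python
def octal_escape(name: str) -> str:
    """Encode the name as UTF-8 and render it like the path in this thread:
    printable ASCII as-is, a space as a backslash followed by a space, and any
    other byte as a backslash plus three octal digits."""
    out = []
    for b in name.encode("utf-8"):
        if b == 0x20:
            out.append("\\ ")
        elif 0x20 < b < 0x7F:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)

composed = "\u00e9t\u00e9"      # precomposed "été"
decomposed = "e\u0301te\u0301"  # decomposed "été"

print(list(composed.encode("utf-8")))    # [195, 169, 116, 195, 169]
print(list(decomposed.encode("utf-8")))  # [101, 204, 129, 116, 101, 204, 129]
print(octal_escape("/Users/ac/Desktop/photos " + decomposed + "/"))
# /Users/ac/Desktop/photos\ e\314\201te\314\201/
```

Octal 314 and 201 are decimal 204 and 129, so the "\314\201" in the path is the two-byte UTF-8 encoding of the combining acute accent.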
--Chris Nebel
AppleScript Engineering
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.