Re: Posix path and High Ascii Characters

On Monday, September 9, 2002, at 09:42 AM, John Delacour wrote:

> At 2:26 pm +0200 9/9/02, Alain Content wrote:
>
>> Posix path does not change high ASCII characters (i.e., é, è, ç, etc.), which
>> sometimes makes Unix paths unusable in shell scripts or the Terminal.

> Provided you have named the file with a Mac keyboard layout rather than a Unicode keyboard layout (as you almost certainly have), you should have no problem using the original Mac characters in shell scripts, BUT you must either escape the space in the pathname or enclose the pathname in DOUBLE quotes

Nope. That won't make a difference unless you created the file using the Terminal. In fact, a filename created that way is technically illegal (see below), although most parts of the system can cope with it. Also, DOUBLE quotes still let the shell interpret some metacharacters, notably $. To get everything literally, use SINGLE quotes, or use the "quoted form" property, which does it for you. (Of course, "quoted form" only works correctly in AppleScript 1.8.3 and later.)

> or simply don't be a Frenchman :-)

This is what I like to call the "doctor" solution, as in the old joke "Doctor, it hurts when I do this." "Well, then, don't do that." It's a workable, if irritating, solution for small problems.
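A minimal sketch of the quoting difference, written in Python purely for illustration and assuming a POSIX system with /bin/sh: it passes the same path to printf through the shell, once wrapped in double quotes and once through the standard library's shlex.quote, which single-quotes its argument in much the same spirit as AppleScript's "quoted form" (the exact escaping of embedded apostrophes differs). The file name "cost $5.txt" and the helper echo_via_shell are hypothetical.

```python
import shlex
import subprocess

# Hypothetical file name containing a space and a shell metacharacter ($).
PATH = "/Users/ac/Desktop/cost $5.txt"


def echo_via_shell(quoted_arg: str) -> str:
    """Run printf through /bin/sh and return exactly what the shell handed it."""
    result = subprocess.run(
        ["/bin/sh", "-c", f"printf '%s' {quoted_arg}"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8")


# Double quotes protect the space, but the shell still expands $5
# (an unset positional parameter), so part of the name silently vanishes.
double_quoted = f'"{PATH}"'

# shlex.quote wraps the string in single quotes, so nothing is expanded.
single_quoted = shlex.quote(PATH)

if __name__ == "__main__":
    print(echo_via_shell(double_quoted))  # /Users/ac/Desktop/cost .txt
    print(echo_via_shell(single_quoted))  # /Users/ac/Desktop/cost $5.txt
```

Only the single-quoted version survives intact, because inside double quotes the shell still expands $5 to nothing.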

> It looks to me as if OS X is still very confused about encodings, and no matter how you change the window settings in Terminal, it's hard to see what's meant to be happening.
>
> Perhaps Chris can clarify this whole issue.

I'll try. Part of the difficulty is that some BSD tools (e.g., ls) try to be "nice" about characters outside printable ASCII by turning them into something else, such as "?". Bear in mind that most of these tools are open source and not directly under Apple's control, so this is not entirely Mac OS X's problem. For best results, set the Terminal encoding to UTF-8 and use commands that do as little interpretation as possible, such as "ls -v" or "echo". Terminal in Jaguar is very good at displaying UTF-8.
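As a rough model of where the "?" comes from, the sketch below replaces every byte outside printable ASCII with "?". This is a simplification for illustration (the helper question_mark_display is hypothetical and is not how BSD ls is implemented; real tools decide printability according to the locale), but it shows why one accented letter can turn into several "?": a decomposed é is the ASCII letter e followed by two non-ASCII bytes.

```python
import unicodedata


def question_mark_display(name: str) -> str:
    """Hypothetical ASCII-only display: each byte outside 0x20-0x7E becomes '?'."""
    raw = name.encode("utf-8")  # POSIX-level names are UTF-8 byte strings
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "?" for b in raw)


composed = "photos \u00e9t\u00e9"                    # é as precomposed U+00E9
decomposed = unicodedata.normalize("NFD", composed)  # é as e + U+0301

print(question_mark_display(decomposed))  # photos e??te??
print(question_mark_display(composed))    # photos ??t??
```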

Strictly speaking, file names are fully decomposed Unicode. (No jokes, please; that's the technical term. It means that accented characters are stored as the base character plus a combining accent mark "character" rather than as a precomposed character, so "e-acute" is stored as 0065 (e) + 0301 (combining acute accent), not 00E9 (e with acute).) It's possible to dodge this using sufficiently low-level interfaces such as the BSD layer, but it's a bad idea, because you can end up with two files whose names differ only in composition, and it's very difficult to tell them apart.
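The difference between the two forms can be seen with Python's unicodedata module, used here only as an illustration. One hedge: HFS+ applies its own variant of decomposition (based on an older Unicode version, with some ranges left undecomposed), so Python's NFD is not guaranteed to match it for every character, although it does for é.

```python
import unicodedata

composed = "\u00e9"                                  # é as one precomposed code point
decomposed = unicodedata.normalize("NFD", composed)  # e + COMBINING ACUTE ACCENT

print([f"{ord(c):04X}" for c in composed])    # ['00E9']
print([f"{ord(c):04X}" for c in decomposed])  # ['0065', '0301']

# Both render as "é", yet they are different strings, so two files named
# with each form would be distinct names that look identical.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```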

How a file name looks at the API level depends on the API. Current Carbon APIs handle file names as arrays of UTF-16 code units; POSIX APIs handle them as UTF-8 byte strings, which is why UTF-8 works well in Terminal. How the name is stored on disk depends on the volume format; HFS+ uses UTF-16, but that's not important in most cases.
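To see the two API-level representations side by side, Python's codecs can stand in for them. This sketch does not call Carbon or POSIX APIs; it only shows the UTF-16 code units and the UTF-8 bytes that the same decomposed name occupies.

```python
name = "e\u0301te\u0301"  # "été" in decomposed form

utf16 = name.encode("utf-16-be")  # big-endian, no byte-order mark
utf16_units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
utf8_bytes = list(name.encode("utf-8"))

print([f"{u:04X}" for u in utf16_units])  # ['0065', '0301', '0074', '0065', '0301']
print(utf8_bytes)                         # [101, 204, 129, 116, 101, 204, 129]
```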

Did that make any sense, or is everyone thoroughly confused now?

> Your "/Users/ac/Desktop/photos\ e\314\201te\314\201/" makes no sense to me in UTF-8.
>
> e'te' should be encoded in UTF-8 as ASCII characters
> {#195, #169, #116, #195, #169}

His file name is encoded using decomposed characters, as described above. Your solution uses composed characters, which, as I said, is not a good idea: the proper UTF-8 sequence for an e-acute is {101, 204, 129}.
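To check the path from the thread, this Python sketch turns its octal escapes back into bytes and compares the decomposed form with the composed one. The helper unescape is hypothetical and handles only three-digit octal escapes and backslash-escaped single characters, which is all this path needs.

```python
import re
import unicodedata

# The path from the thread, with a backslash-escaped space and octal byte escapes.
ESCAPED = r"/Users/ac/Desktop/photos\ e\314\201te\314\201/"


def unescape(text: str) -> bytes:
    """Decode \\NNN octal escapes and backslash-escaped characters into raw bytes.

    Assumes every unescaped character is ASCII, so Latin-1 maps each
    character back to exactly one byte.
    """
    def repl(match: re.Match) -> str:
        if match.group(1):  # three-digit octal escape -> one byte value
            return chr(int(match.group(1), 8))
        return match.group(2)  # e.g. "\ " -> " "

    return re.sub(r"\\([0-7]{3})|\\(.)", repl, text).encode("latin-1")


path = unescape(ESCAPED).decode("utf-8")
word = path.rstrip("/").rsplit(" ", 1)[-1]  # the accented part of the folder name

print([hex(ord(c)) for c in word])  # ['0x65', '0x301', '0x74', '0x65', '0x301']
print(list(word.encode("utf-8")))   # [101, 204, 129, 116, 101, 204, 129] (decomposed)
print(list(unicodedata.normalize("NFC", word).encode("utf-8")))  # [195, 169, 116, 195, 169] (composed)
```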

--Chris Nebel
AppleScript Engineering
applescript-users mailing list | email@hidden



Copyright © 2011 Apple Inc. All rights reserved.