Re: Posix path and High Ascii Characters
- Subject: Re: Posix path and High Ascii Characters
- From: Christopher Nebel <email@hidden>
- Date: Mon, 9 Sep 2002 22:03:47 -0700
On Monday, September 9, 2002, at 04:46 PM, alain content wrote:
1. You wrote:
Actually, the problem is with "do shell script", not "POSIX path". It
doesn't understand the proper text encoding to use with shell
commands.
The simplest solution is to upgrade to 10.2 Jaguar, which fixes the
bug. Failing that, a technique has been described here a few times to
mangle the path into UTF-8 by hand.
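The hand-mangling technique from those earlier threads isn't reproduced in this message. As a rough sketch of what "encoding to UTF-8 by hand" involves, here is a Python version (not AppleScript, and not the code from those threads) that handles only the 1- and 2-byte forms, which covers Latin accents like é; `utf8_bytes` is a hypothetical name.

```python
def utf8_bytes(text: str) -> list[int]:
    """Encode text to UTF-8 by hand. Sketch only: code points below U+0800."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # 1 byte, 0xxxxxxx: plain ASCII passes through unchanged
            out.append(cp)
        elif cp < 0x800:
            # 2 bytes, 110xxxxx 10xxxxxx: top 5 bits, then low 6 bits
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        else:
            raise ValueError(f"U+{cp:04X} needs 3+ bytes; not handled in this sketch")
    return out

print(utf8_bytes("\u00e9"))   # precomposed e-acute -> [195, 169]
print(utf8_bytes("e\u0301"))  # e + combining acute  -> [101, 204, 129]
```

The second line reproduces the decimal sequence {101, 204, 129} quoted later in the thread; the two results differ because é can be written either as one precomposed character or as "e" plus a combining mark.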
Under 10.2, do shell script still fails when the path contains high-ASCII
characters, e.g.:
do shell script "cd '~/Desktop/photos été'"
do shell script quoted form of "cd ~/Desktop/photos été"
Both return an error (sh: cd ~/Desktop/photos été: no such file or
directory).
The second has to do with misquoting: the shell is complaining that
there's no such file as "cd ~/Desktop/photos été". The "quoted form"
needs to go around just the file path, not the whole command.
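The quoting rule above is easy to check outside AppleScript. As an analogy only, Python's `shlex.quote` single-quotes a string much as `quoted form of` does; the path below is a hypothetical example. One extra detail worth knowing: the shell expands `~` only when it is unquoted, so a home-relative path should keep the tilde outside the quotes.

```python
import shlex

path = "/Users/ac/Desktop/photos \u00e9t\u00e9"  # hypothetical example path

# Wrong: quoting the whole command makes "cd /Users/..." a single word,
# which the shell then tries to run as a program name.
wrong = shlex.quote("cd " + path)

# Right: quote only the argument and leave the command word bare.
right = "cd " + shlex.quote(path)

# Home-relative variant: the shell only expands an unquoted leading ~,
# so the tilde stays outside the quoted part.
home_rel = "cd ~/" + shlex.quote("Desktop/photos \u00e9t\u00e9")

print(wrong)     # 'cd /Users/ac/Desktop/photos été'
print(right)     # cd '/Users/ac/Desktop/photos été'
print(home_rel)  # cd ~/'Desktop/photos été'
```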
The first has to do with the composed/decomposed ambiguity I mentioned
earlier. For slightly complicated reasons, you're passing a composed
e-acute to the shell, but there isn't any file with that name: the
file's name is spelled with a plain "e" followed by a combining acute
accent. This is going to be a problem with just about any hardcoded path
like this.
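To make the mismatch concrete, here is a small Python sketch comparing the two spellings of the folder name. It assumes the on-disk name is the decomposed form; HFS+ stores names in a decomposed form close to (but not identical to) Unicode NFD, so normalizing with NFD is an approximation. `same_name` is a hypothetical helper.

```python
import unicodedata

typed = "photos \u00e9t\u00e9"       # composed: what a hardcoded literal typically holds
on_disk = "photos e\u0301te\u0301"   # decomposed: "e" + COMBINING ACUTE ACCENT

print(typed == on_disk)           # False: same appearance, different code points
print(len(typed), len(on_disk))   # 10 12

def same_name(a: str, b: str) -> bool:
    # Compare under one normalization form so both spellings of é match
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(same_name(typed, on_disk))  # True
```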
However, the following seems to work:
set thePath to POSIX path of "U:Users:ac:Desktop:photos été"
do shell script "cd " & quoted form of thePath & ";pwd"
-- "/Users/ac/Desktop/photos été"
Because the name is converted through the file system APIs on its way to
a POSIX path, this irons out the ambiguity properly.
2. None of these work when talking to Terminal, unfortunately:
set thePath to POSIX path of "U:Users:ac:Desktop:photos été"
tell application "Terminal"
    activate
    do script "cd " & quoted form of thePath in window frontmost
end tell
Or
tell ...
    do script ("echo " & quoted form of thePath) in window frontmost
Works fine for me, but it requires that the Terminal be set to UTF-8.
Did you change yours to MacRoman?
4. You wrote:
-- the proper UTF-8 sequence for an e-acute is {101, 204, 129}.
Now, this is just plain curiosity, but what's the relation between
that and what I'm seeing, e\314\201 (except that perhaps 101 is
U+0065, hence the "e")?
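Right: they're the same three bytes. I was using decimal to match JD's
code, while the shell shows non-ASCII bytes in octal: 204 is \314,
129 is \201, and 101 is just "e".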
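A quick way to see that the decimal list and the shell output describe the same bytes is to print each byte in decimal, hexadecimal, and octal. This Python sketch is only an illustration of the number bases involved, not part of any AppleScript solution.

```python
# Decimal values for a decomposed e-acute, as quoted in the thread
data = bytes([101, 204, 129])

for b in data:
    # decimal, hexadecimal, and octal spellings of the same byte
    print(f"{b:3d}  0x{b:02X}  \\{b:03o}")
# 101  0x65  \145
# 204  0xCC  \314
# 129  0x81  \201

# The shell output quoted here prints printable ASCII (101 -> "e") literally and
# octal-escapes the rest, giving e\314\201. Decoded as UTF-8, the bytes are
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
assert data.decode("utf-8") == "e\u0301"
```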
Why is it that the very same folder seems to receive two different
names/encodings in Terminal?
Let me explain:
When I use tcsh's filename completion to change directory, I get
~/Desktop/photos e\314\201te\314\201
However, using the trick mentioned above, I get
~/Desktop/photos \303\251t\303\251
Again, this is the composed vs. decomposed character difference. The
former is decomposed, the latter is composed. The fact that both exist
makes life decidedly more difficult, but that's how it is.
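Both Terminal listings can be reproduced from a single string by normalizing it two ways and printing the UTF-8 bytes shell-style. This Python sketch assumes the shell shows printable ASCII literally and other bytes as three-digit octal escapes, which matches the output quoted above; `shell_escape` is a hypothetical helper.

```python
import unicodedata

def shell_escape(s: str) -> str:
    """Show UTF-8 bytes as in the listings above: printable ASCII as-is, others in octal."""
    return "".join(chr(b) if 32 <= b < 127 else "\\%03o" % b for b in s.encode("utf-8"))

name = "photos \u00e9t\u00e9"
composed = unicodedata.normalize("NFC", name)     # é as one code point, U+00E9
decomposed = unicodedata.normalize("NFD", name)   # é as "e" + U+0301

print(shell_escape(decomposed))  # photos e\314\201te\314\201
print(shell_escape(composed))    # photos \303\251t\303\251
```

The folder is the same; only the byte spelling of its name differs, which is why a literal comparison of the two paths fails.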
As I understand it, the original Unicode design had only base
characters and combining marks. This was a superior design in several
ways -- it's more flexible, easily allows for multiple accents on a
single base (not relevant in French, but critical for some languages),
and helps keep the total number of code points down. Unfortunately, it
was also not trivially compatible with the already existing ISO-8859
encodings -- a real issue if you've got lots of existing data. That
didn't sit well with various Unicode Consortium members, so they added
precomposed compatibility characters, producing the system we have now.
--Chris Nebel
AppleScript Engineering
_______________________________________________
applescript-users mailing list | email@hidden