Re: Do shell script and special characters
- Subject: Re: Do shell script and special characters
- From: Chris Espinosa <email@hidden>
- Date: Tue, 22 Oct 2002 11:36:37 -0700
On Monday, October 21, 2002, at 03:54 PM, Simon Kornblith wrote (and cde snipped for brevity):
On 10/21/02 3:02 PM, "Chris Espinosa" <email@hidden> typed away on his (or her or its) keyboard, producing the following output:
- Don't use shell commands to produce Mac-encoded text for return to
AppleScript. Use AppleScript's file read/write OSAXen, which are
designed to read Mac text files and return Mac formatted strings to
AppleScript.
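A minimal sketch of that approach, using the Standard Additions read/write commands (the file name here is just an example):

    set theFile to (path to desktop as text) & "sample.txt" -- example file name
    set fileRef to open for access file theFile with write permission
    set eof of fileRef to 0 -- start with an empty file
    write "Some Mac-encoded text" to fileRef
    close access fileRef
    set theText to read file theFile as string -- comes back as a Mac-formatted string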
What about when actual shell commands return ASCII strings? In the case of my AppleScript application, this means that secure data needs to be written to the disk. I know that I can delete it with "rm -P" and completely wipe it out, but this is still quite annoying.
Well, if they're ASCII strings, everything's fine, because ASCII is a
proper subset of UTF-8. But I thought the question was about MacRoman,
which is different from ASCII and incompatible with UTF-8, and
therefore can't be promoted to UTF-16.
- If you want to capture the output of a shell command that is not
guaranteed to produce UTF-8, postprocess it by piping it through vis -o
to get an escaped UTF-8 form. You can unescape it with unvis.
Unfortunately there are no shell commands for translating MacRoman to
UTF-8 or back.
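A rough sketch of that pipeline from AppleScript (the file names are just examples):

    -- Escape possibly non-UTF-8 output with vis so it survives the trip into AppleScript...
    set escaped to do shell script "cat /tmp/unknown-encoding.txt | vis -o"
    -- ...and unescape it with unvis when it has to go back to the shell.
    do shell script "echo " & quoted form of escaped & " | unvis > /tmp/copy.txt"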
MacRoman to UTF8:
native2ascii -encoding MacRoman | native2ascii -reverse -encoding UTF8
UTF8 to MacRoman:
native2ascii -encoding UTF8 | native2ascii -reverse -encoding MacRoman
Unfortunately, on my machine this takes almost a second per conversion,
probably because native2ascii appears to be a Java app.
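For example, invoked from AppleScript (the file names are purely illustrative):

    -- Convert a MacRoman text file to UTF-8 with the two-stage pipeline above.
    do shell script "native2ascii -encoding MacRoman < /tmp/input-macroman.txt" & ¬
        " | native2ascii -reverse -encoding UTF8 > /tmp/output-utf8.txt"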
As you've discovered, these are heavyweight applications, mainly used
for Web page processing and Java; they're part of the Java
distribution, not part of the Darwin/BSD underpinnings of Mac OS X.
Mac OS X does have some lighter-weight routines for converting UTF-8 to
MacRoman and vice versa (in Carbon's Text Encoding Converter and Core
Foundation's CFString) but there's no Apple-supplied UNIX tool that
invokes those.
- Assume that an error in coercing shell output to styled text means
that it's supposed to be in the current text encoding? (This will produce
unpredictable results on non-MacRoman files; sometimes you'll get one
form, sometimes another, depending on what data is in the file.)
In addition, this will produce unpredictable results on MacRoman files since sometimes they may appear to be UTF8. This is what happened to quite a few of my scripts when the AppleScript team decided to switch the default behavior. Why not just let the user coerce with "as <<class utf8>>" (I think) from plain ASCII? Better yet, add an interface to Text Encoding Converter and let the script use that _after_ do shell script returns.
We changed the behavior because the old behavior would frequently give
wrong results: interpreting the UTF-8 output of the Shell as MacRoman
would render useless all file and path names returned by do shell
script if they contained non-ASCII characters (such as accented
characters, special characters, and all Japanese, Chinese, etc.
characters). Interpreting shell output as UTF-8 by default is correct
for most applications. And again, AppleScript never uses "plain ASCII"
-- an AppleScript string is assumed to be in the system encoding, which
may be in MacRoman or may be in MacJapanese but is never ASCII.
The problem with returning the do shell script result "raw" is that
there's no form that is both correct and useful. Raw data might be
correct but not immediately useful. MacRoman is almost always
incorrect -- and if we misinterpreted UTF-8 as MacRoman and then tried
to convert it to UTF-8 you'd get very strange, very wrong results.
UTF-8 is most correct most of the time, except when it isn't.
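A concrete illustration of that failure mode (the file name is hypothetical; the byte values are just the standard encodings):

    -- Suppose ls reports a file named "café". The shell emits the "é" as the
    -- UTF-8 bytes 0xC3 0xA9. Decoded as UTF-8 (the current behavior) you get
    -- "café"; decoded as MacRoman (the old behavior) those same bytes are "√"
    -- and "©", so the name comes back as the useless "caf√©".
    set theName to do shell script "ls /tmp/demo-folder"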
- Allow an 'as' or 'encoding' parameter on do shell script so you can
specify what encoding, if any, you expect the result to be in if it's
not UTF-8 (this would include 'as data' so you could, for instance,
cat a .jpeg file and get the raw .jpeg in an AppleScript data object)
This is probably the best solution. However, it still doesn't let a user specify whether the input will be encoded UTF8 or not. This wouldn't matter much, but I have several scripts that pass stdin through do shell script to avoid writing to the disk, and there's no way to pass stdin with the current version of do shell script.
Getting non-UTF-8 text into the Shell is tougher, because the Shell
tends to interpret incoming characters with things like flow-control,
etc. I agree that for the kind of thing you want to do you'd prefer to
have a UNIX pipe object in AppleScript, which you could pass standard
in to and get standard out and standard error from, and define the pipe
to be a command sequence that's executed repeatedly (or constantly) on
the input stream. That would be more versatile than do shell script,
and I'll put it on our To Do list.
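In the meantime, one sketch of a workaround for the stdin case (not appropriate for genuinely secret data, since the command line is visible to tools like ps):

    -- Feed text to a shell filter without a temporary file by embedding it
    -- in the command line; quoted form of keeps the shell from interpreting it.
    set inputText to "some text to transform" -- illustrative value
    set transformed to do shell script "printf '%s' " & quoted form of inputText & " | tr a-z A-Z"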
Chris
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.