Re: Automator bug in "run shell script" ?
- Subject: Re: Automator bug in "run shell script" ?
- From: Ron Hunsinger <email@hidden>
- Date: Tue, 14 Jun 2011 18:58:58 -0700
On Jun 14, 2011, at 4:31 PM, Jean-Christophe Helary wrote:
> Besides, as far as I can tell, it is not a matter of what the characters _look_ like. They look exactly the same in the Automator output too.
>
> It is a matter of being not the same.
The characters not only look the same, they _are_ the same. What's different is the sequence of Unicode codepoints used to represent the character. Characters and codepoints are _not_ the same thing.
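To make that distinction concrete, here is a minimal sketch in Python (my choice for illustration; nothing in this thread uses it). It writes one character, é, as two different codepoint sequences. The helper name `codepoints` is made up for this example.

```python
import unicodedata

composed = "\u00e9"      # é as one precomposed codepoint
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

def codepoints(s):
    """List the codepoints of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

print(codepoints(composed))
print(codepoints(decomposed))

# Compared codepoint by codepoint, the two strings differ...
same_as_codepoints = composed == decomposed

# ...but after normalizing both to one form (NFC here), they are equal.
same_as_characters = unicodedata.normalize("NFC", decomposed) == composed
```

Both strings display as the same character; only the comparison after normalization treats them as such.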
Programs like grep aren't Unicode aware. They just treat the input as a sequence of unsigned 8-bit bytes. Strictly speaking, grep isn't even aware that the bytes it sees are the UTF-8 encoding of a sequence of Unicode codepoints; it most certainly isn't smart enough to group codepoints together to form characters and match those, any more than it's smart enough to find 53e-1 when you ask it to search for 5.3. They are the same number, after all, just a different sequence of bytes. Too bad grep looks at bytes and not numbers or characters.
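A rough way to see what a byte-oriented search is up against, sketched in Python. This only mimics a byte-for-byte substring match; it is not grep, and the sample text is invented for the example.

```python
import unicodedata

# The line arrives in decomposed form, the pattern was typed in composed form.
line = unicodedata.normalize("NFD", "caf\u00e9 au lait")
pattern = unicodedata.normalize("NFC", "caf\u00e9")

line_bytes = line.encode("utf-8")        # b'cafe\xcc\x81 au lait'
pattern_bytes = pattern.encode("utf-8")  # b'caf\xc3\xa9'

# A byte-for-byte search looks for c3 a9 and only finds 65 cc 81.
byte_match = pattern_bytes in line_bytes
print(byte_match)
```

The text on both sides reads "café", yet the byte sequences have nothing in common after the "f", so the search comes back empty.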
It's not just grep. Try using /usr/bin/sort to sort lines containing non-ASCII characters, and compare the result to the locale-specific expectation. In German, for example, the letter o with an umlaut should sort (almost) as if it were the letter o followed by the letter e. /usr/bin/sort won't put it even close.
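The sorting problem can be sketched the same way in Python. The first sort compares raw UTF-8 bytes, which is roughly what a tool that ignores the locale ends up doing. The second uses a deliberately crude stand-in for German ordering (umlauts expanded to vowel plus e); the table `EXPAND` and the word list are assumptions made up for this example, not a real collation algorithm.

```python
words = ["Ofen", "Zebra", "\u00d6l", "Oper"]   # "\u00d6l" is "Öl"

# Byte-wise order: the UTF-8 bytes of Ö (c3 96) sort after every ASCII letter.
bytewise = sorted(words, key=lambda w: w.encode("utf-8"))

# Crude approximation of the German convention mentioned above.
EXPAND = str.maketrans({
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",
    "\u00df": "ss",
})
approx_german = sorted(words, key=lambda w: w.translate(EXPAND))

print(bytewise)
print(approx_german)
```

In the byte-wise result "Öl" lands after "Zebra"; with the expansion it sorts next to the other words starting with O.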
The point is, you cannot use tools that don't even grok Unicode and expect them to handle international text correctly.
> What matters is that if I want to process a string that uses a composed form and Automator gives me a decomposed form without warning, I don't get the result I expect. And there is no setting anywhere in "run shell script" that lets me pass composed strings.
If you process the string as Unicode, you'll still get the right results. If you process it as a sequence of bytes, for example by using a tool that doesn't understand Unicode, you won't. You have to use the right tools. "do shell script" does what it can, by encoding the codepoints in UTF-8 before passing them to the shell, and converting the output back from UTF-8 to Unicode text. The problem isn't "do shell script" itself, it's which Unix tools you ask the shell to use.
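One possible way to "process the string as Unicode", sketched in Python: decode the bytes, normalize both sides to the same form, and only then compare. This is a suggestion of mine under the assumption that the input is valid UTF-8; the function name `contains_text` is hypothetical.

```python
import unicodedata

def contains_text(haystack_bytes, needle_bytes, form="NFC"):
    """Decode UTF-8 input and compare after normalizing both sides."""
    haystack = unicodedata.normalize(form, haystack_bytes.decode("utf-8"))
    needle = unicodedata.normalize(form, needle_bytes.decode("utf-8"))
    return needle in haystack

decomposed_line = "cafe\u0301 au lait".encode("utf-8")  # decomposed input
composed_pattern = "caf\u00e9".encode("utf-8")          # composed pattern

raw_match = composed_pattern in decomposed_line
unicode_match = contains_text(decomposed_line, composed_pattern)
print(raw_match, unicode_match)
```

The raw byte comparison misses the match, while the normalized comparison finds it regardless of which form either side arrived in.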