Hello, I have found a strange behavior in AppleScript when using the Unix command sed with "«" or "»" (in French, we use "«" and "»" as quotation marks, called guillemets).
First, a little explanation about 'sed'. It is a Unix command for regex (regular expression) replacement. The form I use is: sed -E 's/regex/replacement/g' (the -E option enables modern, extended regular expressions, and the '/g' flag makes the replacement global, i.e. it replaces every match on a line, not just the first one).
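A small sketch of what the two options mentioned above change, using a made-up input string (not from the original post):

```shell
#!/bin/sh
# Without the g flag, sed replaces only the first match on each line.
first=$(echo "a-b-c" | sed -E 's/-/_/')
# With the g flag, every match on the line is replaced.
all=$(echo "a-b-c" | sed -E 's/-/_/g')
# -E (extended regex) makes "+" a quantifier: one or more "-" in a row.
runs=$(echo "a--b---c" | sed -E 's/-+/_/g')
echo "$first"   # a_b-c
echo "$all"     # a_b_c
echo "$runs"    # a_b_c
```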
For this example, I want to replace every occurrence of ' »' with '_»' (space+'»' is replaced by underscore+'»'). I know this could simply be done with 'text item delimiters', but for more complex replacements I want to use the 'sed' command.
In Terminal: echo "Bonjour « hello » world" | sed -E 's/ »/_»/g' gives: Bonjour « hello_» world, as expected.
In AppleScript, the same: set r to do shell script "echo \"Bonjour « hello » world\" | sed -E 's/ »/_»/g'" gives: Bonjour « hello_» world, as expected.
But in my code, I want to replace not only ' »' but also ' :' and ' ;', for example. This can be done with the regex ' [»:;]' ([»:;] means '»' or ':' or ';'). To simplify this example, I use only [»].
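A sketch of the bracket-expression idea with the ASCII members only (the input sentence is invented for illustration). Once the class has more than one member, the replacement can no longer hard-code '_»'; a capture group and \1 put back whichever mark actually matched:

```shell
#!/bin/sh
# [:;] is a bracket expression: it matches ONE character, either ':' or ';'.
# Wrapping it in a group, ([:;]), captures whichever one matched so that
# \1 can reinsert it after the underscore.
out=$(echo "Note : voir ; fin" | sed -E 's/ ([:;])/_\1/g')
echo "$out"   # Note_: voir_; fin
```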
Then in Terminal: echo "Bonjour « hello » world" | sed -E 's/ [»]/_»/g' gives: Bonjour « hello_» world, as expected.
In AppleScript, the same: set r to do shell script "echo \"Bonjour « hello » world\" | sed -E 's/ [»]/_»/g'" gives: Bonjour_¬ª´ hello_¬ªª world, which is NOT expected (encoding garbage).
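One plausible explanation, offered as an assumption rather than a confirmed diagnosis: 'do shell script' does not inherit Terminal's UTF-8 locale (LANG/LC_ALL), so sed runs in the C locale and reads the UTF-8 "»" (bytes C2 BB) as two separate characters. Inside brackets, [»] then means "byte C2 or byte BB", which also matches the first byte of "«" (C2 AB). The sketch below forces LC_ALL=C to mimic that environment and checks the resulting bytes. If AppleScript then decodes the invalid UTF-8 as Mac Roman, C2, BB and AB display as "¬", "ª" and "´", which is exactly the garbage reported above.

```shell
#!/bin/sh
# LC_ALL=C is an assumption standing in for the do shell script environment.
input='Bonjour « hello » world'

# In the C locale, [»] is a set of two single bytes (0xC2, 0xBB), so " [»]"
# matches a space followed by EITHER byte, including the first byte of "«".
broken=$(printf '%s\n' "$input" | LC_ALL=C sed -E 's/ [»]/_»/g')

# Expected bytes:
#   " «" = 20 C2 AB  ->  "_" C2 BB AB   (Mac Roman: "_¬ª´")
#   " »" = 20 C2 BB  ->  "_" C2 BB BB   (Mac Roman: "_¬ªª")
expected=$(printf 'Bonjour_\302\273\253 hello_\302\273\273 world')

if [ "$broken" = "$expected" ]; then
  echo "byte-wise match reproduced"
fi
```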
Can anybody explain what happened and how to work around this? I know I can work around it with two 'sed' commands, like: sed -E 's/ »/_»/g' (for the '»') and then sed -E 's/ ([:;])/_\1/g' for the others (the capture group keeps whichever of ':' or ';' matched).
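A sketch of the two-pass workaround, with an invented input line. A plain " »" without brackets is matched as a fixed sequence of bytes, so it should behave correctly even if sed is not running in a UTF-8 locale; LC_ALL=C is set here only to mimic that assumed environment:

```shell
#!/bin/sh
input='Bonjour « hello » world : fin ; ok'
# Pass 1: literal " »" (a byte sequence, safe in any locale).
# Pass 2: ASCII-only bracket expression, with \1 keeping ':' or ';'.
out=$(printf '%s\n' "$input" \
  | LC_ALL=C sed -E 's/ »/_»/g' \
  | LC_ALL=C sed -E 's/ ([:;])/_\1/g')
echo "$out"   # Bonjour « hello_» world_: fin_; ok
```

Both BSD and GNU sed also accept several scripts in a single invocation, e.g. sed -E -e 's/ »/_»/g' -e 's/ ([:;])/_\1/g', which avoids the second process.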
But can I use [»:;] in a single command without the encoding garbage?
Thanks if you can explain what happens and how to avoid the garbage.
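A single-pass sketch, assuming the byte-wise explanation above is right: replace the bracket expression with an alternation. Each branch of (»|:|;) is a sequence, so the two bytes of "»" stay together regardless of locale. (Another option, not shown, would be to keep [»:;] and prefix the shell command with a UTF-8 locale such as export LC_ALL=en_US.UTF-8; which assumes that locale is installed.)

```shell
#!/bin/sh
input='Bonjour « hello » world : fin ; ok'
# Alternation instead of a bracket expression: "»" is matched as C2 BB
# together, so " «" (20 C2 AB) is left alone even in the C locale.
# \1 reinserts whichever mark matched.
out=$(printf '%s\n' "$input" | LC_ALL=C sed -E 's/ (»|:|;)/_\1/g')
echo "$out"   # Bonjour « hello_» world_: fin_; ok
```

In AppleScript the backslash has to be doubled inside the string literal, so the same command would read: set r to do shell script "echo \"Bonjour « hello » world\" | sed -E 's/ (»|:|;)/_\\1/g'"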