Re: urlEncode (AppleScript fails in Automator - Permissions error)
- Subject: Re: urlEncode (AppleScript fails in Automator - Permissions error)
- From: Christopher Nebel <email@hidden>
- Date: Thu, 01 Aug 2013 12:05:01 -0700
On Jul 9, 2013, at 10:49 PM, Kaydell Leavitt <email@hidden> wrote:
> I went looking for a urlEncode script that I could call from AppleScript and found this handler called urlEncode()
> I believe that it may work OK, but I would like to understand what's going on better than I do. ...
>
> I understand that ASCII has been deprecated from AppleScript and that nowadays everything is Unicode text = text = string, but I believe that what's different is UTF-8, which is what I want.
>
> Each of the following expressions returns 233:
>
> id of ("é" as string)
> id of ("é" as text)
> id of ("é" as Unicode text)
> id of ("é" as «class utf8»)
>
> I read that nowadays we are supposed to use "id of" instead of calling ASCII number.
>
> I would like to develop my own urlEncode() handler in pure AppleScript so that I can understand how. I've googled and found some that don't really work for UTF-8 because they assume that all characters are 8-bits wide.
All your guesses so far are correct; it sounds like you just need some confirmation and a few missing pieces. I wouldn't say that ASCII has been deprecated from AppleScript, but it's true that, as you said, Unicode text = text = string, or as I put it, text in AppleScript is all Unicode all the time. Related to that, the "ASCII character" and "ASCII number" commands have been genuinely deprecated -- they don't produce reliable results for non-ASCII characters -- and you should be using "id of" instead.
All those "id of" expressions return 233, because that's the decimal Unicode code point for a pre-composed "é" character. The trick is that a particular code point may be represented as bytes in a number of different ways depending on the encoding used. For URLs, the standard is to encode the string as UTF-8, and then escape any non-allowed bytes. Any non-ASCII character in UTF-8 will take at least two bytes -- "é" is C3 A9 -- so the resulting URL fragment would be "%C3%A9". (Tip: the Character Viewer can tell you the UTF-8 encoding for any character. You can turn it on in Keyboard Preferences.)
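A minimal sketch of those two steps (encode as UTF-8, then escape the bytes that are not allowed) in plain AppleScript. The handler names urlEncode, isUnreserved, utf8Bytes, and hexEscape are illustrative, not built in, and leaving only the RFC 3986 "unreserved" characters unescaped is an assumption; adjust that set if you need to keep "/" or other delimiters. AppleScript has no shift or mask operators, so "div" and "mod" stand in for them.

```applescript
-- Illustrative sketch; all handler names here are hypothetical, not built in.

-- Percent-encode theText as UTF-8, leaving only A-Z a-z 0-9 - . _ ~ unescaped.
on urlEncode(theText)
	if theText is "" then return ""
	-- "id of" yields an integer for one code point and a list for several.
	set codePoints to id of theText
	if class of codePoints is not list then set codePoints to {codePoints}
	set encoded to ""
	repeat with i from 1 to (count of codePoints)
		set cp to item i of codePoints
		if isUnreserved(cp) then
			set encoded to encoded & (character id cp)
		else
			set byteList to utf8Bytes(cp)
			repeat with j from 1 to (count of byteList)
				set encoded to encoded & hexEscape(item j of byteList)
			end repeat
		end if
	end repeat
	return encoded
end urlEncode

-- True for the ASCII characters that may appear in a URL unescaped (assumed set).
on isUnreserved(cp)
	if cp >= 48 and cp <= 57 then return true -- 0-9
	if cp >= 65 and cp <= 90 then return true -- A-Z
	if cp >= 97 and cp <= 122 then return true -- a-z
	return {45, 46, 95, 126} contains cp -- - . _ ~
end isUnreserved

-- UTF-8 bytes for one code point; div/mod by 64 replaces 6-bit shifts and masks.
on utf8Bytes(cp)
	if cp < 128 then
		return {cp}
	else if cp < 2048 then
		return {192 + (cp div 64), 128 + (cp mod 64)}
	else if cp < 65536 then
		return {224 + (cp div 4096), 128 + ((cp div 64) mod 64), 128 + (cp mod 64)}
	else
		return {240 + (cp div 262144), 128 + ((cp div 4096) mod 64), 128 + ((cp div 64) mod 64), 128 + (cp mod 64)}
	end if
end utf8Bytes

-- Format one byte (0-255) as "%XX" with uppercase hex digits.
on hexEscape(byteValue)
	set hexDigits to "0123456789ABCDEF"
	return "%" & (character ((byteValue div 16) + 1) of hexDigits) & (character ((byteValue mod 16) + 1) of hexDigits)
end hexEscape

urlEncode(character id 233) -- "%C3%A9"
```

The last line passes the pre-composed "é" by code point so the result does not depend on how an editor stores the literal; 233 div 64 is 3 and 233 mod 64 is 41, which gives the bytes C3 (192 + 3) and A9 (128 + 41).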
As for doing this in "pure" AppleScript, you'll have to write the UTF-8 encoding yourself, which involves some bit-wise math that AppleScript is unfortunately not well suited for. I say "pure" advisedly, because there's no particular reason *not* to shell out to perl, since both "do shell script" and perl(1) are guaranteed to be present -- unless, that is, you have a demonstrated performance problem with doing so, or it's a matter of principle.
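For comparison, a sketch of the shell-out route. The handler name urlEncodeViaPerl is illustrative, and the perl one-liner is just one way to write it using only core perl; the set of characters left unescaped (A-Z a-z 0-9 - . _ ~) is an assumption. "do shell script" hands the text to the shell as UTF-8, and perl without a -C switch sees raw bytes, so escaping byte by byte should give the UTF-8 percent-encoding directly.

```applescript
-- Illustrative sketch; urlEncodeViaPerl is a hypothetical name, not built in.
on urlEncodeViaPerl(theText)
	-- Replace every byte outside the assumed unreserved set with %XX.
	set perlCode to "$s = shift; $s =~ s/([^A-Za-z0-9._~-])/sprintf(q(%%%02X), ord($1))/ge; print $s"
	-- "--" stops perl from reading a leading "-" in theText as a switch.
	return do shell script "perl -e " & quoted form of perlCode & " -- " & quoted form of theText
end urlEncodeViaPerl

urlEncodeViaPerl(character id 233)
```

Using "quoted form of" for both the perl code and the argument keeps the shell from interpreting anything in the text being encoded.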
--Chris Nebel
AppleScript Engineering