Re: As Text Work-Around Broken
- Subject: Re: As Text Work-Around Broken
- From: has <email@hidden>
- Date: Sat, 14 Jan 2006 19:26:07 +0000
Emmanuel wrote:
>>I have used this for some time but it seems to be broken..
>>
>>For unicode text this should work. Any ideas?
>>
>> set vStringAsText to (((vString as string) as record)'s «class
>
>This has never worked fully,
Indeed. It's a hack, exploiting what's basically a bug in the AppleScript implementation. It may produce the desired result for some cases, but this is entirely accidental and it's just as happy to return garbage without any kind of indication or warning. Use entirely at own risk, etc.
>as we publish at:
><http://www.satimage-software.com/en/unicode_and_applescript.html>
One comment about that page: it incorrectly states that an AS string consists of "Mac-Roman" characters; AS strings actually use the user's primary encoding, as determined from their International system preferences. For most US, western European and Antipodean users this will be MacRoman, but it will often be something else for folks in other parts of the world.
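To illustrate why the assumed encoding matters, here is a minimal sketch in Python (chosen only for illustration; it is not how AppleScript itself works, and the byte value and the two codec names are arbitrary examples). It decodes the same single byte under two classic Mac encodings:

```python
# One byte, as it might sit inside a plain (non-Unicode) string.
raw = b"\x8e"

# The same byte stands for a different character depending on which
# encoding the reader assumes the string is in.
as_roman = raw.decode("mac_roman")
as_cyrillic = raw.decode("mac_cyrillic")

print(repr(as_roman), repr(as_cyrillic))
```

A script that hardwires MacRoman would therefore misread this byte on a system whose primary encoding is something else.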
>In that page there is an "idea" about how to coerce Unicode into raw text.
Not coercion; conversion. In Satimage.osax's case, using its 'extract string' command to sneakily take a styled string and return a plain string [presumably] containing only the character data from the first. I suspect - though stand to be corrected - that this produces the same end result as the above hack, in which case the result would still include garbage if the text being converted contains characters outside the user's primary encoding.
The correct solution to all this mess is to use a proper text encoding converter. AppleScript, Standard Additions, et al. don't bother to provide one, so (big surprise) you're stuck with third parties to fill the gap.
For example, TextCommands <http://osaxen.com/files/textcommands1.0.1.html> includes a 'convert from unicode' command that will convert Unicode text to a plain string in the encoding of your choice, allowing you to specify if you want it to error on, omit or replace any characters that fall out of that range. Another option would be the TEC osax <http://osaxen.com/files/tec1.3.3.html>, though I've never used it so can't vouch for it myself.
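The error/omit/replace choice described above can be sketched in Python, whose built-in codecs offer the same three behaviours. This is only an analogue, not TextCommands' actual interface: the function name convert_from_unicode and its on_unmappable parameter are made up for the example, and MacRoman is assumed as the target encoding.

```python
def convert_from_unicode(text, encoding="mac_roman", on_unmappable="error"):
    """Encode Unicode text into a single-encoding byte string.

    Hypothetical helper: on_unmappable picks what happens to characters
    that the target encoding cannot represent.
    """
    # Map the three policies onto Python's standard codec error handlers.
    handlers = {"error": "strict", "omit": "ignore", "replace": "replace"}
    return text.encode(encoding, errors=handlers[on_unmappable])


# "cafe" with an acute accent, then two Japanese characters that
# MacRoman has no slot for.
sample = "caf\u00e9 \u65e5\u672c"

omitted = convert_from_unicode(sample, on_unmappable="omit")
replaced = convert_from_unicode(sample, on_unmappable="replace")

try:
    convert_from_unicode(sample, on_unmappable="error")
    strict_failed = False
except UnicodeEncodeError:
    # The strict policy refuses to lose data silently.
    strict_failed = True

print(omitted, replaced, strict_failed)
```

The point of the strict policy is that the caller finds out when data would be lost, which is exactly what the 'as record' hack never tells you.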
One problem with both of the above: neither appears to provide any way to discover the user's primary encoding automatically (I was simply too lazy to add this myself; can't speak for the TEC osax author), so you'd need to hardwire that information directly into the script.
Shouldn't be a problem if the script is for your own use; problematic if you want to distribute it around the globe. Unfortunately, I don't know of any other ready-made solutions offhand, so unless/until some enterprising soul wants to patch one or the other I guess folk'll just have to make do.
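For comparison, this is roughly what "discover the primary encoding automatically, fall back to a hardwired one" looks like in Python. It is a sketch under assumptions: Python's locale-preferred encoding is only an analogue of the Mac OS primary encoding discussed here, not the same setting, and the helper name primary_encoding and the MacRoman fallback are invented for the example.

```python
import codecs
import locale


def primary_encoding(default="mac_roman"):
    """Return a codec name for the user's preferred encoding.

    Hypothetical helper: falls back to `default` when the locale
    reports an encoding that Python has no codec for.
    """
    name = locale.getpreferredencoding(False)
    try:
        # Normalise aliases to the canonical codec name.
        return codecs.lookup(name).name
    except LookupError:
        return default


encoding = primary_encoding()

# Convert using whatever was discovered, replacing anything unmappable.
plain = "na\u00efve".encode(encoding, errors="replace")
print(encoding, plain)
```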
Moral: dealing with text encodings sucks. Be sure to enthusiastically welcome your new Unicode overlords when they finally finish vanquishing all the rest.
has
p.s. Everybody should also read <http://www.joelonsoftware.com/articles/Unicode.html> (if they haven't already done so); great for getting an idea of all the issues involved.
http://freespace.virgin.net/hamish.sanderson/