Re: Encoding of script text
- Subject: Re: Encoding of script text
- From: Paul Berkowitz <email@hidden>
- Date: Thu, 12 Nov 2009 14:24:42 -0800
- Thread-topic: Encoding of script text
Scott,
You say "They do as they please, generally encoding characters in
accordance with the default 8-bit code page and only switching to Unicode
when presented with a character that has no mapping in the default 8-bit
code page."
Couldn't you, just as a "house rule", require that every script in your
collection begin with a sort of home-made BOM, namely some particular (set
of) Unicode-only characters, commented out, like:
-- ₠
or something like that? I can't see why it wouldn't work in comments, but
just in case that wouldn't work for you, then
"₠" = "₠"
could do instead.
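
On the harness side, checking for such a marker could look roughly like the sketch below. It is only an illustration in Python, not anything AppleScript or the editors provide: the function name `sniff_script_encoding`, the choice of "₠" as the marker, and the rule that the marker must sit on the first line are all assumptions. It also assumes that a file saved as Unicode is either UTF-8 (with or without a BOM) or UTF-16 with a BOM.

```python
import codecs

# Hypothetical house-rule marker: a character with no mapping in the
# 8-bit Mac code pages, so a file containing it must be saved as Unicode.
MARKER = "\u20a0"  # "₠"

# (BOM bytes, Python codec name) pairs to try when a BOM is present.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_script_encoding(data: bytes):
    """Return a codec name if `data` decodes as Unicode text and carries
    MARKER on its first line; otherwise return None."""
    # Prefer the codec announced by a BOM; fall back to BOM-less UTF-8.
    candidates = [codec for bom, codec in _BOMS if data.startswith(bom)]
    if not candidates:
        candidates = ["utf-8"]

    for codec in candidates:
        try:
            text = data.decode(codec)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        first_line = text.splitlines()[0] if text else ""
        if MARKER in first_line:
            return codec
    return None
```

A result of None means only that the file does not follow the house rule. The sketch makes no attempt to guess which 8-bit code page such a file was saved in.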
--
Paul Berkowitz
> From: Scott Babcock <email@hidden>
> Date: Thu, 12 Nov 2009 22:03:11 +0000
> To: AppleScript-Users <email@hidden>
> Subject: Encoding of script text
>
> I work with folks from around the planet to create and maintain dozens of
> libraries and thousands of scripts. Since many of these folks run on Mac OS X
> configurations in which MacRoman is not the default 8-bit encoding, we run
> into situations in which scripts fail to compile due to encoding mismatches.
>
> For example, the backslash is at code point x'5C' in MacRoman, but it's at
> code point x'80' in MacJapanese. The angle quotes (guillemets) also present
> difficulties.
>
> If we could explicitly specify that script text be encoded as Unicode
> (preferably UTF-8), there'd be no problem. The BOM at the head of the file
> would inform AppleScript and our harness code as to the encoding of the file
> content. There would be no ambiguity and therefore no chance for
> misinterpretation. Unfortunately, there doesn't appear to be any explicit way
> to specify character encoding in AppleScript editors. They do as they please,
> generally encoding characters in accordance with the default 8-bit code page
> and only switching to Unicode when presented with a character that has no
> mapping in the default 8-bit code page.
>
> Even converting the script text to Unicode outside of the editor does no good,
> because the compilation process will cause the text to revert to the default
> 8-bit encoding. This is probably a function of the AppleScript scripting
> component itself rather than a behavior that resides in the editor.
>
> Here are my questions:
>
> 1. Is there a reliable way (explicit or implicit) to force AppleScript editors
> and the AppleScript scripting component to use Unicode encoding?
> 2. Is there a reasonable way to determine the encoding of 8-bit AppleScript
> source files?
>
> We've considered defining our own 8-bit encoding indicators to add to the
> content of the files themselves, but adding these to thousands of files would
> be a huge task and would not address the scenario in which someone opens a
> file directly in an editor and attempts to compile it.
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> AppleScript-Users mailing list (email@hidden)
> Help/Unsubscribe/Update your Subscription:
> Archives: http://lists.apple.com/archives/applescript-users
>
> This email sent to email@hidden