Re: changing international text to unicode text


  • Subject: Re: changing international text to unicode text
  • From: Christopher Nebel <email@hidden>
  • Date: Sun, 19 Dec 2004 21:42:05 -0800


On Dec 19, 2004, at 12:29 PM, Joseph Weaks wrote:

> On Dec 19, 2004, at 9:33 AM, Emmanuel wrote:
>
>> ...we've tried to summarize the main issues we are aware of at:
>> <http://www.satimage-software.com/en/unicode_and_applescript.com>
>
> or even better: http://www.satimage.fr/software/en/unicode_and_applescript.html

It's also slightly wrong in places. Specifically:

> The string class basically stores one byte ([0..255]) per character. The first 128 values are rendered according to the ASCII standard; for instance, ASCII character of 37 is the percent sign %. The upper 128 values are rendered using the Macintosh encoding; for instance, ASCII character of 150 is ñ. We refer to this encoding as the Mac-encoding.

Actually, it uses the "primary" Script Manager encoding, that is, the one that goes with the first language listed in your International preference pane. For most US and Western European users this will be MacRoman, in which case the rest is correct. Other locales use other encodings, however: Japanese, for example, uses MacJapanese (a slightly enhanced Shift-JIS), which mixes one- and two-byte characters and is not isomorphic to either MacRoman or ASCII. (0x5C in MacJapanese is a yen sign, not a backslash.)
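To make the point concrete, here is a small sketch in Python rather than AppleScript; Python's codecs merely stand in for the Script Manager. The standard library has no MacJapanese codec, so Shift-JIS is used as an approximation, and the helper names are hypothetical. It shows that a byte only means something relative to an encoding, and that a Japanese encoding mixes one- and two-byte characters.

```python
# Illustration only: Python codecs stand in for the Mac OS Script Manager.
# "mac_roman" is MacRoman; "shift_jis" approximates MacJapanese
# (the standard library has no MacJapanese codec).

def decode_byte(b: int, encoding: str) -> str:
    """Interpret a single byte under the given encoding."""
    return bytes([b]).decode(encoding)

def byte_lengths(text: str, encoding: str) -> list:
    """Number of bytes each character occupies in the given encoding."""
    return [len(ch.encode(encoding)) for ch in text]

if __name__ == "__main__":
    # Byte 37 is "%" in ASCII, and MacRoman agrees in the lower 128 values.
    print(decode_byte(37, "mac_roman"))
    # Byte 150 (0x96) differs by encoding: n-tilde in MacRoman,
    # U+2013 EN DASH in Windows-1252.
    print(decode_byte(0x96, "mac_roman"))
    print(decode_byte(0x96, "cp1252"))
    # Shift-JIS mixes one-byte and two-byte characters.
    print(byte_lengths("A日本", "shift_jis"))  # [1, 2, 2]
```

This is why a "string" produced on a Japanese system cannot be read back byte-for-byte as MacRoman.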


> There are cases where a string may store a more complex entity; we do not address them here.

I assume you're talking about styled text here, which in fact you do talk about later...


> The Unicode text class stores two bytes or more per character, using the UTF-16 encoding.

This happens to be true, but it isn't really relevant; it's an implementation detail. All you really know is that a single "character" of a Unicode text object is one Unicode code point, which might occupy more than one UTF-16 code unit. It is true, though, that for Apple Event Manager purposes (and therefore for "write"), "Unicode text" does mean UTF-16.
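The distinction between a code point and a UTF-16 code unit can be sketched in Python, whose str type, like AppleScript's Unicode text, counts characters by code point. The helper name is hypothetical; the example character is U+1D11E MUSICAL SYMBOL G CLEF, which lies outside the Basic Multilingual Plane.

```python
# Sketch: one code point may need more than one UTF-16 code unit.

def utf16_units(text: str) -> int:
    """Count the 16-bit code units needed to store text as UTF-16 (no BOM)."""
    return len(text.encode("utf-16-be")) // 2

if __name__ == "__main__":
    for s in ["A", "é", "日", "\U0001D11E"]:
        # The last string is a single code point stored as a surrogate pair.
        print(repr(s), "code points:", len(s), "UTF-16 units:", utf16_units(s))
```

Every character here is one code point, but the G clef takes two UTF-16 units, which is why "two bytes per character" is only an approximation.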


> Be aware that the [Unicode text] file has to begin with ASCII 254, ASCII 255.

This is not strictly true, but a leading byte-order mark (BOM) will help downstream consumers: without it, they can't automatically tell that the file is UTF-16, and you'll have to tell them manually. (If a consumer relies on the BOM, then it's effectively required.)
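As a Python sketch of what "begin with ASCII 254, ASCII 255" means in bytes: big-endian UTF-16 text is preceded by the BOM FE FF. The helper name is hypothetical, and this shows only the byte layout, not how AppleScript's "write" produces it.

```python
import codecs

def encode_utf16_with_bom(text: str) -> bytes:
    """Big-endian UTF-16 preceded by the byte-order mark FE FF (254, 255)."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")

if __name__ == "__main__":
    data = encode_utf16_with_bom("Hi")
    print(data.hex(" "))                 # fe ff 00 48 00 69
    # The generic "utf-16" codec uses the BOM to pick the byte order and strips it.
    print(data.decode("utf-16"))         # Hi
    # Without the BOM, the reader has to be told the byte order explicitly.
    print(data[2:].decode("utf-16-be"))  # Hi
```

A consumer that sniffs the first two bytes can decode the first form unaided; the BOM-less form only works because the caller named the encoding.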


> Since there is no tag which would specify whether a given file is ASCII or UTF-8 ...

Actually there is; in fact, you mention it above (hex EF BB BF, the UTF-8 byte-order mark), but most software that writes UTF-8 doesn't use it.
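A reader can check for these signatures before falling back to a guess. The following is a minimal Python sketch; the function name and the table are hypothetical, and UTF-32 signatures are deliberately left out (the UTF-32 little-endian BOM also begins with FF FE, so supporting it would require checking it first).

```python
import codecs
from typing import Optional

# Known signatures and the Python codec that decodes (and strips) them.
_SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8-sig"),   # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16"),  # FF FE
]

def sniff_encoding(data: bytes) -> Optional[str]:
    """Return a codec name if data starts with a known signature, else None."""
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name
    # No signature: ASCII, UTF-8, MacRoman, ... the caller has to decide.
    return None

if __name__ == "__main__":
    tagged = b"\xef\xbb\xbfcaf\xc3\xa9"
    print(sniff_encoding(tagged))                  # utf-8-sig
    print(tagged.decode(sniff_encoding(tagged)))   # café
    print(sniff_encoding("café".encode("utf-8")))  # None
```

The untagged case returning None is exactly the situation the quoted page describes: the bytes alone don't say whether C3 A9 is "é" in UTF-8 or two MacRoman characters.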


> [I]n some circumstances where an application is really expecting a regular string, you may get an AppleScript error if you pass such a quantity.

Not a correction, but just so you know, any application that specifically requires a descriptor of typeText has a bug.



--
Chris Nebel
AppleScript Engineering
  • Follow-Ups:
    • Re: changing international text to unicode text
      • From: Emmanuel <email@hidden>
References: 
 >changing international text to unicode text (From: Kevin Meaney <email@hidden>)
 >Re: changing international text to unicode text (From: Emmanuel <email@hidden>)
 >Re: changing international text to unicode text (From: Joseph Weaks <email@hidden>)
