Re: Unicode 'as string' = unicode?


  • Subject: Re: Unicode 'as string' = unicode?
  • From: Christopher Nebel <email@hidden>
  • Date: Thu, 5 Sep 2002 01:28:55 -0700

On Wednesday, September 4, 2002, at 06:08 PM, Paul Berkowitz wrote:

I'm aware that 'string' is now 'styled text', 'international string'.

Ah, no. It helps to be aware of the distinction between AppleScript types and Apple Event types, which is admittedly a bit muddled. "string" (an AppleScript type) is, for most purposes, "styled text" (an Apple Event type) [1]. "international text" is purely an Apple Event type; it's never sent by AppleScript and is turned into "string" on receipt [2]. None of this is new; it's been like this since AppleScript 1.0.

Even so: when I use a Japanese and a Russian keyboard to input Japanese and
Russian into names of Address Book contacts (manually), then get the names
via AppleScript, the results are in Unicode. When I then

set stringName to theName as string
stringName = theName
--> true

What's going on? Even international string should not be able to do
double-byte Unicode, should it? If I open the result in a Script Debugger
result window, sure enough, it looks exactly the same (although it doesn't
look perfect). Also if I transpose it to Entourage, it looks perfect there.

Now we cover who's capable of what:

"international text": an encoding (e.g., MacRoman) followed by a bunch of data in that encoding.

"styled text": a bunch of text data (the 'ktxt' part) and an array of style info (the 'ksty' part) that tells you how to draw each run of characters -- font, size, and style. Fonts imply a certain encoding, so for the purposes of what's representable, you can think of it as a sequence of "international text"s. Yes, this is *more* powerful than "international text".

AppleScript is smart enough to use the encodings buried in "string" objects when comparing them to "Unicode text", so you'll generally get the correct results, as you did above.

An interesting kink in testing this sort of thing is that Script Editor can't really display Unicode. What it's doing is coercing it into styled text and then displaying that. Since most characters most people are likely to type (e.g., any Roman, Cyrillic, classical Greek, or CJK) are representable in "styled text", this works. Try some of the weirder characters, though, and it'll fall down.

Why is AppleScript not doing anything when asked to get Japanese 'as string'?

It is, but it's hard to see in Script Editor -- it turned the "Unicode text" object into a "string" object, which means it got turned into styled text. Because of the above, these display identically in Script Editor, but you could probably tell them apart in Script Debugger.

I'm actually trying to test to see whether the Unicode text is nothing but ASCII, and so will go anywhere, or whether it's something that really has to be Unicode. So I'm testing now with the <<class ktxt>> business.

Now we get to the real issue. First off, testing for the presence of 'ktxt' won't do you much good, because Unicode-to-string always generates style information. Second, are you sure you want to know if the string is representable in ASCII? Or were you actually thinking of MacRoman or ISO-8859-n? If you want ASCII specifically, you could do it by checking the ASCII number of every character -- if all of them are less than 128, then it's representable. For any other encoding, it's harder. (I suppose you could make a string that contains all the characters you care about and then test that each character of the source string is in it.)

Third, one wonders why you care. Who's on the receiving end of this, and why do you need to do something different depending? There might be a way around the problem.
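
The two per-character tests described above might look roughly like the following. This is only a sketch: the handler names isPlainASCII and usesOnly are made up for illustration, and it assumes that "ASCII number" (from Standard Additions) gets each character coerced to "string" intact. A character that has no equivalent in any styled-text encoding may not survive that coercion, so treat a "true" result with some caution.

```applescript
-- Hypothetical handler: true if every character of theText has an
-- ASCII number below 128, i.e. the text is representable in 7-bit ASCII.
on isPlainASCII(theText)
	repeat with i from 1 to (count characters of theText)
		if (ASCII number (character i of theText)) > 127 then return false
	end repeat
	return true
end isPlainASCII

-- Hypothetical handler: true if every character of theText also appears
-- in allowedChars, a string holding all the characters you care about.
on usesOnly(theText, allowedChars)
	considering case
		repeat with i from 1 to (count characters of theText)
			if allowedChars does not contain (character i of theText) then return false
		end repeat
	end considering
	return true
end usesOnly
```

For example, isPlainASCII(theName) could decide whether a name is safe to hand to something that only understands ASCII, and usesOnly(theName, "ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz") would restrict it to unaccented letters and spaces. Both handlers return true for an empty string, since there is no character to object to.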


--Chris Nebel
AppleScript Engineering

[1] "string" objects are sent out in events as plain text (aka typeText) if they have no style information at all. String constants never have style information.

[2] I've never seen anyone actually use "international text". Some dictionaries claim they do, but they lie.