Re: Unicode 'as string' = unicode?
- Subject: Re: Unicode 'as string' = unicode?
- From: Christopher Nebel <email@hidden>
- Date: Thu, 5 Sep 2002 01:28:55 -0700
On Wednesday, September 4, 2002, at 06:08 PM, Paul Berkowitz wrote:
> I'm aware that 'string' is now 'styled text', 'international string'.
Ah, no. It helps to be aware of the distinction between AppleScript
types and Apple Event types, which is admittedly a bit muddled.
"string" (an AppleScript type) is, for most purposes, "styled text" (an
Apple Event type) [1]. "international text" is purely an Apple Event
type; it's never sent by AppleScript and is turned into "string" on
receipt [2]. None of this is new; it's been like this since
AppleScript 1.0.
> Even so: when I use a Japanese and a Russian keyboard to input Japanese
> and Russian into names of Address Book contacts (manually), then get the
> names via AppleScript, the results are in Unicode. When I then
>
>     set stringName to theName as string
>     stringName = theName
>     --> true
>
> What's going on? Even international string should not be able to do
> double-byte Unicode, should it? If I open the result in a Script Debugger
> result window, sure enough, it looks exactly the same (although it doesn't
> look perfect). Also if I transpose it to Entourage, it looks perfect there.
Now we cover who's capable of what:
"international text": an encoding (e.g., MacRoman) followed by a bunch
of data in that encoding.
"styled text": a bunch of text data (the 'ktxt' part) and an array of
style info (the 'ksty' part) that tells you how to draw each character
-- font, size, and style. Fonts imply a certain encoding, so for the
purposes of what's representable, you can think of it as a sequence of
"international text"s. Yes, this is *more* powerful than
"international text".
AppleScript is smart enough to use the encodings buried in "string"
objects when comparing them to "Unicode text", so you'll generally get
the correct results, as you did above.
An interesting kink in testing this sort of thing is that Script Editor
can't really display Unicode. What it's doing is coercing it into
styled text and then displaying that. Since most characters most
people are likely to type (e.g., any Roman, Cyrillic, classical Greek,
or CJK) are representable in "styled text", this works. Try some of
the weirder characters, though, and it'll fall down.
> Why is AppleScript not doing anything when asked to get Japanese 'as
> string'?
It is, but it's hard to see in Script Editor -- it turned the "Unicode
text" object into a "string" object, which means it got turned into
styled text. Because of the above, these display identically in Script
Editor, but you could probably tell them apart in Script Debugger.
> I'm actually trying to test whether the Unicode text is nothing but
> ASCII, which will go anywhere, or whether it's something that really has
> to be Unicode. So I'm testing now with the «class ktxt» business.
Now we get to the real issue. First off, testing for the presence of
'ktxt' won't do you much good, because Unicode-to-string always
generates style information. Second, are you sure you want to know if
the string is representable in ASCII? Or were you actually thinking of
MacRoman or ISO-8859-n? If you want ASCII specifically, you could do
it by checking the ASCII number of every character -- if all of them
are less than 128, then it's representable. For any other encoding,
it's harder. (I suppose you could make a string that contains all the
characters you care about and then test that each character of the
source string is in it.)
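Both checks described above can be sketched roughly as follows. This is
only an illustration, not tested code from the list: the handler names
isASCIIOnly and usesOnly are made up, and the first handler assumes
AppleScript 2.0 or later, where "id of" applied to text yields Unicode
code points (on older versions you would use "ASCII number" on each
character instead, as described above).

```applescript
-- Hypothetical handler names; not part of any standard library.

-- True if every character of theText has a code point below 128.
-- Assumes AppleScript 2.0 or later ("id of" on text).
on isASCIIOnly(theText)
	-- "id of" gives a bare integer for a single character and a list
	-- otherwise, so coerce to a list before looping.
	repeat with codePoint in ((id of theText) as list)
		if (contents of codePoint) > 127 then return false
	end repeat
	return true
end isASCIIOnly

-- True if every character of theText appears in allowedChars,
-- a string holding all the characters you care about.
on usesOnly(theText, allowedChars)
	considering case
		repeat with ch in (characters of theText)
			if (contents of ch) is not in allowedChars then return false
		end repeat
	end considering
	return true
end usesOnly
```

Called as isASCIIOnly(theName) or usesOnly(theName, "abc..."); inside a
"tell" block the calls would need a "my" prefix. The membership test is
linear in the length of allowedChars for each character, which should be
fine for contact names but slow for long texts.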
Third, one wonders why you care. Who's on the receiving end of this,
and why do you need to do something different depending? There might
be a way around the problem.
--Chris Nebel
AppleScript Engineering
[1] "string" objects are sent out in events as plain text (aka
typeText) if they have no style information at all. String constants
never have style information.
[2] I've never seen anyone actually use "international text". Some
dictionaries claim they do, but they lie.