Re: Unicode Character in File Name
- Subject: Re: Unicode Character in File Name
- From: Christopher Nebel <email@hidden>
- Date: Tue, 7 Apr 2009 15:16:32 -0700
On Apr 7, 2009, at 8:29 AM, Doug McNutt wrote:
>>> Can I depend on the Finder always displaying this character
>>> correctly in any language?
>> If it displays correctly at all, then it should be fine no matter
>> what the user's locale is -- the whole point of Unicode is that
>> it's a single character encoding that everyone world-wide uses,
>> so there aren't any questions about what a particular byte means.
> There are still questions about support for arbitrary Unicode code
> points by the fonts installed on the user's computer. That's the
> reason for the "If it displays correctly at all" caveat above.
> I have a couple of questions that have not yet been answered in this
> thread.
> Unicode has the concept of a "code point" or, to avoid confusion,
> codepoint. A codepoint was initially an unsigned 16-bit value, but
> it has been extended to support up to 32 bits, even though very few
> graphemes use the extension at this time. In any case, the concept
> of codepoint is well defined by those who claim to be the standards
> committee for Unicode.
Actually, Unicode code points only go up to U+10FFFF, which takes 21
bits, but otherwise correct. (You may be thinking of UTF-32, a
character encoding scheme that uses one 32-bit word per code point.
It's very easy to work with, but wastes at least 11 bits per code
point, which is why you don't often see it.)
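The bit arithmetic is easy to check with any language that has built-in Unicode support. The sketch below uses Python purely as a calculator (it is not AppleScript): it computes how many bits the highest code point, U+10FFFF, needs, how many bits UTF-32 spends per code point, and shows that anything past U+10FFFF is rejected outright.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point the Unicode standard defines

# Bits needed to represent every code point from 0 up to U+10FFFF.
bits_needed = MAX_CODE_POINT.bit_length()

# UTF-32 spends one 32-bit unit on every code point. The "-be" variant
# is used so the encoder does not prepend a 4-byte byte-order mark.
utf32_bits = len(chr(MAX_CODE_POINT).encode("utf-32-be")) * 8
unused_bits = utf32_bits - bits_needed

print(f"U+{MAX_CODE_POINT:X} = {MAX_CODE_POINT}, needs {bits_needed} bits")
print(f"UTF-32 uses {utf32_bits} bits per code point; the top {unused_bits} are always zero")

# Anything past U+10FFFF is not a code point at all.
try:
    chr(MAX_CODE_POINT + 1)
except ValueError:
    print("U+110000 is outside the Unicode code space")
```

The unused high bits are the cost of UTF-32's fixed width: every code point is at the same offset, which is what makes it easy to work with.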
> 1) What is the official AppleScript term that describes a code
> point? The word "file" is used in AppleScript to declare the
> following lexical item to be an alias. What is the similar lexical
> item that declares something to be a codepoint? Is it "id",
> "character id", or something else?
There isn't one, exactly, but "character" comes close. "character id
x" gets you the Unicode character with code point x. The trick is
that because "character" elements of text are really combining
character sequences, the reverse operation, "id of character y", is
not a single code point for all values of y -- you may get a list of
code points.
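To see how one visible character can correspond to several code points, here is a small sketch in Python (chosen only because its Unicode behavior is easy to demonstrate; it is not AppleScript). Python's `chr()` plays roughly the role of "character id x", and listing `ord()` of each code point mimics the kind of list "id of" can return. Unlike AppleScript's "character" elements, Python strings are indexed by code point, so the decomposed form of "é" shows up as two items.

```python
import unicodedata

# "é" as one precomposed code point, U+00E9 LATIN SMALL LETTER E WITH ACUTE.
precomposed = chr(0xE9)  # chr() is the rough analogue of "character id x"

# The same visible character in decomposed form (NFD):
# "e" (U+0065) followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", precomposed)

# Python strings index by code point, so the two forms differ in length,
# even though a reader sees a single character either way.
precomposed_ids = [ord(c) for c in precomposed]
decomposed_ids = [ord(c) for c in decomposed]

print(precomposed_ids)  # [233]
print(decomposed_ids)   # [101, 769]

# Recomposing (NFC) gets back the single precomposed code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

A text element that treats the decomposed pair as one "character", as described above, has no single number to report for it, which is why the reverse lookup can yield a list.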
> 2) What is the same term for use while talking to application
> Finder? It appears that in some cases "character id" as sent to
> Finder is a command rather than a variable type. Lexical item "id"
> sounds an awful lot like a window id as used in Terminal.app.
The difficulty people were having here was that, because of how object
specifiers inside commands are sent, Finder was receiving the specifier
"character id x" and didn't know what to do with it. Moving that
expression outside the "tell" block meant that AppleScript, which does
know what to do with it, evaluated it instead. Finder probably isn't
going to acquire this knowledge any time soon, so just move the
statement outside the block or use "tell me".
> 3) "set mypoint to codepoint 1234" or "set mypoint to 1234 as
> codepoint" would make sense to an expert in Unicode. Exactly what
> command lines would do that in AppleScript? Various posters have
> suggested a bunch of things, but I still don't understand which one
> is politically correct.
"character id x" is what you want. Because of that business about
combining character sequences, we considered adding a "code point"
element to text, but decided that only hardcore Unicode geeks would
even understand what it was, let alone care.
> 4) Does AppleScript support 32-bit codepoints? Finder?
For some definition of "32" (see above) and "support", yes. Anyone
who supports Unicode at all probably handles the entire range, if for
no other reason than that they just use CFString/NSString because it's
easy. For example, try this:
    character id 119070 -- U+1D11E, MUSICAL SYMBOL G CLEF
You'll get a G-clef mark. Now, that just means that the character
will get passed around accurately, not that it'll necessarily draw as
anything other than a "missing character" glyph. As you pointed out
above, in order to draw the code point, you need a font with that
glyph, and that's hard to come by for some ranges. There is no single
font that covers the entire Unicode range, and I believe there are
even some sections of Unicode for which there are no commercial fonts
available. Presumably they'll catch up eventually.
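U+1D11E lies above U+FFFF, outside the original 16-bit range, so it is a good test of "the entire range". As a hedged illustration (Python, not AppleScript; the helper name `utf16_surrogates` is hypothetical), the sketch below confirms that 119070 is just the decimal form of 0x1D11E, and shows how UTF-16-based string APIs such as CFString/NSString represent that single code point as two 16-bit units, a surrogate pair. Passing the character around accurately means preserving both units together.

```python
import unicodedata

G_CLEF = 0x1D11E  # MUSICAL SYMBOL G CLEF

def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = cp - 0x10000            # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go in the low surrogate
    return high, low

print(G_CLEF)  # 119070, the decimal id passed to "character id"
high, low = utf16_surrogates(G_CLEF)
print(f"{high:04X} {low:04X}")  # D834 DD1E

# Cross-check against Python's own UTF-16 (big-endian, no BOM) encoder.
encoded = chr(G_CLEF).encode("utf-16-be")
assert encoded == bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])

print(unicodedata.name(chr(G_CLEF)))  # MUSICAL SYMBOL G CLEF
```

Whether the result actually draws is a separate question: that still depends on an installed font containing the glyph.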
--Chris Nebel
AppleScript Engineering
_______________________________________________
AppleScript-Users mailing list (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users