Re: Unicode Character in File Name
- Subject: Re: Unicode Character in File Name
- From: "Mark J. Reed" <email@hidden>
- Date: Tue, 7 Apr 2009 19:26:40 -0400
So I assume that the literal version "character id 65" is resolved at
compile time, so there's no runtime event for the Finder to intercept?
I'm still confused about why an object specifier that doesn't match
the Finder's dictionary would get sent to the Finder, though.
On 4/7/09, Christopher Nebel <email@hidden> wrote:
> On Apr 7, 2009, at 8:29 AM, Doug McNutt wrote:
>
>>>>> Can I depend on the Finder always displaying this character
>>>>> correctly in any language?
>>>>
>>>> If it displays correctly at all, then it should be fine no matter
>>>> what the user's locale is -- the whole point of Unicode is that
>>>> it's a single character encoding that everyone world-wide uses,
>>>> so there aren't any questions about what a particular byte means.
>>
>> There are still questions about whether the fonts installed on the
>> user's computer support arbitrary Unicode code points. That's the
>> reason for the "If it displays at all" caveat above.
>>
>> I have a couple of questions that have not yet been answered in this
>> thread.
>>
>> Unicode has the concept of a "code point" or, to avoid confusion,
>> codepoint. A codepoint was initially an unsigned 16-bit value but
>> it has been extended to support up to 32 bits even though there are
>> very few graphemes that use the extension at this time. In any case
>> the concept of codepoint is well defined by those who claim to be
>> the standards committee for Unicode.
>
> Actually, Unicode only defines 21 bits, up to U+10FFFF, but otherwise
> correct. (You may be thinking of UTF-32, which is a character
> encoding scheme that uses one 32-bit word per code point. Very easy
> to work with, but wastes at least 11 bits per code point, which is why
> you don't often see it.)
>
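The bit counts in the paragraph above can be checked with a few lines of code. This is a minimal sketch in Python rather than AppleScript, chosen only because the arithmetic is easy to show there; the helper name `encoded_sizes` and the sample characters are illustrative assumptions, not anything from the thread.

```python
# Largest code point Unicode defines, and the bits needed to hold it.
MAX_CODE_POINT = 0x10FFFF
bits_needed = MAX_CODE_POINT.bit_length()
print("bits needed:", bits_needed)


def encoded_sizes(ch: str) -> dict:
    """Return the byte length of a single character in three encodings.

    The "-be" variants are used so that no byte-order mark is counted.
    """
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


# Sample characters: ASCII, Latin-1, a BMP symbol, and one beyond the BMP.
for ch in ("A", "\u00e9", "\u20ac", "\U0001D11E"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-32 always spends four bytes per code point, while UTF-8 and UTF-16 only grow to four bytes for the rarer characters, which is the trade-off described above.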
>> 1) What is the official AppleScript term that describes a code
>> point? The word "file" is used in AppleScript to declare the
>> following lexical item to be an alias. What is the similar lexical
>> item that declares something to be a codepoint? Is it "id",
>> "character id", or something else?
>
> There isn't one, exactly, but "character" comes close. "character id
> x" gets you the Unicode character with code point x. The trick is
> that because "character" elements of text are really combining
> character sequences, the reverse operation, "id of character y", is
> not a single code point for all values of y -- you may get a list of
> code points.
>
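To see why "id of character y" may yield more than one code point, it can help to look at a character that has both a precomposed and a decomposed form. The sketch below uses Python as a stand-in; it is only an analogy, since Python strings are indexed by code point rather than by combining character sequence, and the helper name `code_points` is an assumption made for illustration.

```python
import unicodedata


def code_points(text: str) -> list:
    """Return the Unicode code points that make up `text`, in order."""
    return [ord(c) for c in text]


# The same visible character, stored two ways.
precomposed = "\u00e9"  # e with acute accent as one code point
decomposed = unicodedata.normalize("NFD", precomposed)  # "e" + combining acute

print(code_points("A"))          # one code point
print(code_points(precomposed))  # one code point
print(code_points(decomposed))   # two code points for one visible character

# Going the other way, from code point to character, is always unambiguous.
print(chr(65))
```

The decomposed form is what the reply means by a combining character sequence: one user-visible character, several code points.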
>> 2) What is the same term for use while talking to application
>> Finder? It appears that in some cases "character id" as sent to
>> Finder is a command rather than a variable type. Lexical item "id"
>> sounds an awful lot like a window id as used in Terminal.app.
>
> The difficulty people were having here was that because of how object
> specifiers in commands are sent, Finder was getting sent the specifier
> "character id x", and didn't know what to do with it. Moving that bit
> outside the "tell" block meant that AppleScript, which did know what
> to do with it, got it. Finder probably isn't going to acquire this
> knowledge any time soon, so just move the statement or use "tell me".
>
>> 3) "set mypoint to codepoint 1234" or "set mypoint to 1234 as
>> codepoint" would make sense to an expert in Unicode. Exactly what
>> command lines would do that in AppleScript? Various posters have
>> suggested a bunch of things but I still don't understand which one
>> is politically correct.
>
> "character id x" is what you want. Because of that business about
> combining character sequences, we considered adding a "code point"
> element to text, but decided that only hardcore Unicode geeks would
> even understand what it was, let alone care.
>
>> 4) Does AppleScript support 32-bit codepoints? Finder?
>
> For some definition of "32" (see above) and "support", yes. Anyone
> who supports Unicode at all probably handles the entire range, if for
> no other reason than they just use CF/NSString because it's easy. For
> example, try this:
>
> character id 119070 -- U+1D11E
>
> You'll get a G-clef mark. Now, that just means that the character
> will get passed around accurately, not that it'll necessarily draw as
> anything other than a "missing character" glyph. As you pointed out
> above, in order to draw the code point, you need a font with that
> glyph, and that's hard to come by for some ranges. There is no single
> font that covers the entire Unicode range, and I believe there are
> even some sections of Unicode for which there are no commercial fonts
> available. Presumably they'll catch up eventually.
>
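For readers who want to check the G-clef example, here is a small sketch of how a code point above U+FFFF is split into two 16-bit units in UTF-16, which is the representation NSString is conceptually built on. It is written in Python purely for illustration; `utf16_units` is a hypothetical helper, and it assumes its input is a valid code point outside the surrogate range.

```python
import unicodedata


def utf16_units(code_point: int) -> list:
    """Split a code point into its UTF-16 code units.

    Assumes a valid code point that is not itself a surrogate.
    """
    if code_point < 0x10000:
        return [code_point]
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go in the low surrogate
    return [high, low]


g_clef = chr(119070)
print(hex(ord(g_clef)))
print(unicodedata.name(g_clef))
print([hex(u) for u in utf16_units(119070)])
```

Whether the character then draws as a G-clef or as a "missing character" glyph still depends on the installed fonts, as noted above.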
>
> --Chris Nebel
> AppleScript Engineering
>
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> AppleScript-Users mailing list (email@hidden)
> Help/Unsubscribe/Update your Subscription:
> Archives: http://lists.apple.com/archives/applescript-users
>
> This email sent to email@hidden
>
--
Sent from my mobile device
Mark J. Reed <email@hidden>