Re: Unicode Character in File Name


  • Subject: Re: Unicode Character in File Name
  • From: Christopher Nebel <email@hidden>
  • Date: Tue, 7 Apr 2009 15:16:32 -0700

On Apr 7, 2009, at 8:29 AM, Doug McNutt wrote:

Can I depend on the Finder always displaying this character correctly in any language?

If it displays correctly at all, then it should be fine no matter what the user's locale is -- the whole point of Unicode is that it's a single character encoding that everyone world-wide uses, so there aren't any questions about what a particular byte means.

There are still questions about whether the fonts installed on the user's computer support arbitrary Unicode code points. That's the reason for the "If it displays correctly at all" caveat above.


I have a couple of questions that have not yet been answered in this thread.

Unicode has the concept of a "code point" or, to avoid confusion, codepoint. A codepoint was initially an unsigned 16-bit value, but it has been extended to support up to 32 bits, even though very few graphemes use the extension at this time. In any case, the concept of a codepoint is well defined by those who claim to be the standards committee for Unicode.

Actually, Unicode only defines code points up to U+10FFFF, which fits in 21 bits, but otherwise correct. (You may be thinking of UTF-32, which is a character encoding scheme that uses one 32-bit word per code point. Very easy to work with, but it wastes at least 11 bits per code point, which is why you don't often see it.)


1) What is the official AppleScript term that describes a code point? The word "file" is used in AppleScript to declare the following lexical item to be an alias. What is the similar lexical item that declares something to be a codepoint? Is it "id", "character id", or something else?

There isn't one, exactly, but "character" comes close. "character id x" gets you the Unicode character with code point x. The trick is that because "character" elements of text are really combining character sequences, the reverse operation, "id of character y", is not a single code point for all values of y -- you may get a list of code points.
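
As a minimal sketch of the two directions described above, assuming AppleScript 2.0 or later (the variable names are only illustrative):

```applescript
-- Code point -> text
set letterA to character id 65 -- "A"
set greeting to string id {72, 105} -- "Hi"

-- Text -> code point
set singlePoint to id of letterA -- 65

-- "e" followed by U+0301 COMBINING ACUTE ACCENT forms one combining
-- character sequence, so asking for its id may yield a list of two
-- code points instead of a single integer.
set accentedE to string id {101, 769}
set pointsOfE to id of accentedE
```

Code that asks for "id of" an arbitrary character should therefore be prepared to handle either an integer or a list.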


2) What is the same term for use while talking to application Finder? It appears that in some cases "character id" as sent to Finder is a command rather than a variable type. Lexical item "id" sounds an awful lot like a window id as used in Terminal.app.

The difficulty people were having here was that because of how object specifiers in commands are sent, Finder was getting sent the specifier "character id x", and didn't know what to do with it. Moving that bit outside the "tell" block meant that AppleScript, which did know what to do with it, got it. Finder probably isn't going to acquire this knowledge any time soon, so just move the statement or use "tell me".
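
A sketch of that workaround might look like the following. The file "Untitled.txt" on the desktop is a hypothetical placeholder, not something from the original thread:

```applescript
-- Build the character outside the Finder "tell" block, so that
-- AppleScript itself resolves "character id" instead of Finder.
set clef to character id 119070 -- U+1D11E

tell application "Finder"
	-- Hypothetical file, used only for illustration.
	set name of file "Untitled.txt" of desktop to "Score " & clef & ".txt"
end tell
```

If the statement has to stay inside the block, the other option mentioned above is to write it as: tell me to set clef to character id 119070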


3) "set mypoint to codepoint 1234" or "set mypoint to 1234 as codepoint" would make sense to an expert in Unicode. Exactly what command lines would do that in AppleScript? Various posters have suggested a bunch of things, but I still don't understand which one is politically correct.

"character id x" is what you want. Because of that business about combining character sequences, we considered adding a "code point" element to text, but decided that only hardcore Unicode geeks would even understand what it was, let alone care.


4) Does AppleScript support 32-bit codepoints? Does Finder?

For some definition of "32" (see above) and "support", yes. Anyone who supports Unicode at all probably handles the entire range, if for no other reason than they just use CF/NSString because it's easy. For example, try this:


	character id 119070  -- U+1D11E

You'll get a G-clef mark. Now, that just means that the character will get passed around accurately, not that it'll necessarily draw as anything other than a "missing character" glyph. As you pointed out above, in order to draw the code point, you need a font with that glyph, and that's hard to come by for some ranges. There is no single font that covers the entire Unicode range, and I believe there are even some sections of Unicode for which there are no commercial fonts available. Presumably they'll catch up eventually.
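
One way to check the "passed around accurately" part without relying on any font is to round-trip the code point. This is only a sketch, assuming AppleScript 2.0 or later:

```applescript
-- Round-trip a code point outside the Basic Multilingual Plane.
set clef to character id 119070 -- U+1D11E, MUSICAL SYMBOL G CLEF
set roundTripped to id of clef

-- If the character survives intact, this comparison is true,
-- whether or not an installed font can actually draw the glyph.
roundTripped = 119070
```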


--Chris Nebel
AppleScript Engineering

AppleScript-Users mailing list (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users
