Re: Accented Characters in Finder Names
- Subject: Re: Accented Characters in Finder Names
- From: Paul Berkowitz <email@hidden>
- Date: Fri, 19 Dec 2003 07:43:06 -0800
This discussion is going to be very awkward because of this despicable mail
server configuration, which simply won't allow us to use non-ASCII
characters, when a significant part of what AppleScript users need to
discuss on a mailing list involves just that.
There are many silent coercions between Unicode and string to allow us easy
correspondence between the two when using MacRoman. It's been well done and
we don't notice it most of the time. The 'name' property of 'info for' an
alias is Unicode text, and your AppleScript text item delimiters are
presumably the default {""}. That acts as if it were {"" as Unicode text} to
keep the individual text items in Unicode, not coerce them to string. If you
are using a script editor which indicates the data type, such as Script
Debugger, you will see that each item of the resulting list is still
Unicode text.
Unicode accented characters come, unfortunately, in two varieties:
"precomposed" and "decomposed". Someone will undoubtedly explain this
better than I, but it seems that the better method is "decomposed",
separated into its two parts - character plus diacritic - since it allows
any combination you could ever come up with. AppleScript uses this method in
Unicode. But many countries that had long used a "precomposed" version - one
single character containing both character and accent - in their own
character sets pleaded to be allowed to do the same in Unicode.
So for many of them - in particular Western European varieties of accented
characters - precomposed versions exist in Unicode too. Take a look through
the Unicode Input palette and you can come upon both.
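To make the two varieties concrete, here is a minimal sketch in Python rather than AppleScript, since Python's standard unicodedata module exposes the two forms directly. The variable names are only illustrative, and the accented letter is written with escapes so that it survives ASCII-only mail.

```python
import unicodedata

precomposed = "\u00e2"   # one code point: a-with-circumflex
decomposed = "a\u0302"   # two code points: plain a + combining circumflex

# The two spellings look alike on screen but are different sequences.
same_text = precomposed == decomposed
lengths = (len(precomposed), len(decomposed))

# Normalization converts one form into the other.
to_decomposed = unicodedata.normalize("NFD", precomposed)
to_precomposed = unicodedata.normalize("NFC", decomposed)
```

Here same_text is False and lengths is (1, 2), while the two normalized values match the opposite spelling exactly.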
As I understand it, 'as string' will get you the precomposed versions of the
characters in AppleScript where the characters happen to have MacRoman
versions. Unicode transcripts of text, a few versions of AppleScript back,
used to show strange spaces between each MacRoman character transliterated
into Unicode, like so:
"S t u d i o ! : S t u d i o F i l e s : "
so that 'text items' using {""} as delimiter would get you:
{"S", " ", "t", " ", "u", " ", "d", " ", "i", " ", "o", " ", "!", " ",
":", " ", "S", " ", "t", " ", "u", " ", "d", " ", "i", " ", "o", " ",
" ", " ", "F", " ", "i", " ", "l", " ", "e", " ", "s", " ", ":", " "}
It's much nicer that AppleScript now displays that as:
{"S", "t", "u", "d", "i", "o", "!", ":", "S", "t", "u", "d", "i", "o",
" ", "F", "i", "l", "e", "s", ":"}
where each character is a Unicode character.
But when you hit a genuine two-part character such as that decomposed
a-with-circumflex, each part - the base letter and the combining accent - is
getting its own representation. Perhaps it could be coerced to the single
precomposed a-with-circumflex character, but that would not actually be
correct insofar as it would be substituting one Unicode character sequence
for another. When you explicitly coerce to string first, then you'll get the
single accented MacRoman character.
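The splitting described above can be reproduced outside AppleScript. The following is only a sketch in Python, under the assumption that the file name arrives in decomposed form. The name is a stand-in for the one in the question, split_chars is a hypothetical helper, and NFC normalization plays the role that 'as string' plays in the script.

```python
import unicodedata

# Stand-in for the file name in the question, in decomposed form.
name = "Cha\u0302teau 2.jpg"

def split_chars(text):
    """Return one list item per code point."""
    return list(text)

# Splitting the decomposed name separates the accent from its letter.
raw_items = split_chars(name)

# Composing first keeps letter and accent together as one item.
composed_items = split_chars(unicodedata.normalize("NFC", name))
```

raw_items has 14 items, with the combining accent on its own after the first "a", matching the item count of the list in the original question. composed_items has 13, with a single a-with-circumflex in third place.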
I'm not sure that they can do it any other way and still keep the real
Unicode characters. Perhaps when the various script editors (including
Script Editor 2.0) can actually decompile to a real Unicode display and
input, there might be another way to do this.
Your direct question at the end:
> And why can't applescript convert ASCII character 246 into a string in this
> particular context?
is asking the wrong question. AppleScript _can_ display ASCII character 246
just fine (it's just this mailing list that can't). But the character in
question is not an ASCII character, it's a Unicode character. And it's _not_
the precomposed 00E2 - LATIN SMALL LETTER A WITH CIRCUMFLEX - which does have
a MacRoman equivalent (137; MacRoman 246 is the bare circumflex), but rather
the decomposed character made up of 0061 - LATIN SMALL LETTER A - and 0302 -
COMBINING CIRCUMFLEX ACCENT.
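The code points named above, and where they sit in MacRoman, can be checked with a short Python sketch. It relies on Python's built-in mac_roman codec, and fits_macroman is a hypothetical helper added for illustration. Note that in this codec byte 246 decodes to a standalone circumflex, while a-with-circumflex is byte 137.

```python
import unicodedata

precomposed = "\u00e2"
decomposed = "a\u0302"

# Official Unicode names of the three code points involved.
names = [unicodedata.name(ch) for ch in precomposed + decomposed]

# MacRoman has a single byte for the precomposed letter.
macroman_byte = precomposed.encode("mac_roman")[0]

# Byte 246 in MacRoman is a different character altogether.
char_246_name = unicodedata.name(bytes([246]).decode("mac_roman"))

def fits_macroman(text):
    """Report whether every code point has a MacRoman byte."""
    try:
        text.encode("mac_roman")
        return True
    except UnicodeEncodeError:
        return False
```

fits_macroman(decomposed) is False because the combining accent 0302 has no MacRoman byte of its own, which is consistent with the "Can't make ... into a string" error in the question.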
Perhaps Chris can explain it more accurately.
--
Paul Berkowitz
On 12/19/03 3:38 AM, "Mr Tea" <email@hidden> wrote:
> Can anyone help me understand what's going on here?
>
> set itm to alias "Studio!:Studio Files:Test Zone:Store:Château 2.jpg"
> set N to name of (info for itm)
> set tl to every text item of N
>
> --> {"C", "h", "a", "", "t", "e", "a", "u", " ", "2", ".", "j", "p", "g"}
>
> Why is the 'â' ('a' + circumflex) split into two separate characters?
>
> I stumbled into this anomaly while experimenting with the 'Change Case in
> Item Names' script supplied with OS X (located in /Library/Scripts). The
> script died when it encountered the â, so I opened it up to find out why.
>
> The upper and lower case alphabet strings used in the script do not contain
> any accented letters, so I added them and ran the script again. Still no
> joy. Stepping through the script, I arrived at the error message 'Can't make
> "^" into a string' (duh! you just did!).
>
> The trick to get round this issue, apparently, is to convert the value of n
> into a string before separating it into text items (or characters). That
> reunites the 'a' with its circumflex and works as expected.
>
> My guess is that it's something to do with the differences between Unicode
> text and plain old strings, but that's a bit like saying that I reckon the
> noise coming from under my bonnet/hood is something to do with the
> alternator. I might be right, but I still don't really know what the hell
> I'm talking about.
>
> (It don't take a rocket scientist, however, to see that this 'problem with
> accents' could make working with Finder item names unnecessarily complex for
> many users in mainland Europe.)
>
> And why can't applescript convert ASCII character 246 into a string in this
> particular context?
>
>
> Confusedly,
>
>
> Nick
> pp Mr Tea
>
> (Wishing that he hadn't wasted everyone's time yesterday by posting that
> sad, deluded 'OS X 10.3.2 just out!' message when it was already old news.
> Regression is setting in.)
_______________________________________________
applescript-users mailing list | email@hidden