Re: Accented Characters in Finder Names
- Subject: Re: Accented Characters in Finder Names
- From: Christopher Nebel <email@hidden>
- Date: Fri, 26 Dec 2003 14:38:41 -0800
Paul's explanation is essentially correct, except that it's not
AppleScript insisting on decomposed characters, but the file system.
HFS+ stores everything decomposed (or tries to -- you can finagle this
using shell commands, but I don't recommend it); AppleScript doesn't
really care one way or the other, but tries to preserve whatever it was
given.
Part of the problem here is that "characters" of Unicode text aren't
really what most people think of as characters -- they're Unicode code
points, so the "a" and the combining circumflex mark show up as
separate "characters", which humans tend to find surprising. I hope to
straighten this out in the future.
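The code-point view described above can be illustrated outside AppleScript. This is a minimal sketch in Python, not AppleScript, using only the standard unicodedata module; the file name is the one from the original question, written here in its decomposed form as an assumption about what HFS+ hands back.

```python
import unicodedata

# "a" followed by U+0302 COMBINING CIRCUMFLEX ACCENT: the decomposed form
# assumed here for the name "Château 2.jpg" as stored by the file system.
name = "Cha\u0302teau 2.jpg"

# Iterating a Python str yields code points, not user-perceived characters,
# so the base letter and the combining mark come out as separate items.
code_points = list(name)

# Collect the official names of any combining marks found in the name.
marks = [unicodedata.name(ch) for ch in name if unicodedata.combining(ch)]

print(len(name))
print(code_points[:4])
print(marks)
```

The count includes the combining mark as its own item, which matches the 14-item list in the original question.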
--Chris Nebel
AppleScript Engineering
On Dec 19, 2003, at 7:43 AM, Paul Berkowitz wrote:
There are many silent coercions between Unicode text and string that give
us easy correspondence between the two when using MacRoman. It's been well
done, and we don't notice it most of the time. The 'name' property of
'info for' an alias is Unicode text, and your AppleScript text item
delimiters are presumably the default {""}. That acts as if it were
{"" as Unicode text}, keeping the individual text items in Unicode rather
than coercing them to string. If you are using a script editor that
indicates the data type, such as Script Debugger, you will see that each
character of the resulting list is still Unicode text.
Unicode accented characters come, unfortunately, in two varieties:
"precomposed" and "decomposed". Someone will undoubtedly explain this
better than I, but it seems that the better method is "decomposed",
separated into its two parts - character plus diacritic - since it
allows any combination you could ever come up with. AppleScript uses
this method in Unicode. But many countries that had long used a
"precomposed" version - one single character containing both character
and accent - in their own character formats pleaded to be allowed to do
the same in Unicode. So for many of them - in particular Western European
varieties of accented characters - precomposed versions exist in
Unicode too. Take a look through the Unicode Input palette and you can
come upon both.
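To make the two varieties concrete, here is a small sketch in Python (not AppleScript), using the standard unicodedata module. The variable names are illustrative only. It shows that the precomposed and decomposed spellings of a-with-circumflex are different code-point sequences, and that Unicode normalization (NFC to compose, NFD to decompose) converts between them.

```python
import unicodedata

precomposed = "\u00e2"   # LATIN SMALL LETTER A WITH CIRCUMFLEX, one code point
decomposed = "a\u0302"   # "a" + COMBINING CIRCUMFLEX ACCENT, two code points

# A plain comparison sees two different sequences, even though both
# display as the same accented letter.
same_raw = precomposed == decomposed

# NFD splits a precomposed character into base letter plus mark;
# NFC recombines base letter plus mark where a precomposed form exists.
to_nfd = unicodedata.normalize("NFD", precomposed)
to_nfc = unicodedata.normalize("NFC", decomposed)

print(same_raw, len(to_nfd), len(to_nfc))
```

Normalizing both sides to the same form before comparing or splitting text is the usual way to avoid surprises of this kind.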
As I understand it, 'as string' will get you the precomposed versions
of the characters in AppleScript where the characters happen to have
MacRoman versions. Unicode transcripts of text, a few versions of
AppleScript back, used to show strange spaces between each MacRoman
character transliterated into Unicode, like so:
"S t u d i o ! : S t u d i o F i l e s : "
so that 'text items' using {""} as delimiter would get you:
{"S", " ", "t", " ", "u", " ", "d", " ", "i", " ", "o", " ", "!", " ",
":", " ", "S", " ", "t", " ", "u", " ", "d", " ", "i", " ", "o", " ",
" ", " ", "F", " ", "i", " ", "l", " ", "e", " ", "s", " ", ":", " "}
It's much nicer that AppleScript now displays that as:
{"S", "t", "u", "d", "i", "o", "!", ":", "S", "t", "u", "d", "i", "o",
" ", "F", "i", "l", "e", "s", ":"}
where each character is a Unicode character.
But when you get a genuine two-part character such as that decomposed
a-with-circumflex, each part (the base letter and the combining mark) is
getting its own representation. Perhaps it could be coerced to the single
precomposed a-with-circumflex character, but that would not actually be
correct insofar as it would be substituting one Unicode character
sequence for another. When you explicitly coerce to string first, then
you'll get the single accented MacRoman character.
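As a rough analogy for the coercion described above, the following Python sketch (not AppleScript, and not a claim about how AppleScript implements 'as string') uses Python's standard 'mac_roman' codec. The helper to_mac_roman is hypothetical. It shows that the decomposed sequence cannot be encoded as MacRoman directly, because MacRoman has no combining circumflex, while the composed form maps to a single MacRoman byte.

```python
import unicodedata

decomposed = "a\u0302"   # "a" + COMBINING CIRCUMFLEX ACCENT


def to_mac_roman(text):
    """Hypothetical helper: compose with NFC first, then encode as MacRoman."""
    return unicodedata.normalize("NFC", text).encode("mac_roman")


# Encoding the decomposed form directly fails: the combining mark has no
# MacRoman equivalent.
try:
    decomposed.encode("mac_roman")
    direct_ok = True
except UnicodeEncodeError:
    direct_ok = False

# Composing first yields one accented character, hence one MacRoman byte.
composed_bytes = to_mac_roman(decomposed)

print(direct_ok, len(composed_bytes))
```

The failure on the bare combining mark is loosely comparable to the 'Can't make "^" into a string' error reported in the original question.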
I'm not sure that they can do it any other way and still keep the real
Unicode characters. Perhaps when the various script editors (including
Script Editor 2.0) can actually decompile to a real Unicode display
and input, there might be another way to do this.
On 12/19/03 3:38 AM, "Mr Tea" <email@hidden> wrote:
Can anyone help me understand what's going on here?
set itm to alias "Studio!:Studio Files:Test Zone:Store:Château 2.jpg"
set N to name of (info for itm)
set tl to every text item of N
--> {"C", "h", "a", "", "t", "e", "a", "u", " ", "2", ".", "j", "p",
"g"}
Why is the 'â' ('a' + circumflex) split into two separate characters?
I stumbled into this anomaly while experimenting with the 'Change Case in
Item Names' script supplied with OS X (located in /Library/Scripts). The
script died when it encountered the â, so I opened it up to find out why.
The upper and lower case alphabet strings used in the script do not
contain any accented letters, so I added them and ran the script again.
Still no joy. Stepping through the script, I arrived at the error message
'Can't make "^" into a string' (duh! you just did!).
The trick to get round this issue, apparently, is to convert the value of
N into a string before separating it into text items (or characters). That
reunites the 'a' with its circumflex and works as expected.
My guess is that it's something to do with the differences between Unicode
text and plain old strings, but that's a bit like saying that I reckon the
noise coming from under my bonnet/hood is something to do with the
alternator. I might be right, but I still don't really know what the hell
I'm talking about.
(It don't take a rocket scientist, however, to see that this 'problem with
accents' could make working with Finder item names unnecessarily complex
for many users in mainland Europe.)
And why can't AppleScript convert ASCII character 246 into a string in
this particular context?
Confusedly,
Nick
pp Mr Tea
(Wishing that he hadn't wasted everyone's time yesterday by posting that
sad, deluded 'OS X 10.3.2 just out!' message when it was already old news.
Regression is setting in.)
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.