Re: Coercing ligatures to expanded characters
- Subject: Re: Coercing ligatures to expanded characters
- From: has <email@hidden>
- Date: Sat, 14 Jan 2017 11:35:17 +0000
Shane Stanley wrote:
On 14 Jan 2017, at 3:47 am, has <email@hidden> wrote:
Surprising though, as it's exactly what ICU's transforms are designed to do.
It does handle the more common of the ligatures the OP listed:
use framework "Foundation"
use scripting additions
set theString to current application's NSString's stringWithString:"ﬁﬀﬃﬄﬂœᵫﬆæꜵꜨꜲ"
(theString's stringByApplyingTransform:"Latin-ASCII" |reverse|:false) as text
--> fiffffifflfloeuestaeꜵꜨꜲ
That's split all but the last three.
Doesn't help though. That transforms only the subset of ligatures that fall
into the 'Latin' category (ffl, fl, œ, etc.) but leaves non-Latin ligatures
alone. It also degrades non-ligature characters, e.g. by stripping diacritics.
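To make the "degrades non-ligature characters" point concrete, here is a minimal sketch that runs the same transform over text containing no ligatures at all. The sample string and the variable name are my own illustration, not from the thread; it assumes OS X 10.11 or later, where `stringByApplyingTransform:reverse:` is available.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical sample text: accented letters only, no ligatures
set sampleString to current application's NSString's stringWithString:"café naïve Åse"
-- Same ICU transform as above; it folds accented Latin letters down to plain ASCII
(sampleString's stringByApplyingTransform:"Latin-ASCII" |reverse|:false) as text
--> cafe naive Ase
```

So anything that needs to keep its accents intact would have to be protected from the transform, or the ligatures expanded some other way.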
But perhaps of more interest is this:
considering case but ignoring diacriticals
"ꜵꜨꜲ" = "aoTzAA"
end considering
--> true
Nope. That's Unicode working as advertised: it compares for logical
meaning, not just raw codepoints:
"ꜵꜨꜲ" = "aoTzAA" --> true
The `considering/ignoring diacriticals` option has no bearing here; that
only affects how (e.g.) "ÅÈÏÓÛ" vs "AEIOU" are compared. You may be
thinking of `considering/ignoring expansion`, which was a feature of
AppleScript 1.x, but that got taken out when AS switched to full Unicode
in 2.0 because Unicode has its own comparison rules, as demonstrated above.
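A small sketch of what `diacriticals` does control, using the accented/unaccented pair mentioned above. The variable names are illustrative only.

```applescript
-- With diacriticals ignored, accented and plain vowels compare as equal
ignoring diacriticals
	set looseMatch to ("ÅÈÏÓÛ" = "AEIOU")
end ignoring

-- With diacriticals considered (AppleScript's default), they do not
considering diacriticals
	set strictMatch to ("ÅÈÏÓÛ" = "AEIOU")
end considering

{looseMatch, strictMatch}
--> {true, false}
```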
That raises the question of what AppleScript is using.
Chris Nebel could tell you if you ask him (he does sometimes reply to
emails/tweets). Or you could probably work it out if you black-box poke
it with a stick until it reveals some "tells" that you recognize as
limitations/quirks of a particular Unicode string handling library.
AS doesn't use what NSString uses, for example, because NSString is
nasty old UTF-16 (née UCS-2) that counts raw code units rather than
characters, e.g. "é" may be reported as 1 or 2, depending on whether the
underlying representation is composed (the single Latin "é" codepoint) or
decomposed (ASCII "e" plus a combining "´" accent, which are overlaid when
displayed), whereas AS always counts it as 1. Look into the old Carbon
APIs that came over from Mac OS 9, as AS's Unicode capabilities either
come from there or else from a 3rd-party project like ICU that was around
at the time AS originally added Unicode support (AS 1.3.7?).
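If you want to do that sort of poking yourself, here is a rough sketch that compares NSString's idea of length with AppleScript's own count for the composed and decomposed forms of "é". The variable names are mine; the two normalization methods are standard NSString API. I have left the result off on purpose, since the point is to see what your own system reports.

```applescript
use framework "Foundation"
use scripting additions

-- Force each representation explicitly rather than trusting how the editor stored the literal
set composed to (current application's NSString's stringWithString:"é")'s precomposedStringWithCanonicalMapping()
set decomposed to composed's decomposedStringWithCanonicalMapping()

-- NSString's length counts UTF-16 code units; AppleScript's length counts characters
set nsLengths to {composed's |length|(), decomposed's |length|()}
set asLengths to {length of (composed as text), length of (decomposed as text)}
{nsLengths, asLengths}
```

NSString should report 1 and 2 for the two forms. If the description above holds, AppleScript reports 1 for both.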
HTH
has