Re: Coercing ligatures to expanded characters

  • Subject: Re: Coercing ligatures to expanded characters
  • From: has <email@hidden>
  • Date: Sat, 14 Jan 2017 11:35:17 +0000

Shane Stanley wrote:
> On 14 Jan 2017, at 3:47 am, has <email@hidden> wrote:
>> Surprising though, as it's exactly what ICU's transforms are designed to do.
>
> It does handle the more common of the ligatures the OP listed:
>
> set theString to current application's NSString's stringWithString:"ﬁﬀﬃﬄﬂœᵫﬆæꜵꜨꜲ"
> (theString's stringByApplyingTransform:"Latin-ASCII" |reverse|:false) as text
> --> fiffffifflfloeuestaeꜵꜨꜲ
>
> That's split all but the last three.

Doesn't help though. That transforms the subset of ligatures that fall
into the 'Latin' category (ﬄ, ﬂ, œ, etc.) but leaves non-Latin ligatures
alone. It also degrades non-ligature characters.

> But perhaps of more interest is this:
>
> considering case but ignoring diacriticals
> 	"ꜵꜨꜲ" = "aoTzAA"
> end considering
> --> true

Nope. That's Unicode working as advertised: comparing for logical
meaning, not just comparing raw codepoints:

"ꜵꜨꜲ" = "aoTzAA" --> true


The `considering/ignoring diacriticals` option has no bearing here; that only affects how (e.g.) "ÅÈÏÓÛ" vs "AEIOU" are compared. You may be thinking of `considering/ignoring expansion`, which was a feature of AppleScript 1.x, but that got taken out when AS switched to full Unicode in 2.0 because Unicode has its own comparison rules, as demonstrated above.
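
As a minimal sketch of what that option does control (assuming AppleScript 2.x; the variable names are purely illustrative), the same accented-vs-unaccented comparison flips depending on whether diacriticals are considered:

```applescript
-- `diacriticals` only decides whether accents matter in a comparison.
considering diacriticals
	set strictMatch to ("ÅÈÏÓÛ" = "AEIOU")
end considering

ignoring diacriticals
	set looseMatch to ("ÅÈÏÓÛ" = "AEIOU")
end ignoring

{strictMatch, looseMatch}
--> {false, true}
```
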
> That raises the question of what AppleScript is using.
Chris Nebel could tell you if you ask him (he does sometimes reply to
emails/tweets). Or you could probably work it out if you black-box poke
it with a stick until it reveals some "tells" that you recognize as
limitations/quirks of a particular Unicode string handling library.
AS doesn't use what NSString uses, for example, because NSString is
nasty old UTF-16 (née UCS-2) that counts raw code units, e.g. "é" may be
reported as 1 or 2, depending on whether the underlying representation
is composed (a single precomposed "é" codepoint) or decomposed (ASCII
"e" followed by a combining acute accent, which is overlaid on it when
displayed). Whereas AS always counts it as 1. Look into the old Carbon
APIs that came over from Mac OS 9, as AS's Unicode capabilities either
come from there or else from a 3rd-party project like ICU that was
around at the time AS originally added Unicode support (AS 1.3.7?).
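
A rough sketch of that kind of black-box poking, assuming AppleScript 2.4 or later with the Foundation bridge available (the variable names are illustrative only). It builds "é" both ways, then compares AppleScript's own count with NSString's length. It also assumes the bridge passes the decomposed text through without normalizing it:

```applescript
use AppleScript version "2.4"
use framework "Foundation"

-- "é" as one precomposed codepoint (U+00E9)
set composed to character id 233
-- "e" followed by a combining acute accent (U+0301)
set decomposed to "e" & (character id 769)

-- AppleScript's own idea of length:
-- {1, 1} if AS counts as described above
set asCounts to {count of composed, count of decomposed}

-- NSString's length reports UTF-16 code units, so the two forms
-- should differ: 1 for composed, 2 for decomposed
set nsComposed to current application's NSString's stringWithString:composed
set nsDecomposed to current application's NSString's stringWithString:decomposed
set nsCounts to {nsComposed's |length|(), nsDecomposed's |length|()}

{asCounts, nsCounts}
```

If the two lists disagree for the decomposed form, that is one "tell" that AS is not simply deferring to NSString for counting.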

HTH

has
