Re: Coercing ligatures to expanded characters

  • Subject: Re: Coercing ligatures to expanded characters
  • From: has <email@hidden>
  • Date: Sat, 14 Jan 2017 11:35:17 +0000

Shane Stanley wrote:
On 14 Jan 2017, at 3:47 am, has <email@hidden> wrote:
Surprising, though, as it's exactly what ICU's transforms are designed to do.
It does handle the more common of the ligatures the OP listed:

use framework "Foundation"
use scripting additions
set theString to current application's NSString's stringWithString:"ﬁﬀﬃﬄﬂœᵫﬆæꜵꜨꜲ"
(theString's stringByApplyingTransform:"Latin-ASCII" |reverse|:false) as text
--> fiffffifflfloeuestaeꜵꜨꜲ

That's split all but the last three.

Doesn't help though. That transforms the subset of ligatures that fall into the 'Latin' category (ﬄ, ﬂ, œ, etc.) but leaves non-Latin ligatures alone. It also degrades non-ligature characters.
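
To illustrate the degradation, here is a minimal sketch that reuses the same Latin-ASCII transform on a string mixing one ligature with ordinary accented letters. The input string is made up for illustration and is not from the thread.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical input: one ligature (ﬁ) plus accented, non-ligature letters.
set theString to current application's NSString's stringWithString:"ﬁancée, Ångström"

-- Same transform as in the quoted example above.
(theString's stringByApplyingTransform:"Latin-ASCII" |reverse|:false) as text
-- The ligature should come back expanded, but the accented letters should
-- also come back as plain ASCII, which is more than was asked for.
```

If only ligature expansion is wanted, the loss of accents makes this transform a poor fit on its own.
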
But perhaps of more interest is this:

considering case but ignoring diacriticals
	"ꜵꜨꜲ" = "aoTzAA"
end considering
--> true

Nope. That's Unicode working as advertised: comparing for logical meaning, not just comparing raw codepoints:

"ꜵꜨꜲ" = "aoTzAA" --> true


The `considering/ignoring diacriticals` option has no bearing here; that only affects how (e.g.) "ÅÈÏÓÛ" vs "AEIOU" are compared. You may be thinking of `considering/ignoring expansion`, which was a feature of AppleScript 1.x, but that got taken out when AS switched to full Unicode in 2.0 because Unicode has its own comparison rules, as demonstrated above.
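
A quick way to see what the diacriticals option does and does not affect is to run the same comparison under both settings. This is only a sketch, and the variable names are arbitrary.

```applescript
-- Accented vs. unaccented letters, compared with diacriticals considered
-- (AppleScript's default) and then ignored.
considering diacriticals
	set strictResult to ("ÅÈÏÓÛ" = "AEIOU")
end considering

ignoring diacriticals
	set looseResult to ("ÅÈÏÓÛ" = "AEIOU")
end ignoring

{strictResult, looseResult}
-- Expect the first to be false and the second true.
```

Wrapping the ligature comparison in the same two blocks should give the same answer both times, since no diacritics are involved.
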

That raises the question of what AppleScript is using.

Chris Nebel could tell you if you ask him (he does sometimes reply to emails/tweets). Or you could probably work it out if you black-box poke it with a stick until it reveals some "tells" that you recognize as limitations/quirks of a particular Unicode string handling library.

AS doesn't use what NSString uses, for example, because NSString is nasty old UCS-2 that counts raw codepoints: "é" may be reported as 1 or 2, depending on whether the underlying representation is composed (the single Latin "é" character) or decomposed (ASCII "e" followed by a combining "´" accent, which are overlaid when displayed), whereas AS always counts it as 1. Look into the old Carbon APIs that came over from OS 9, as AS's Unicode capabilities either come from there or else from a third-party project like ICU that was around at the time AS originally added Unicode support (AS 1.3.7?).
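
For anyone who wants to do the black-box poking, a sketch like the following builds both forms of "é" from explicit codepoints and asks AppleScript and NSString to count each. It assumes the text crosses the AppleScriptObjC bridge without being normalized on the way, which is worth checking rather than taking on trust.

```applescript
use framework "Foundation"
use scripting additions

-- Build both representations explicitly so nothing depends on how the
-- script editor happens to store a typed "é".
set composed to character id 233 -- U+00E9, precomposed
set decomposed to character id {101, 769} -- "e" + U+0301 combining acute accent

-- AppleScript's own character counts.
set asCounts to {count composed, count decomposed}

-- NSString's length for the same two strings.
set nsComposed to current application's NSString's stringWithString:composed
set nsDecomposed to current application's NSString's stringWithString:decomposed
set nsCounts to {(nsComposed's |length|()) as integer, (nsDecomposed's |length|()) as integer}

{asCounts, nsCounts}
-- Going by the description above, AppleScript should report 1 for both,
-- while NSString should report 1 for the composed form and 2 for the
-- decomposed one.
```
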


HTH

has

_______________________________________________
AppleScript-Users mailing list
Archives: http://lists.apple.com/archives/applescript-users