Re: Coercing ligatures to expanded characters


  • Subject: Re: Coercing ligatures to expanded characters
  • From: Shane Stanley <email@hidden>
  • Date: Sat, 14 Jan 2017 22:56:33 +1100

On 14 Jan 2017, at 10:35 pm, has <email@hidden> wrote:

> That's Unicode working as advertised; comparing for logical meaning, not just comparing raw codepoints:
>
> "ꜵꜨꜲ" = "aoTzAA" --> true

But that's a different result to:

use framework "Foundation"

set theString to current application's NSString's stringWithString:"ꜵꜨꜲ"
theString's compare:"aoTzAA" options:(current application's NSDiacriticInsensitiveSearch)


The 10.6 AS Release Notes say:

The various types of ignoring behavior for text comparisons are now defined using Unicode General Categories, not ASCII characters:
  • ignoring punctuation ignores category P*: for example, left- and right-quotation marks are now ignored. However, the backtick character (`) used to be ignored but is now considered, because Unicode classifies it as a symbol, not punctuation.
  • ignoring hyphens ignores category Pd: for example, em- and en-dashes are now ignored.
  • ignoring whitespace ignores category Z*, plus tab (\t), return (\r), and linefeed (\n): for example, non-breaking spaces are now ignored.
For further details on General Categories, see the Unicode Standard, section 4.5. [4819817]

No mention of diacriticals.
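
As a quick cross-check of the categories the release notes refer to, here is a minimal sketch in Python (not AppleScript; it only queries the Unicode character database and says nothing about how AppleScript itself is implemented). The choice of sample characters and the `samples` and `categories` names are assumptions made for illustration.

```python
import unicodedata

# Sample characters of the kinds named in the release notes.
# The selection is illustrative, not taken from the notes themselves.
samples = {
    "left double quote": "\u201c",
    "right double quote": "\u201d",
    "backtick": "`",
    "em dash": "\u2014",
    "en dash": "\u2013",
    "non-breaking space": "\u00a0",
    "tab": "\t",
}

# Look up the Unicode General Category of each sample character.
categories = {name: unicodedata.category(ch) for name, ch in samples.items()}

for name, cat in categories.items():
    print(f"{name:20} {cat}")
```

The backtick reports a symbol category (S*) rather than punctuation, both dashes report Pd, and the non-breaking space reports a Z* category. Tab is a control character (Cc), which is presumably why the notes list tab, return, and linefeed separately from Z*.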

> AS doesn't use what NSString uses, for example, because NSString is nasty old UCS-2 that counts the number of raw codepoints, e.g. "é" may be reported as 1 or 2, depending on whether the underlying representation is composed (the Latin "é" glyph) or decomposed (ASCII "e" + "´" accent glyphs which are overlaid when displayed). Whereas AS always counts it as 1. Look into the old Carbon APIs that came over from OS 9, as AS's Unicode capabilities either come from there or else from a third-party project like ICU that was around at the time AS originally added Unicode support (AS 1.3.7?).

It counts grapheme clusters, but that doesn't mean it doesn't use NSString (probably CFString). There's no reason they can't be using CFString and the equivalent of -enumerateSubstringsInRange:options:usingBlock: with NSStringEnumerationByComposedCharacterSequences. The last tests I did weren't exhaustive, but they gave the same results as AS in various scenarios of composition.
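
To make the counting difference concrete, here is a rough sketch in Python (again not AppleScript, and not a claim about what either AppleScript or NSString does internally). The helper names `utf16_units` and `approx_clusters` are hypothetical, and `approx_clusters` is only an approximation that skips combining marks; it is not full grapheme cluster segmentation.

```python
import unicodedata

# The same visible character in its two canonical forms.
composed = unicodedata.normalize("NFC", "e\u0301")   # single code point U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")  # "e" followed by U+0301


def utf16_units(s):
    """Number of UTF-16 code units, the unit a raw length count would see."""
    return len(s.encode("utf-16-le")) // 2


def approx_clusters(s):
    """Rough cluster count: code points that are not combining marks.

    An approximation for illustration only, not real grapheme segmentation.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


for label, s in (("composed", composed), ("decomposed", decomposed)):
    print(label, utf16_units(s), approx_clusters(s))
```

The raw code unit count differs between the two forms, while the cluster-style count is the same for both, which is the distinction being argued over here.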

-- 
Shane Stanley <email@hidden>
<www.macosxautomation.com/applescript/apps/>, <latenightsw.com>


AppleScript-Users mailing list      (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users
