Re: Coercing ligatures to expanded characters
  • Subject: Re: Coercing ligatures to expanded characters
  • From: "Jacob M. Small" <email@hidden>
  • Date: Mon, 16 Jan 2017 09:58:06 -0500

Thanks to everyone who helped out with this request, and for the illuminating discussion afterward. It has cautioned me against attempting to build an AppleScript Unicode text normalizer, so there's zero risk of that. I just need my script to work!

Thanks again!

Shane Stanley
January 13, 2017 at 2:28 AM

Mea culpa. Thanks for the correction!

Takaaki Naganoya
January 13, 2017 at 2:25 AM
Shane, the variable “theContent” in the second line is not right. It is “someString”.

<AppleScript>
use AppleScript version "2.4"
use scripting additions
use framework "Foundation"

on findEmailAddressesIn:someString
    set theDD to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
    set theURLs to theDD's matchesInString:someString options:0 range:{location:0, |length|:length of someString}
    set thePredicate to current application's NSPredicate's predicateWithFormat:"self.URL.scheme == 'mailto'"
    set theURLs to theURLs's filteredArrayUsingPredicate:thePredicate
    set theURLs to theURLs's valueForKeyPath:"URL.resourceSpecifier"
    set theURLs to (current application's NSSet's setWithArray:theURLs)'s allObjects()
    return theURLs as list
end findEmailAddressesIn:
</AppleScript>

--
Takaaki Naganoya
email@hidden
http://piyocast.com/as/



_______________________________________________
Do not post admin requests to the list. They will be ignored.
AppleScript-Users mailing list (email@hidden)
Help/Unsubscribe/Update http://lists.apple.com/archives/applescript-users

This email sent to email@hidden
Shane Stanley
January 12, 2017 at 7:10 PM
On 13 Jan 2017, at 6:45 am, Jacob Small <email@hidden> wrote:

There are more efficient ways. For example:

on removeLigaturesFromString(inputStringWithLigatures)
    set searchStrings to {"Ꜳ", "ꜳ"} -- add the rest here
    set replaceStrings to {"AA", "aa"} -- add the rest here
    set saveTID to AppleScript's text item delimiters
    considering case
        repeat with i from 1 to count of searchStrings
            -- split the text on the ligature, then rejoin it with the expansion
            set AppleScript's text item delimiters to {item i of searchStrings}
            set inputStringWithLigatures to text items of inputStringWithLigatures
            set AppleScript's text item delimiters to {item i of replaceStrings}
            set inputStringWithLigatures to inputStringWithLigatures as text
        end repeat
    end considering
    set AppleScript's text item delimiters to saveTID
    return inputStringWithLigatures -- the text with ligatures expanded
end removeLigaturesFromString
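
The text item delimiter technique above could be generalised along the following lines. This is only a rough sketch, not code from the thread: the handler name replaceEach, its three parameters, and the sample text are all assumptions made for illustration. It takes the search and replacement lists as arguments and restores the saved delimiters even if an error occurs part-way through.

```applescript
-- Hypothetical handler (not from the original post): the ligature table is
-- passed in as two parallel lists instead of being hard-coded.
on replaceEach(sourceText, searchStrings, replaceStrings)
    set saveTID to AppleScript's text item delimiters
    try
        considering case
            repeat with i from 1 to count of searchStrings
                -- split on the search string, then rejoin with its replacement
                set AppleScript's text item delimiters to {item i of searchStrings}
                set textParts to text items of sourceText
                set AppleScript's text item delimiters to {item i of replaceStrings}
                set sourceText to textParts as text
            end repeat
        end considering
    on error errMsg number errNum
        -- put the delimiters back before passing the error on
        set AppleScript's text item delimiters to saveTID
        error errMsg number errNum
    end try
    set AppleScript's text item delimiters to saveTID
    return sourceText
end replaceEach

-- Sample call with made-up input
replaceEach("Cæsar's œuvre", {"æ", "œ"}, {"ae", "oe"})
--> "Caesar's oeuvre"
```

Keeping the table outside the handler means the same code can be reused with a longer list of ligatures without editing the loop.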

Here's an alternative for extracting the email addresses:

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"

on findEmailAddressesIn:someString
    -- build a detector that finds links, which includes mailto: addresses
    set theDD to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
    -- the range covers the whole of someString
    set theURLs to theDD's matchesInString:someString options:0 range:{location:0, |length|:length of someString}
    -- keep only the mailto: matches and strip the scheme
    set thePredicate to current application's NSPredicate's predicateWithFormat:"self.URL.scheme == 'mailto'"
    set theURLs to theURLs's filteredArrayUsingPredicate:thePredicate
    set theURLs to theURLs's valueForKeyPath:"URL.resourceSpecifier"
    -- remove duplicates by passing the addresses through a set
    set theURLs to (current application's NSSet's setWithArray:theURLs)'s allObjects()
    return theURLs as list
end findEmailAddressesIn:



-- 
Shane Stanley <email@hidden>
<www.macosxautomation.com/applescript/apps/>, <latenightsw.com>


Jacob Small
January 12, 2017 at 2:45 PM
Couldn't seem to make it work that way, but to be honest my experience with both grep and regular expressions in general is limited.

But the following subroutine seems to do the trick. Took less time than I thought. Is there an easier way to do this?

on removeLigaturesFromString(inputStringWithLigatures)
    set characterList to characters of inputStringWithLigatures
    set charactersWithoutLigatures to {}
    -- compare with case considered; otherwise "æ" would also match "Æ"
    considering case
        repeat with char from 1 to count of characterList
            if item char of characterList is equal to "Ꜳ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"A"}
            else if item char of characterList is equal to "ꜳ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"a"}
            else if item char of characterList is equal to "Æ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"E"}
            else if item char of characterList is equal to "æ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"e"}
            else if item char of characterList is equal to "Ꜵ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"O"}
            else if item char of characterList is equal to "ꜵ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"o"}
            else if item char of characterList is equal to "Ꜷ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"U"}
            else if item char of characterList is equal to "ꜷ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"u"}
            else if item char of characterList is equal to "Ꜹ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"V"}
            else if item char of characterList is equal to "ꜹ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"v"}
            else if item char of characterList is equal to "Ꜻ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"V"}
            else if item char of characterList is equal to "ꜻ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"v"}
            else if item char of characterList is equal to "Ꜽ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"Y"}
            else if item char of characterList is equal to "ꜽ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"y"}
            else if item char of characterList is equal to "ﬀ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"f"} & {"f"}
            else if item char of characterList is equal to "ﬃ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"f"} & {"f"} & {"i"}
            else if item char of characterList is equal to "ﬄ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"f"} & {"f"} & {"l"}
            else if item char of characterList is equal to "ﬁ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"f"} & {"i"}
            else if item char of characterList is equal to "ﬂ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"f"} & {"l"}
            else if item char of characterList is equal to "Œ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"O"} & {"E"}
            else if item char of characterList is equal to "œ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"o"} & {"e"}
            else if item char of characterList is equal to "Ꝏ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"O"} & {"O"}
            else if item char of characterList is equal to "ﬆ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"s"} & {"t"}
            else if item char of characterList is equal to "Ꜩ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"T"} & {"Z"}
            else if item char of characterList is equal to "ꜩ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"t"} & {"z"}
            else if item char of characterList is equal to "ᵫ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"u"} & {"e"}
            else if item char of characterList is equal to "Ꝡ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"V"} & {"Y"}
            else if item char of characterList is equal to "ꝡ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"v"} & {"y"}
            else
                -- not a ligature: keep the character unchanged
                set charactersWithoutLigatures to charactersWithoutLigatures & item char of characterList
            end if
        end repeat
    end considering
    set stringWithLigaturesRemoved to ((items of charactersWithoutLigatures) as string)
    return stringWithLigaturesRemoved
end removeLigaturesFromString


removeLigaturesFromString(inputStringWithLigatures)



Denis Boyd
January 12, 2017 at 2:14 PM

Is there any reason not to add the ligatures to the grep search string?

do shell script "echo '" & (characters of inputString) & "' >" & quoted form of (filepath & "grepTarget.txt")

set theResult to paragraphs of (do shell script "grep -E -o '\\b[A-Za-zÆæŒœ0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,6}\\b' " & quoted form of (filepath & "grepTarget.txt"))


worked here.

Best,
Denis
On 12/01/2017 18:49, Jacob Small wrote:


--
Sent from Postbox
