Re: Coercing ligatures to expanded characters

  • Subject: Re: Coercing ligatures to expanded characters
  • From: Jacob Small <email@hidden>
  • Date: Thu, 12 Jan 2017 14:45:23 -0500

I couldn't make it work that way, though to be honest my experience with grep, and with regular expressions in general, is limited.

The following subroutine seems to do the trick, though, and it took less time than I expected. Is there an easier way to do this?

on removeLigaturesFromString(inputStringWithLigatures)
    set characterList to characters of inputStringWithLigatures
    set charactersWithoutLigatures to {}
    -- AppleScript ignores case in text comparisons by default, so without
    -- "considering case" the test for "Æ" would also match "æ".
    considering case
        repeat with char from 1 to count of characterList
            if item char of characterList is equal to "Ꜳ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"A"}
            else if item char of characterList is equal to "ꜳ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"a"}
            else if item char of characterList is equal to "Æ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"E"}
            else if item char of characterList is equal to "æ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"e"}
            else if item char of characterList is equal to "Ꜵ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"O"}
            else if item char of characterList is equal to "ꜵ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"o"}
            else if item char of characterList is equal to "Ꜷ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"U"}
            else if item char of characterList is equal to "ꜷ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"u"}
            else if item char of characterList is equal to "Ꜹ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"V"}
            else if item char of characterList is equal to "ꜹ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"v"}
            else if item char of characterList is equal to "Ꜻ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"V"}
            else if item char of characterList is equal to "ꜻ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"v"}
            else if item char of characterList is equal to "Ꜽ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"A"} & {"Y"}
            else if item char of characterList is equal to "ꜽ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"a"} & {"y"}
            else if item char of characterList is equal to "ﬀ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"f"} & {"f"}
            else if item char of characterList is equal to "ﬃ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"f"} & {"f"} & {"i"}
            else if item char of characterList is equal to "ﬄ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"f"} & {"f"} & {"l"}
            else if item char of characterList is equal to "ﬁ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"f"} & {"i"}
            else if item char of characterList is equal to "ﬂ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"f"} & {"l"}
            else if item char of characterList is equal to "Œ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"O"} & {"E"}
            else if item char of characterList is equal to "œ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"o"} & {"e"}
            else if item char of characterList is equal to "Ꝏ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"O"} & {"O"}
            else if item char of characterList is equal to "ﬆ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"s"} & {"t"}
            else if item char of characterList is equal to "Ꜩ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"T"} & {"Z"}
            else if item char of characterList is equal to "ꜩ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"t"} & {"z"}
            else if item char of characterList is equal to "ᵫ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"u"} & {"e"}
            else if item char of characterList is equal to "Ꝡ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"V"} & {"Y"}
            else if item char of characterList is equal to "ꝡ" then
                set charactersWithoutLigatures to charactersWithoutLigatures & {"v"} & {"y"}
            else
                set charactersWithoutLigatures to charactersWithoutLigatures & item char of characterList
            end if
        end repeat
    end considering
    -- Join the list with no separator, whatever the current delimiters are.
    set savedDelimiters to AppleScript's text item delimiters
    set AppleScript's text item delimiters to ""
    set stringWithLigaturesRemoved to charactersWithoutLigatures as string
    set AppleScript's text item delimiters to savedDelimiters
    return stringWithLigaturesRemoved
end removeLigaturesFromString


removeLigaturesFromString(inputStringWithLigatures)
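
On the question of an easier way: one possibility is a table-driven replacement using text item delimiters, which avoids the long if/else chain. The following is only a sketch under assumptions. The handler name expandLigatures and the variable names are made up for illustration, and the table covers only a few of the ligatures from the subroutine above; further pairs can be added in the same form.

```applescript
-- Hypothetical table-driven variant; names are illustrative, not from the thread.
on expandLigatures(sourceText)
	-- Each entry is {ligature, expansion}; extend the list as needed.
	set ligaturePairs to {{"Æ", "AE"}, {"æ", "ae"}, {"Œ", "OE"}, {"œ", "oe"}, ¬
		{"ﬀ", "ff"}, {"ﬁ", "fi"}, {"ﬂ", "fl"}, {"ﬃ", "ffi"}, {"ﬄ", "ffl"}, {"ﬆ", "st"}}
	set savedDelimiters to AppleScript's text item delimiters
	set resultText to sourceText
	-- Text item delimiters ignore case by default, so "Æ" would also match "æ".
	considering case
		repeat with aPair in ligaturePairs
			set {ligature, expansion} to contents of aPair
			-- Split on the ligature, then rejoin with its expansion.
			set AppleScript's text item delimiters to ligature
			set textPieces to text items of resultText
			set AppleScript's text item delimiters to expansion
			set resultText to textPieces as text
		end repeat
	end considering
	-- Restore whatever delimiters the caller had set.
	set AppleScript's text item delimiters to savedDelimiters
	return resultText
end expandLigatures

expandLigatures("oﬃce@exæmple.com")
```

Each pair costs one split and one join over the whole string, so the work grows with the size of the table rather than with a per-character chain of comparisons.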


On Thu, Jan 12, 2017 at 2:14 PM, Denis Boyd <email@hidden> wrote:

Is there any reason not to add the ligature to the grep search string?

do shell script "echo '" & (characters of inputString) & "' >" & quoted form of (filepath & "grepTarget.txt")

set theResult to paragraphs of (do shell script "grep -E -o '\\b[A-Za-zÆæŒœ0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,6}\\b' " & quoted form of (filepath & "grepTarget.txt"))


That worked here.

Best,
Denis

On 12/01/2017 18:49, Jacob Small wrote:
Well, replacing the ligatures appears to me to be the best solution, but I'm not finding anything, so here's the root problem.

The following script is supposed to pull email addresses out of strings:

do shell script "echo '" & (characters of inputString) & "' >" & quoted form of (filepath & "grepTarget.txt")

set theResult to paragraphs of (do shell script "grep -E -o '\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,6}\\b' " & quoted form of (filepath & "grepTarget.txt"))


This is hanging up on a ligature in the domain name of an email address. Any thoughts on a way to address this?


Sincerely,


Jacob
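
For reference, the extraction step described above could also be sketched without the temporary file, by piping the text to grep and letting quoted form handle the shell quoting. This is a hedged sketch, not the script used in the thread: the handler name extractEmailAddresses is hypothetical, the pattern is copied from the message above, and the input is assumed to have had its ligatures expanded already.

```applescript
-- Hypothetical handler; assumes ligatures in sourceText were already expanded.
on extractEmailAddresses(sourceText)
	-- Same pattern as in the message above.
	set emailPattern to "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,6}\\b"
	-- "quoted form of" protects apostrophes and other shell metacharacters.
	-- grep exits with status 1 when nothing matches, which would make
	-- "do shell script" raise an error, hence the "|| true".
	set shellCommand to "printf '%s\\n' " & quoted form of sourceText & ¬
		" | grep -E -o " & quoted form of emailPattern & " || true"
	return paragraphs of (do shell script shellCommand)
end extractEmailAddresses

extractEmailAddresses("Write to office@example.com or sales@example.org")
```

Passing the text through quoted form rather than wrapping it in literal single quotes means an apostrophe in the input no longer breaks the shell command.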


On Thu, Jan 12, 2017 at 12:39 PM, Jacob M. Small <email@hidden> wrote:
What I'm looking for:

on removeLigaturesFromString(inputStringWithLigatures)
    set stringWithLigaturesRemoved to someFunctionThatReplacesLigatures()
    return stringWithLigaturesRemoved
end removeLigaturesFromString



Jacob M. Small
January 12, 2017 at 12:28 PM
I need to find a way to coerce all ligatures out of strings. Does anyone know a method to do this or have a link to something ready-made?

Sincerely,

Jacob

--
Jacob M. Small, Principal
J. Madison PLC
1750 Tysons Boulevard, Suite 1500
McLean, Virginia 22102

T 703.910.5062 F 703.910.5107
www.jmadisonplc.com

This email may contain confidential information or information protected by the attorney-client privilege. If you have received this email by mistake, please notify me and then delete the email.



 _______________________________________________
Do not post admin requests to the list. They will be ignored.
AppleScript-Users mailing list      (applescript-users@lists.apple.com)
Help/Unsubscribe/Update your Subscription:
Archives: http://lists.apple.com/archives/applescript-users

This email sent to email@hidden


  • Follow-Ups:
    • Re: Coercing ligatures to expanded characters
      • From: Shane Stanley <email@hidden>
References: 
 >Coercing ligatures to expanded characters (From: "Jacob M. Small" <email@hidden>)
 >Re: Coercing ligatures to expanded characters (From: "Jacob M. Small" <email@hidden>)
 >Re: Coercing ligatures to expanded characters (From: Jacob Small <email@hidden>)
 >Re: Coercing ligatures to expanded characters (From: Denis Boyd <email@hidden>)
