Re: What is Best Method To Determine Duplicate Items in a Large List?

  • Subject: Re: What is Best Method To Determine Duplicate Items in a Large List?
  • From: "Nigel Garvey" <email@hidden>
  • Date: Sun, 13 Aug 2017 20:48:34 +0100

Jim Weisbin wrote on Sun, 13 Aug 2017 08:59:26 -0400:

>Jim Underwood <email@hidden> wrote:
>
>> Anyone have any bright ideas and/or tools to speed up identification of
>> dup text items in a large list (~30,000)?
>
>I assume you are looking for consecutive repeated words?
>
>It’s easy with egrep:
>
>egrep "\b([a-zA-Z0-9]+) \1\b" test.txt
>
>However, my attempts to do this with ASObjC using NSRegularExpression have
>so far failed. I think the \b (word boundary) and \1 (repeated item) have
>to be escaped, not sure.

This works for me:

  use AppleScript version "2.4" -- Yosemite (10.10) or later
  use framework "Foundation"

  set theText to current application's class "NSString"'s stringWithString:("John, whereas Jim had had \"had\", had had \"had had\". \"Had had\" had had the the teacher's teacher's approval.")

  set repeatedWordRegex to current application's class "NSRegularExpression"'s regularExpressionWithPattern:("(?i)\\b([\\w']++)\\b \\b(\\1)\\b") options:(0) |error|:(missing value)

  set repeatedWordMatches to repeatedWordRegex's matchesInString:(theText) options:(0) range:({0, theText's |length|()})
  -- Each match's range() covers both words.
  -- Each match's rangeAtIndex:(1) locates the first word.
  -- Each match's rangeAtIndex:(2) locates the second.
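
To turn those ranges back into AppleScript text, something along these lines should do. It's only a sketch: sampleText and repeatedWords are names made up for the illustration, the pattern is the same one as above, and I'm assuming you only want the first word of each repeated pair.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"

-- Hypothetical sample text, invented for this illustration.
set sampleText to "It was was the the best of times."
set theText to current application's NSString's stringWithString:(sampleText)

-- Same pattern as above: a word, one space, then the same word again (case-insensitive).
set repeatedWordRegex to current application's NSRegularExpression's regularExpressionWithPattern:("(?i)\\b([\\w']++)\\b \\b(\\1)\\b") options:(0) |error|:(missing value)
set repeatedWordMatches to repeatedWordRegex's matchesInString:(theText) options:(0) range:({0, theText's |length|()})

-- Capture group 1 is the first word of each pair. Extract it as AppleScript text.
set repeatedWords to {}
repeat with aMatch in repeatedWordMatches
    set firstWordRange to (aMatch's rangeAtIndex:(1))
    set end of repeatedWords to (theText's substringWithRange:(firstWordRange)) as text
end repeat
```

After the loop, repeatedWords holds one item per repeated pair, in the order the pairs occur in the text. Swap rangeAtIndex:(1) for range() if you want both words of each pair instead.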


NG