Re: What is Best Method To Determine Duplicate Items in a Large List?

  • Subject: Re: What is Best Method To Determine Duplicate Items in a Large List?
  • From: Shane Stanley <email@hidden>
  • Date: Mon, 14 Aug 2017 10:12:09 +1000

On 14 Aug 2017, at 5:48 am, Nigel Garvey <email@hidden> wrote:
>
> Even leaving out the minusSet: stage doesn't slow it down much.

FWIW, I did some tests closer to Jim's request: 30,000 entries, one match. Time
taken was around 0.1 seconds -- and about two-thirds of that was taken up by
the initial creation of an array from the list.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theBigList to current application's NSMutableArray's arrayWithCapacity:30000
repeat 15000 times
        set anEntry to current application's NSUUID's UUID()'s UUIDString()
        theBigList's addObject:anEntry
end repeat
repeat 15000 times
        set anotherEntry to current application's NSUUID's UUID()'s UUIDString()
        theBigList's addObject:anotherEntry
end repeat
theBigList's addObject:anEntry
set theBigList to theBigList as list

set time1 to current application's NSDate's |date|()

--set theBigList to {"a", "b", "c", "d", "e", "f", "g", "h", "a", "i", "e", "e"}
set theBigList to current application's NSArray's arrayWithArray:theBigList
set theCount to theBigList's |count|()
-- get a counted set of the duplicate instances of any duplicated values
set countedDupes to current application's NSCountedSet's setWithArray:theBigList
countedDupes's minusSet:(current application's NSSet's setWithSet:countedDupes)
-- get the indices of the duplicated values' first and dupe instances
set duplicatedValues to countedDupes's allObjects()
set indexInfo to {}
repeat with thisValue in duplicatedValues
        -- Value and first index.
        set thisIndex to (theBigList's indexOfObject:(thisValue)) + 1
        set thisInfo to {thisValue as text, thisIndex}
        -- Indices of dupes.
        repeat (countedDupes's countForObject:(thisValue)) times
                set thisIndex to (theBigList's indexOfObject:(thisValue) inRange:({thisIndex, theCount - thisIndex})) + 1
                set end of thisInfo to thisIndex
        end repeat
        set end of indexInfo to thisInfo
end repeat
set time2 to time1's timeIntervalSinceNow()

return {indexInfo, -time2}
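
To check how the time splits between creating the array from the list and
finding the duplicates, the two stages can be timed separately. The following
is only a minimal sketch, not the script used for the figures above:
sampleList, bridgeTime and dupeTime are hypothetical names, and the short
sample list stands in for the real 30,000-item list.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical sample data; substitute the real list here.
set sampleList to {"a", "b", "c", "d", "a"}

-- Stage 1: bridge the AppleScript list to an NSArray.
set startTime to current application's NSDate's |date|()
set sampleArray to current application's NSArray's arrayWithArray:sampleList
set bridgeTime to -(startTime's timeIntervalSinceNow())

-- Stage 2: build the counted set, then remove one instance of every value
-- so that only the extra (duplicate) instances remain.
set startTime to current application's NSDate's |date|()
set countedDupes to current application's NSCountedSet's setWithArray:sampleArray
countedDupes's minusSet:(current application's NSSet's setWithSet:countedDupes)
set dupeTime to -(startTime's timeIntervalSinceNow())

-- Both values are in seconds.
return {bridgeTime, dupeTime}
```

With a list this small both times will be close to zero; the split only
becomes meaningful with a list of realistic size.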


--
Shane Stanley <email@hidden>
<www.macosxautomation.com/applescript/apps/>, <latenightsw.com>

