Re: What is Best Method To Determine Duplicate Items in a Large List?
- Subject: Re: What is Best Method To Determine Duplicate Items in a Large List?
- From: Jim Underwood <email@hidden>
- Date: Mon, 14 Aug 2017 03:36:06 +0000
- Thread-topic: What is Best Method To Determine Duplicate Items in a Large List?
Re: What is Best Method To Determine Duplicate Items in a Large List? <http://lists.apple.com/archives/applescript-users/2017/Aug/msg00053.html>
eMail Sent: 2017-08-13 07:12 PM CDT
Shane, one minor request: is it possible to retain the order of the source
list (theBigList) in the resulting list of duplicates (indexInfo)?
My source list consists of note titles that I have sorted in alphanumeric
order, but the script's output does not seem to preserve that order.
Thanks again.
Best Regards,
Jim Underwood
aka JMichaelTX
From: AppleScript-Users <applescript-users-bounces+jmichael=email@hidden> on behalf of Shane Stanley <email@hidden>
Date: Sunday, August 13, 2017 at 7:12 PM
To: "ASUL (AppleScript)" <email@hidden>
Subject: Re: What is Best Method To Determine Duplicate Items in a Large List?
FWIW, I did some tests closer to Jim's request: 30,000 entries, one match. Time
taken was around 0.1 seconds -- and about two-thirds of that was taken up by
the initial creation of an array from the list.
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
set theBigList to current application's NSMutableArray's arrayWithCapacity:30000
repeat 15000 times
    set anEntry to current application's NSUUID's UUID()'s UUIDString()
    theBigList's addObject:anEntry
end repeat
repeat 15000 times
    set anotherEntry to current application's NSUUID's UUID()'s UUIDString()
    theBigList's addObject:anotherEntry
end repeat
theBigList's addObject:anEntry
set theBigList to theBigList as list
set time1 to current application's NSDate's |date|()
--set theBigList to {"a", "b", "c", "d", "e", "f", "g", "h", "a", "i", "e", "e"}
set theBigList to current application's NSArray's arrayWithArray:theBigList
set theCount to theBigList's |count|()
-- get a counted set of the duplicate instances of any duplicated values
set countedDupes to current application's NSCountedSet's setWithArray:theBigList
countedDupes's minusSet:(current application's NSSet's setWithSet:countedDupes)
-- get the indices of the duplicated values' first and dupe instances
set duplicatedValues to countedDupes's allObjects()
set indexInfo to {}
repeat with thisValue in duplicatedValues
    -- Value and first index.
    set thisIndex to (theBigList's indexOfObject:(thisValue)) + 1
    set thisInfo to {thisValue as text, thisIndex}
    -- Indices of dupes.
    repeat (countedDupes's countForObject:(thisValue)) times
        set thisIndex to (theBigList's indexOfObject:(thisValue) inRange:({thisIndex, theCount - thisIndex})) + 1
        set end of thisInfo to thisIndex
    end repeat
    set end of indexInfo to thisInfo
end repeat
set time2 to time1's timeIntervalSinceNow()
return {indexInfo, -time2}
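On the question of keeping the results in the order of theBigList: the duplicated values are taken from a set via allObjects(), which returns them in no defined order, and that is presumably why the sorted order is lost. A minimal sketch of one possible fix, assuming it is acceptable to order the results by each value's first appearance in the source list, is to pass the list through an NSMutableOrderedSet and intersect it with the counted set. The name orderedDupes is introduced here for illustration and is not part of the script above; the short test list is the one that appears commented out in that script.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theBigList to {"a", "b", "c", "d", "e", "f", "g", "h", "a", "i", "e", "e"}
set theBigList to current application's NSArray's arrayWithArray:theBigList
set theCount to theBigList's |count|()
-- counted set holding only the extra instances of duplicated values
set countedDupes to current application's NSCountedSet's setWithArray:theBigList
countedDupes's minusSet:(current application's NSSet's setWithSet:countedDupes)
-- assumption: ordering by first appearance in the source list is what is wanted
-- the ordered set keeps source order; intersecting drops values that are not duplicated
set orderedDupes to current application's NSMutableOrderedSet's orderedSetWithArray:theBigList
orderedDupes's intersectSet:countedDupes
set duplicatedValues to orderedDupes's array()
set indexInfo to {}
repeat with thisValue in duplicatedValues
    -- value and 1-based index of its first instance
    set thisIndex to (theBigList's indexOfObject:(thisValue)) + 1
    set thisInfo to {thisValue as text, thisIndex}
    -- 1-based indices of the later instances, searching after the previous hit
    repeat (countedDupes's countForObject:(thisValue)) times
        set thisIndex to (theBigList's indexOfObject:(thisValue) inRange:({thisIndex, theCount - thisIndex})) + 1
        set end of thisInfo to thisIndex
    end repeat
    set end of indexInfo to thisInfo
end repeat
return indexInfo
```

With this test list the sketch should return {{"a", 1, 9}, {"e", 5, 11, 12}}, with "a" listed before "e" as in the source list. Building the ordered set adds a pass over the array, so timings on a 30,000-item list may differ somewhat from the figures quoted above.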