Re: What is Best Method To Determine Duplicate Items in a Large List?
- Subject: Re: What is Best Method To Determine Duplicate Items in a Large List?
- From: Jim Underwood <email@hidden>
- Date: Mon, 14 Aug 2017 02:30:00 +0000
- Thread-topic: What is Best Method To Determine Duplicate Items in a Large List?
Re: What is Best Method To Determine Duplicate Items in a Large List? <http://lists.apple.com/archives/applescript-users/2017/Aug/msg00053.html>
eMail Sent: 2017-08-13 07:12 PM CDT
Shane! You are THE man!
My test on real-world data of 18,000 items (each up to 100 characters long)
produced 398 sets of duplicate Note Titles in only 0.3 sec!!!! The indexes in
each set provided a direct link to the actual Note in Evernote! Perfect!!
And it found only exact matches. There were other items (notes) that contain
the same text within a longer title, but they were correctly ignored.
An example:
[inline image not preserved]
Result of Search in Evernote app:
So while 4 Note Titles contained "Sync Time", ONLY 2 had exactly that for the
Title.
And Shane's script correctly identified those two.
[inline image not preserved]
Script confirmed!
When I get the finished script working, I will publish.
Many, many thanks again, Shane!
Best Regards,
Jim Underwood
aka JMichaelTX
From: AppleScript-Users <applescript-users-bounces+jmichael=email@hidden>
on behalf of Shane Stanley <email@hidden>
Date: Sunday, August 13, 2017 at 7:12 PM
To: "ASUL (AppleScript)" <email@hidden>
Subject: Re: What is Best Method To Determine Duplicate Items in a Large List?
FWIW, I did some tests closer to Jim's request: 30,000 entries, one match. Time
taken was around 0.1 seconds -- and about two-thirds of that was taken up by
the initial creation of an array from the list.
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
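-- build the test data: 30,000 random UUID strings
set theBigList to current application's NSMutableArray's arrayWithCapacity:30000
repeat 15000 times
    set anEntry to current application's NSUUID's UUID()'s UUIDString()
    theBigList's addObject:anEntry
end repeat
repeat 15000 times
    set anotherEntry to current application's NSUUID's UUID()'s UUIDString()
    theBigList's addObject:anotherEntry
end repeat
-- add the 15,000th entry a second time so there is exactly one duplicate
theBigList's addObject:anEntry
-- convert to an AppleScript list so the timed section includes the list-to-array conversion
set theBigList to theBigList as list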
set time1 to current application's NSDate's |date|()
--set theBigList to {"a", "b", "c", "d", "e", "f", "g", "h", "a", "i", "e", "e"}
set theBigList to current application's NSArray's arrayWithArray:theBigList
set theCount to theBigList's |count|()
-- get a counted set of the duplicate instances of any duplicated values
set countedDupes to current application's NSCountedSet's setWithArray:theBigList
countedDupes's minusSet:(current application's NSSet's setWithSet:countedDupes)
-- get the indices of the duplicated values' first and dupe instances
set duplicatedValues to countedDupes's allObjects()
set indexInfo to {}
repeat with thisValue in duplicatedValues
    -- Value and first index.
    set thisIndex to (theBigList's indexOfObject:(thisValue)) + 1
    set thisInfo to {thisValue as text, thisIndex}
    -- Indices of dupes.
    repeat (countedDupes's countForObject:(thisValue)) times
        set thisIndex to (theBigList's indexOfObject:(thisValue) inRange:({thisIndex, theCount - thisIndex})) + 1
        set end of thisInfo to thisIndex
    end repeat
    set end of indexInfo to thisInfo
end repeat
set time2 to time1's timeIntervalSinceNow()
return {indexInfo, -time2}
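
For anyone who wants to try the technique on a small list first, here is a minimal sketch that wraps the same steps in a handler and runs it on the sample list from the commented-out line in the script above. The handler name duplicateInfoFor is hypothetical and not part of the original post; the Foundation calls are the same ones the script already uses.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical helper (not from the original post): returns one sublist per
-- duplicated value, in the form {value, firstIndex, dupeIndex, ...} (1-based).
on duplicateInfoFor(theList)
    set theArray to current application's NSArray's arrayWithArray:theList
    set theCount to theArray's |count|()
    -- Subtracting one instance of every value from the counted set
    -- leaves only the extra (duplicate) instances.
    set countedDupes to current application's NSCountedSet's setWithArray:theArray
    countedDupes's minusSet:(current application's NSSet's setWithSet:countedDupes)
    set duplicatedValues to countedDupes's allObjects()
    set indexInfo to {}
    repeat with thisValue in duplicatedValues
        -- Value and first index.
        set thisIndex to (theArray's indexOfObject:(thisValue)) + 1
        set thisInfo to {thisValue as text, thisIndex}
        -- Each later search starts just past the previous hit.
        repeat (countedDupes's countForObject:(thisValue)) times
            set thisIndex to (theArray's indexOfObject:(thisValue) inRange:({thisIndex, theCount - thisIndex})) + 1
            set end of thisInfo to thisIndex
        end repeat
        set end of indexInfo to thisInfo
    end repeat
    return indexInfo
end duplicateInfoFor

duplicateInfoFor({"a", "b", "c", "d", "e", "f", "g", "h", "a", "i", "e", "e"})
```

With that sample list the result should contain the sublists {"a", 1, 9} and {"e", 5, 11, 12}. Their order is not guaranteed, because allObjects() returns the members of a set in no particular order.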