Re: What is Best Method To Determine Duplicate Items in a Large List?


  • Subject: Re: What is Best Method To Determine Duplicate Items in a Large List?
  • From: Jim Underwood <email@hidden>
  • Date: Mon, 14 Aug 2017 02:30:00 +0000
  • Thread-topic: What is Best Method To Determine Duplicate Items in a Large List?

Re: What is Best Method To Determine Duplicate Items in a Large List?
<http://lists.apple.com/archives/applescript-users/2017/Aug/msg00053.html>

eMail Sent: 2017-08-13  07:12 PM CDT


Shane!  You are THE man!


My test on real-world data of 18,000 items (each up to 100 characters)
found 398 sets of duplicate Note Titles in only 0.3 seconds!  The indexes in each
set provide a direct link to the actual Note in Evernote.  Perfect!!


And it found only exact matches.  There were other items (notes) whose titles
contain the same text, but they were correctly ignored.


An example:


[Attached image]


Result of Search in Evernote app:

So while 4 Note Titles contained "Sync Time", ONLY 2 had exactly that for the
Title.

And Shane's script correctly identified those two.


[Attached image]


Script confirmed!


When I get the finished script working, I will publish it.


Many, many thanks again, Shane!

Best Regards,

Jim Underwood
aka JMichaelTX


From: AppleScript-Users <applescript-users-bounces+jmichael=email@hidden>
 on behalf of Shane Stanley <email@hidden>
Date: Sunday, August 13, 2017 at 7:12 PM
To: "ASUL (AppleScript)" <email@hidden>
Subject: Re: What is Best Method To Determine Duplicate Items in a Large List?

FWIW, I did some tests closer to Jim's request: 30,000 entries, one match. Time
taken was around 0.1 seconds -- and about two-thirds of that was taken up by
the initial creation of an array from the list.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theBigList to current application's NSMutableArray's arrayWithCapacity:30000
repeat 15000 times
    set anEntry to current application's NSUUID's UUID()'s UUIDString()
    theBigList's addObject:anEntry
end repeat
repeat 15000 times
    set anotherEntry to current application's NSUUID's UUID()'s UUIDString()
    theBigList's addObject:anotherEntry
end repeat
theBigList's addObject:anEntry
set theBigList to theBigList as list

set time1 to current application's NSDate's |date|()

--set theBigList to {"a", "b", "c", "d", "e", "f", "g", "h", "a", "i", "e", "e"}
set theBigList to current application's NSArray's arrayWithArray:theBigList
set theCount to theBigList's |count|()
-- get a counted set of the duplicate instances of any duplicated values
set countedDupes to current application's NSCountedSet's setWithArray:theBigList
countedDupes's minusSet:(current application's NSSet's setWithSet:countedDupes)
-- get the indices of the duplicated values' first and dupe instances
set duplicatedValues to countedDupes's allObjects()
set indexInfo to {}
repeat with thisValue in duplicatedValues
    -- Value and first index (+1 converts the 0-based array index to a 1-based list index).
    set thisIndex to (theBigList's indexOfObject:(thisValue)) + 1
    set thisInfo to {thisValue as text, thisIndex}
    -- Indices of dupes: each search starts just past the previous hit.
    repeat (countedDupes's countForObject:(thisValue)) times
        set thisIndex to (theBigList's indexOfObject:(thisValue) inRange:({thisIndex, theCount - thisIndex})) + 1
        set end of thisInfo to thisIndex
    end repeat
    set end of indexInfo to thisInfo
end repeat
set time2 to time1's timeIntervalSinceNow()

return {indexInfo, -time2}
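
The step that may not be obvious in the script above is the minusSet: call. The following is only a minimal sketch of what it appears to do, run on the small sample list from the commented-out line. The variable names (sampleList, sampleArray, counted, totalForE, extraForE, extraForB) are illustrative and are not part of the original script.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set sampleList to {"a", "b", "c", "d", "e", "f", "g", "h", "a", "i", "e", "e"}
set sampleArray to current application's NSArray's arrayWithArray:sampleList

-- A counted set stores each distinct value once, plus how many times it was added.
set counted to current application's NSCountedSet's setWithArray:sampleArray
set totalForE to (counted's countForObject:"e") as integer -- "e" occurs 3 times

-- Subtracting a plain set of the same values removes one instance of each value,
-- so only the extra (duplicate) instances are left in the counted set.
counted's minusSet:(current application's NSSet's setWithSet:counted)
set extraForE to (counted's countForObject:"e") as integer -- 2 dupes remain
set extraForB to (counted's countForObject:"b") as integer -- "b" was unique, so 0

{totalForE, extraForE, extraForB}
```

After the subtraction, allObjects() returns only the values that occurred more than once, and countForObject: says how many further indices to look up for each one. That is how the inner repeat loop in the script knows how many times to run.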
