Re: Need a faster Find Duplicates Routine
- Subject: Re: Need a faster Find Duplicates Routine
- From: Christopher Nebel <email@hidden>
- Date: Wed, 28 May 2003 14:58:48 -0700
On Wednesday, May 28, 2003, at 12:48 AM, Johnny AppleScript wrote:
What I'm after is finding a list of every instance of a duplicate file
from a list, and a reference to
that file from a source list; in this case, in iTunes, where the 'Get
Every' feature is *still* broken after two major upgrades.
I.E.,
IF:
set sourceList to {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "j", "j", "k", "l", "m", "n", "o", "o", "o", "p", "q", "q", "r", "r", "s", "s"}
[script actions]
THEN:
dupList:{"j", "j", "j", "o", "o", "o", "q", "q", "r", "r", "s", "s"}
OK, that's clearer than your original handler, which has a bug in it:
if there are n copies of an item with a particular name, you'll get
n*(n-1) instances in dupesList. Depending on what you do with the
items afterward, that may not be an issue.
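To make the n*(n-1) count concrete, here is a minimal sketch of the same loop structure run on plain strings instead of items with names. The sample list and the variable dupeCount are made up for illustration; they are not from the original handler.

```applescript
-- Hypothetical sample data: plain strings stand in for item names.
set theNames to {"a", "j", "j", "j", "o", "o"}
set dupesList to {}
set checkList to {}
repeat with i from 1 to count of theNames
    set x to item i of theNames
    if x is in checkList then
        -- This branch runs once per repeat sighting (n-1 times for n copies),
        -- and each run collects all n copies again.
        repeat with k from 1 to count of theNames
            set y to item k of theNames
            if y is x then set end of dupesList to y
        end repeat
    else
        set end of checkList to x
    end if
end repeat
set dupeCount to count of dupesList
-- "j": 3 copies give 3*2 = 6 entries; "o": 2 copies give 2*1 = 2 entries.
-- dupeCount is 8, although only 5 items are actually duplicated.
```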
I'm not sure why you claim that "get every" is broken -- it works just
fine for me. Now, it's true that if you have a list of items, you
can't say "get name of every item of theList", but that's a problem
with AppleScript, not iTunes. If you can assume that you're looking
for duplicates in a particular playlist, you can improve the speed
quite a bit:
-- Tested with G4 667 MHz; 686 song library
-- original: ~2 minutes.
set dupesList to {}
set checkList to {}
tell application "iTunes"
    set p to library playlist 1
    repeat with i in every track of p
        set x to the name of i
        if x is in checkList then
            get (every track of p whose name is x)
            set dupesList to dupesList & the result
        else
            copy x to the end of checkList
        end if
    end repeat
end tell
-- ~1 minute.
Replacing the internal loop with a "whose" expression cuts the time in
half on my system. Of course, this still has the n*(n-1) bug, because
we collect the duplicates every time we see one, i.e., n-1 times for n
copies.
An even more effective improvement is to cut down on the event traffic
by getting all the track names from iTunes at once, unique-ing them in
AppleScript, and then getting the tracks whose names match the known
duplicates:
-- get all the names first.
tell application "iTunes"
    set p to library playlist 1
    set theItems to the name of every track of p
end tell
-- get the duplicated ones.
set dupesList to {}
set checkList to {}
repeat with i in theItems
    set x to contents of i
    if x is in checkList then
        if x is not in dupesList then
            set end of dupesList to x
        end if
    else
        set the end of checkList to x
    end if
end repeat
-- map the names back to tracks.
set dupTracks to {}
tell application "iTunes"
    repeat with i in dupesList
        set dupTracks to dupTracks & (every track of p whose name is i)
    end repeat
end tell
-- running time: 16 seconds.
That is roughly another factor of four (about a minute down to 16
seconds). (Also notice the "x is not in dupesList" test -- this ensures
that our list of duplicate names does not itself contain any
duplicates, so we get each track exactly once.)
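As a quick check of the name-collecting step on its own, here is the same loop run on the sourceList from the original question, with no iTunes involved. It is only a sketch of that one step; it yields each duplicated name once, and mapping names back to tracks is still a separate pass.

```applescript
set sourceList to {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "j", "j", "k", "l", "m", "n", "o", "o", "o", "p", "q", "q", "r", "r", "s", "s"}
set dupesList to {}
set checkList to {}
repeat with i in sourceList
    -- i is a reference into the list; take its contents to get the string.
    set x to contents of i
    if x is in checkList then
        -- Seen before: record the name, but only the first time it repeats.
        if x is not in dupesList then set end of dupesList to x
    else
        set end of checkList to x
    end if
end repeat
-- dupesList is now {"j", "o", "q", "r", "s"}
```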
We can improve this still further by using a faster uniqueness test,
namely the "uniq" shell command. Because the length of a shell command
line is limited, passing the names in as a parameter would break for
libraries of more than a few thousand songs, so the shell pipeline has
to fetch the track names itself (via osascript):
-- get all the duplicated names.
set iTunesCommand to "set the text item delimiters to ASCII character 10
tell application \"iTunes\" to get the name of every track of library playlist 1
return the result as string"
do shell script "osascript -e " & quoted form of iTunesCommand & " | sort | uniq -d"
set dupesList to every paragraph of the result
-- map the names back to tracks.
set dupTracks to {}
tell application "iTunes"
    repeat with i in dupesList
        set dupTracks to dupTracks & (every track of library playlist 1 whose name is i)
    end repeat
end tell
-- running time: 12 seconds.
Because of how uniq is defined, we have to sort the names first, but
that's pretty fast. Interestingly, most of the time is spent mapping
the names back to tracks -- building dupesList only takes about 3
seconds.
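The reason for the sort is that uniq only compares adjacent lines. A small sketch with made-up names shows the difference; sampleNames, withoutSort and withSort are illustrative only, and the "linefeed" constant assumes a newer AppleScript than the one this thread was written for (ASCII character 10 does the same job there).

```applescript
-- Hypothetical sample: the duplicates "o" and "j" are not adjacent.
set sampleNames to {"o", "j", "o", "q", "j"}
set AppleScript's text item delimiters to linefeed
set nameText to sampleNames as text
set AppleScript's text item delimiters to ""
-- Without sorting, uniq -d sees no adjacent repeats and prints nothing.
set withoutSort to do shell script "printf '%s\\n' " & quoted form of nameText & " | uniq -d"
-- With sorting, equal names become adjacent and each repeated name is printed once.
set withSort to do shell script "printf '%s\\n' " & quoted form of nameText & " | sort | uniq -d"
set sortedDupes to every paragraph of withSort
-- withoutSort is "" and sortedDupes is {"j", "o"}
```

The sample is small enough to pass as a parameter here; for a whole library the names should come from inside the pipeline, as in the script above.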
--Chris Nebel
Apple Development Tools
On Tuesday, May 27, 2003, at 11:58 AM, Johnny AppleScript wrote:
Anyone know of any samples that will run faster than this? This one is
fine for under 300 items, but give it 3000 (3k X 3k = painfully
inefficient), and be prepared to wait... and wait... and wait...
set theItems to [a list of items]
set dupesList to {}
set checkList to {}
repeat with i from 1 to number of items in theItems
    set x to the name of item i of theItems
    if x is in checkList then
        repeat with i from 1 to number of items in theItems
            set y to item i of theItems
            if the name of y is x then
                set the end of dupesList to y
            end if
        end repeat
    else
        copy x to the end of checkList
    end if
end repeat