Re: getting duplicates in a list by item id
Re: getting duplicates in a list by item id
- Subject: Re: getting duplicates in a list by item id
- From: "Nigel Garvey" <email@hidden>
- Date: Sun, 26 Aug 2007 10:37:53 +0100
"Patrik B." wrote on Fri, 24 Aug 2007 11:48:19 -0700:
>I have a list over over 10,000 posix paths. A lot of the paths are
>identical
>so what I want to do is compare it to itself and consolidate it.
>All posix paths are read by my script into applescript - this is easy so
>far
>and I can also easily cycle through them.
>
>Now what I need help with is how I get a new list that has each path only
>occuring only once and another variable that gives me the line number i.e.
>1,3 for each separated by a comma ideally.
>
>So that I end up with two variables (one containing all paths only occuring
>once and the other containing all the line numbers.
It can obviously be quite labour-intensive checking which paths have
already been done and matching the line numbers to the correct items.
I've tried using text item delimiters and counting the paragraphs of each
text item, but that makes Script Editor keep quitting suddenly on the
Jaguar machine I'm using at the moment.
Another vanilla approach is to sort the list of paths, sorting a list of
the line numbers in parallel. This groups identical paths (and their line
numbers) together, which makes thinning out and line matching much
faster. If a merge sort is used, the multiple line numbers for any path
will be in the correct order, so we can later sort again on the first
line number for each path to return the thinned-out paths to the order
they first appear in the file. It still takes a while, but it seems to be
much faster than the linear approach.
-- Assuming a Unicode text file containing only paths.
set allPaths to paragraphs of (read file ((path to desktop as Unicode
text) & "Test file.txt") as Unicode text)
fred(allPaths)
on fred(allPaths) -- For want of a better handler name.
-- This script object customises CustomMergeSort() to sort
-- a list normally, echoing the moves in another list.
script sortInParallel
property slaveList : {}
property slaveListExtract : {}
on isLess(a, b)
(a < b)
end isLess
on getExtract(a, b)
set slaveListExtract to items a thru b of my slaveList
end getExtract
on mergeMove(a, b)
set item b of my slaveList to item a of my slaveListExtract
end mergeMove
on swap(a, b)
tell item a of my slaveList
set item a of my slaveList to item b of my slaveList
set item b of my slaveList to it
end tell
end swap
end script
-- This script object customises CustomMergeSort() to sort
-- a list of lists by the first item in each list, echoing
-- the moves in another list.
script sortListsByItem1
property parent : sortInParallel
on isLess(a, b)
(beginning of a < beginning of b)
end isLess
end script
-- Script object for fast list accesses.
script o
property allPths : allPaths
property paths : {}
property lineLists : {}
property thisLineList : {}
end script
-- Prepare a list of line numbers to be
-- sorted in parallel with the paths.
repeat with i from 1 to (count allPaths)
set end of sortInParallel's slaveList to i
end repeat
-- Sort the paths with the line numbers in train.
considering case
CustomMergeSort(allPaths, 1, -1, sortInParallel)
-- Now that duplicated paths are grouped, weed out
-- the duplicates and get a list of the matching line
-- number(s) for each path.
set thisPath to beginning of o's allPths
set o's paths to {thisPath}
set o's thisLineList to {beginning of sortInParallel's slaveList}
repeat with i from 2 to (count allPaths)
if (item i of o's allPths is thisPath) then
set end of o's thisLineList to item i of sortInParallel's slaveList
else
set end of o's lineLists to o's thisLineList
set o's thisLineList to {item i of sortInParallel's slaveList}
set thisPath to item i of o's allPths
set end of o's paths to thisPath
end if
end repeat
end considering
-- Sort the list of line-number lists by the first number
-- in each list, echoing the moves in the pruned path list.
set sortListsByItem1's slaveList to o's paths
CustomMergeSort(o's lineLists, 1, -1, sortListsByItem1)
-- Coerce the line-number lists to the required strings.
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to ", "
repeat with i from 1 to (count o's lineLists)
set item i of o's lineLists to (item i of o's lineLists) as string
end repeat
set AppleScript's text item delimiters to astid
return {o's paths, o's lineLists}
end fred
on CustomMergeSort(theList, l, r, compObj) -- Sort items l thru r of
theList.
script o
property lst : theList
property extract : missing value
on msort(l, r)
set m1 to (l + r) div 2
set m2 to m1 + 1
sortHalf(l, m1) -- Subsort "left" half.
sortHalf(m2, r) -- Subsort "right" half.
set extract to items l thru r of my lst
compObj's getExtract(l, r)
set leftIdx to 1
set endOfLeft to m2 - l
set rightIdx to endOfLeft + 1
set endOfRight to r - l + 1
set leftVal to beginning of my extract
set rightVal to item rightIdx of my extract
repeat with i from l to r
if (compObj's isLess(rightVal, leftVal)) then
set item i of my lst to rightVal
compObj's mergeMove(rightIdx, i)
if (rightIdx < endOfRight) then
set rightIdx to rightIdx + 1
set rightVal to item rightIdx of my extract
else
repeat with i from (i + 1) to r
set item i of my lst to item leftIdx of my extract
compObj's mergeMove(leftIdx, i)
set leftIdx to leftIdx + 1
end repeat
exit repeat
end if
else
set item i of my lst to leftVal
compObj's mergeMove(leftIdx, i)
if (leftIdx < endOfLeft) then
set leftIdx to leftIdx + 1
set leftVal to item leftIdx of my extract
else
exit repeat
end if
end if
end repeat
end msort
on sortHalf(l, r)
if (r - l > 1) then
msort(l, r)
else if (r > l) then
set leftVal to item l of my lst
set rightVal to item r of my lst
if (compObj's isLess(rightVal, leftVal)) then
set item l of my lst to rightVal
set item r of my lst to leftVal
compObj's swap(l, r)
end if
end if
end sortHalf
end script
set listLen to (count theList)
if (listLen > 1) then
if (l < 0) then set l to listLen + l + 1
if (r < 0) then set r to listLen + r + 1
if (r = l) then
else
if (l > r) then
set temp to l
set l to r
set r to temp
end if
o's msort(l, r)
end if
end if
return -- nothing
end CustomMergeSort
I've seen a lot of talk in this thread about associative arrays, Ruby,
and Perl, and how mind-bogglingly fast they are, but no actual code. Did
that particualr post not make it to the digest?
NG
_______________________________________________
Do not post admin requests to the list. They will be ignored.
AppleScript-Users mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
Archives: http://lists.apple.com/archives/applescript-users
This email sent to email@hidden