Re: Removing (or picking) items from list 1 via list 2?
Re: Removing (or picking) items from list 1 via list 2?
- Subject: Re: Removing (or picking) items from list 1 via list 2?
- From: email@hidden (Michael Sullivan)
- Date: Tue, 23 Apr 2002 14:03:48 -0400
- Organization: Society for the Incurably Pompous
>
Here's the problem, speed freaks.
>
I have a list - list1 - which could get quite large - say, 400-500 items,
>
all OS aliases.
>
Meanwhile I also have list2 which will also have lots of aliases. Might
>
easily be bigger than list1.
>
I want to pick things randomly from list1 that aren't in list2, to create
>
mywanteditem.
>
I had thought of
>
--
>
set found to false
>
repeat until found is true
>
set randomnum to (random number from 1 to count items in list1)
>
if item randomnum of list1 is not in list2 then
>
set mywanteditem to item randomnum of list1
>
set found to true
>
end repeat
>
mywanteditem
>
--
Here are your biggest speed issues:
1. Using a reference (e.g. "my list1") will be faster when accessing
lists in certain cases. It doesn't seem to matter with "is in", but
*does* matter with "count". So, The other possibility to get rid of
count is to do it before the repeat and place it in a variable, then
refer to the variable. That probably will help, even if you are using a
reference. If you're putting this within a handler, then you'll need to
use script object encapsulation.
2. Random number. This is an osax call and therefore quite slow within
repeats. There are two possibilities. You can use the "some item"
feature, to get a random item from list1, or you can download my
pseudoRand applemod, which will generate random integers from 1 to n
about 40 times as fast as the osax call. If your list1 will stay
shorter than 500 items, then some item should be roughly as fast. it's
order n on the length of list, however, and pseudoRand is not, so if
list1 could be very long (more than 2-3,000 items), you should probably
use pseudoRand. BTW, if you use "some item", then using "some item of
my list1" is much faster than "some item of list1"
>
But I wonder if the fastest way eventually might be
>
Is the best way to create list3, by eg
>
--
>
set list3 to {}
>
set list3ref to a reference to list3
>
repeat with anitem in list1
>
if anitem is not in list2 then set end of list3ref to anitem
>
end repeat
>
set randomnum to randomnumber from 1 to count items in list3
>
set mywanteditem to item randomnum of list3
>
--
Whether this will be faster (note, the same optimizations I talked about
above will apply to this as well) is going to depend on your data set.
How likely is a random item from list1 going to be in list2? If almost
always, then this should work better. If less than about 90% of the
time, then your first handler will probably be faster on average. the
exact percentage would need to be sussed out by trial and error.
think about it though -- In the first, you repeat through list1 only
until you find a hit. In the second, you repeat through every single
item. The second is only going to be faster if there are very few
possible hits relative to the size of the list. (Well, if you use
random number, it's a wider gap, because you're calling it less often in
the second algorithm, but some item or pseudoRand cuts that time down to
something like 2-3 regular list accesses instead of 50+).
>
Which is the more efficient way? Or is there an even more efficient method?
>
This is all in the context of a big repeat loop where the contents of list1
>
will keep changing (it'll be created afresh, but get smaller, with each
>
loop).
It might be good to know the context. It's possible there's some major
optimization to be had there as well.
Michael
--
Michael Sullivan
Business Card Express of CT Thermographers to the Trade
Cheshire, CT email@hidden
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.