Re: Index of Handler
Re: Index of Handler
- Subject: Re: Index of Handler
- From: email@hidden
- Date: Thu, 12 Jul 2001 11:21:56 -0400
On Wed, 11 Jul 2001 10:25:59 -0700, List Guy <email@hidden> asked,
>
I'd appreciate someone telling me if my supposition about
>
>
on IndexOf(lst, itm)
>
set x to 1
>
tell no & lst to if {itm} is in it then repeat until item x = itm
>
set x to x + 1
>
end repeat
>
x - 1
>
end IndexOf
>
>
is correct:
>
>
"no", the applescript constant for use with the close command, is used here
>
to substitute for a longer block of code for when itm isn't in lst. I tried
>
"yes", "all caps" and other constants and all worked. That is, "no" is used
>
because the indexOf handler assumes that itm will never be the applescript
>
constant "no".
This works, and your reasoning is correct as to why it works. However, you are
making things more complicated than they need to be. I'd prefer the much more
straightforward approach.
on IndexOf(lst, itm)
if itm is not in lst then return 0
set x to 1
repeat until item x of lst = itm
set x to x + 1
end repeat
return x
end IndexOf
This way its clearer that 0 is returned if the item isn't found, and you can
easily change what you return if the item isn't in the list.
Your original approach is similar to the use of a "sentinel", which is a way to
speed up the comparisons that are part of a search through an array. In this
particular application, you can stick what you are searching for at the end of
the list, and then not have to test for whether your index has gone past the end
of the array. You could write your handler like this:
on IndexOf(lst, itm)
set end of lst to itm
set x to 1
repeat
if item x of lst is itm then return x
set x to x + 1
end repeat
end IndexOf
By having the sentinel in place, you don't need to first check, "itm is in lst"
Now, this approach has the drawback of returning "N+1" as the "not found" value
when searching a list of length N. But it may be faster, since it doesn't have
to test whether itm is in lst initially. "itm is not in lst" will take time
proportional to the length of the list, so it may take a lot of time on a long
list. But "set the end of lst to itm" may also have to walk the length of the
list, so the time tradeoff may be a wash. So formally, the first handler takes
a*N+b*N time, and the second takes c*N+d*N time, and a,b,c, and d may differ by
a factor 100. (Generally, a single statement in AppleScript is faster than a
repeat loop doing the same work, unless the single statement is a call to a
Scripting Addition.)
If I were writing this handler, I'd try the first one, and only if it then
proves to be not fast enough, try the second. Also, if speed is an issue, you
should pay attention to the particular setup of the data. You may be able to
guarantee that the item is always in the list, and avoid the "item not found"
case entirely. Or the item may only rarely not be in the list, and you could
trap that as an error condition:
on IndexOf(lst, itm)
set x to 1
try
repeat
if item x of lst is itm then return x
set x to x + 1
end repeat
on error -1728 -- "Can't get item x of lst"
return missing value
end try
end IndexOf
This method actually may be the fastest of all, since AppleScript is already
doing the work of checking that the index is in-bounds within the list.
Invoking the error may take a lot of time, though, so you need to consider
whether the item will be missing from the list frequently, or infrequently.
Another alternative: do you really want the index of the item? If you are going
to refer to it, you might do better with a reference. Then, you could use,
on IndexOf(lst, itm)
repeat with itemref in lst
if contents of itemref is itm then return itemref
end repeat
return missing value
end IndexOf
Instead of returning, say, "42", this version returns the reference, "item 42 of
yourList", which you can use directly in place of "item i of yourList" in your
script.
Exactly what you return in the case of the item not being found might be tricky,
though. It would depend on your code. It certainly no harder to handle than if
you return an index number, with 0 meaning "not found":
set i to IndexOf(myList, "foo")
if i is 0 then
--handle not-found situation
else
... item i of myList ...
as compared to
set i to IndexOf(myList, "foo")
if i is missing value then
-- handle not found situation
else
... contents of i ...
>
I suppose that
>
>
on IndexOf(lst, itm)
>
set x to 1
>
tell missing value & lst
>
if {itm} is in it then
>
repeat until item x = itm
>
set x to x + 1
>
end repeat
>
end if
>
end tell
>
return x - 1
>
end IndexOf
>
>
might be clearer because it uses the "missing value" constant, except that
>
soemone might query the list to see if it contains a missing value.
This is the problem with using a sentinel. You must distinguish between the
sentinel as data and the sentinel as an out-of-band marker. "C" has this
problem with the null character it uses to mark the end of strings--you can't
then have a null as part of a string. Every language has the problem of the
quote character as the sentinel for the end of a string constant. Using the
RegExp Scripting Addition, you need to have a "line end" character, so if you
want to search across line boundaries, you pick a character that's not present
in the text. If you can....
Often, some unused value can be a handy sentinel, like "0" for an item index, or
"January 1, 1904" as the beginning of time. If you are writing the handler for
a situation where you know the details of the data to be processed, you can
often find a good sentinel. But if you are writing a library component, or
worse, dealing with user input, you can never be sure. (Those darn users,
always messing things up!) Worst of all is if you are writing for the Web,
where malicious users will send exactly the wrong data to mess up your sentinel.
--
Scott Norton Phone: +1-703-299-1656
DTI Associates, Inc. Fax: +1-703-706-0476
2920 South Glebe Road Internet: email@hidden
Arlington, VA 22206-2768 or email@hidden