Re: Best practices for creating and comparing lists of text?
- Subject: Re: Best practices for creating and comparing lists of text?
- From: kai <email@hidden>
- Date: Wed, 21 Dec 2005 23:10:26 +0000
On 17 Dec 2005, at 00:00, CYB wrote:
Hi all, this is a question for Kai, who answered with a very
interesting solution.
I follow this discussion, and many others, with interest and, as a
beginner scripter, with the hope of learning “best practices” in
general. Let me tell you, Kai, I’m very impressed with your
code; I’m sure you have a lot of experience in this matter. Within
a month of reading this list I have started to recognize some of you
guys, who always write something interesting, and I thank you all
for these lessons.
But I also admit that I don’t understand parts of the particular
solution that you, Kai, proposed. I spent part of the morning trying
to analyze and understand it, but I’m still confused, so, as a
newbie, I would like to ask if you can explain or make more explicit
parts of your code.
Sorry for the delayed response, Carlos (unavoidable commitments
elsewhere, I'm afraid) - although I see that you've already had some
helpful responses. My apologies, too, for any confusion that my
script(s) may have caused.
Your point about newcomers to AppleScript in general, and to this
list in particular, is well taken. After all, one of the benefits of
AS is its more human-readable syntax - which can be reduced if its
syntactical flexibility is pushed to extremes. I've no doubt that,
for some, oblique forms of syntax might even begin to resemble the
near-hieroglyphics of certain other languages - which tends to defeat
the object here (or at least one significant aspect of it).
So - yes, while not particularly wishing to spark off further debate
on the subject, I'd be happy to try and explain my code, to
supplement what's already been said.
I should add that my initial suggestions were quick examples, knocked
up in a few minutes - since it looked like the OP's post was on the
verge of sinking without trace. It's good to see that those initial
attempts seem to have attracted some additional interest (perhaps
even more than the original question). ;-)
Given a little more information and time, refining an existing script
can be a bit easier than starting from scratch. I'd agree, for
example, that a line-by-line analysis would generally be preferable
to the tid-based experiment that I tried for the first script.
There's also the question of which checks to include within the line-
checking repeat loop. Given a high incidence of header lines (those
starting with a "#" character), checking for them within the loop
makes a lot of sense. With a low incidence (in a long list), a
separate extraction method might be worth considering. However, since
the OP has kindly confirmed that the original script worked 'out of
the box', only an initial test seems necessary in this particular case.
So my own reworking of that original might (without any excessive
heroics) go something like this:
---------------------
to parse_lists from t
    if (count t) is 0 then return {{}, {}}
    set c to count t's paragraphs
    if t starts with "#" then
        if c is 1 then return {{}, {}}
        set c to c - 1
        set p to rest of t's paragraphs
    else
        set p to t's paragraphs
    end if
    script o
        property m : p
        property l : p's items
    end script
    set tid to text item delimiters
    set text item delimiters to "="
    considering case
        repeat with n from 1 to c
            tell item n of o's m to if "=" is in it then
                set item n of o's m to text item 1
                set item n of o's l to text item 2
            end if
        end repeat
    end considering
    set text item delimiters to tid
    o's {m, l}
end parse_lists
---------------------
Like the original, this doesn't check for empty paragraphs (although
these could be filtered out ahead of the repeat loop using, say, a
tid-based method). Nevertheless, OMM, it offers significant speed
improvements over the vanilla approaches discussed so far (between
2.5 and 13 times faster - depending on the structure of the incoming
list and which script version is used for comparison).
tell t starts with "#"
    -- What is the objective of doing this? This is the first time I
    -- have seen something like it. I understand “starts with” as a
    -- containment operator (boolean), as part of some question, but
    -- here it forms a tell block (?). I didn't catch it.
    if (count t) is 0 or it and (count t's paragraphs) is 1 then return {{}, {}}
    -- Here I’m totally lost. When you say “or it and ...”, I’m not
    -- sure what “it” represents (is it “t” in this case?) or how it
    -- works between “or” and “and”.
    if it then set t to t's text from paragraph 2 to end
    -- Here I’m assuming that you are asking if “it” is true?
end tell
On entering the handler, there were a few checks I felt worth making
before getting into the main body of the routine. These included
whether or not the incoming string was empty, whether the header line
(starting with "#") existed - and if it was the only line in the
entire string. A tell block (albeit a somewhat quirky one) helped to
avoid repeating evaluations and to reduce the code.
To understand better how a tell "something-other-than-an-application"
statement works, perhaps I could use the example of a date formatting
routine. This might be expressed initially as:
------------------
set currentWeekday to weekday of (current date)
set currentDay to day of (current date)
set currentMonth to month of (current date)
set currentYear to year of (current date)
set dateString to (currentWeekday as string) & ", " & currentMonth & ¬
    " " & currentDay & ", " & currentYear
------------------
The main problem with this is that it makes 4 separate calls to the
Standard Additions scripting addition, where only 1 is really
required. One way around this is to assign the value of the date
returned to a user-defined variable:
------------------
set currentDate to current date
set currentWeekday to weekday of currentDate
set currentDay to day of currentDate
set currentMonth to month of currentDate
set currentYear to year of currentDate
set dateString to (currentWeekday as string) & ", " & currentMonth & ¬
    " " & currentDay & ", " & currentYear
------------------
This is more efficient, since it involves only one external call -
and any further evaluations can use the date value held in the
identifier 'currentDate'.
A variation of this approach involves using a tell block, the target
of which is the date value returned by the 'current date' command:
------------------
tell (current date)
    set currentWeekday to its weekday
    set currentDay to day
    set currentMonth to its month
    set currentYear to year
    set dateString to (currentWeekday as string) & ", " & currentMonth & ¬
        " " & currentDay & ", " & currentYear
end tell
------------------
Note that, within the tell block, the predefined variable
'it' (rather than a user-defined variable) is used to refer to the
required value. In some cases, the use of 'it' must be explicit -
while in others, it may be implicit (largely depending on whether or
not the use of a particular keyword might be ambiguous). The above
snippet contains examples of both.
For brevity, and to avoid the setting of additional user-defined
variables, this could be written as a single line:
------------------
tell (current date) to set dateString to (its weekday as string) & ", " & its month & " " & day & ", " & year
------------------
So the <t starts with "#"> statement that I used is still very much a
boolean containment operator. The code might be easier to follow
using a descriptive user-defined variable - rather than the
predefined variable 'it' in a tell block:
------------------
set textStartsWithFlag to t starts with "#"
if (count t) is 0 or (textStartsWithFlag and (count t's paragraphs) is 1) then return {{}, {}}
if textStartsWithFlag then set t to t's text from paragraph 2 to end
------------------
The point has already been made, but the time saved in using such
techniques can be small - and may be worth serious consideration only
in repeat-intensive situations. It's also worth bearing in mind that,
within a <tell some_value> block, certain features and functions
(particularly those normally available from scripting additions) may
be inaccessible without some rephrasing.
For further information about using it, me, and my in tell
statements, you might like to check out:
http://developer.apple.com/documentation/AppleScript/Conceptual/AppleScriptLangGuide/AppleScript.b2.html
script o
    property m : t's paragraphs
    property l : m's items
end script
Why do you declare the properties inside a script object here? Isn't
it the same to declare them outside of it? I have never used script
objects and I’m trying to learn about them.
This could be considered a rather incidental use of a script object -
and may be best explained by first looking at the "A Reference To
Operator" section of the AppleScript Language Guide (about
halfway down the following page, under the heading "NOTES"):
http://developer.apple.com/documentation/AppleScript/Conceptual/AppleScriptLangGuide/AppleScript.99.html
One purpose of "a reference to" in AppleScript is to defer evaluation
of the expression until the contents of the reference are required -
so the original expression is re-evaluated each time the contents are
requested.
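As a minimal sketch of that deferred evaluation (the variable names
are purely illustrative, and the snippet assumes that it is run at the
top level of a script rather than inside a handler, where a reference
to a local variable may not resolve):
------------------
set sampleText to "first value"
set textRef to a reference to sampleText
-- sampleText has not been evaluated through the reference yet
set sampleText to "second value"
get contents of textRef
-- the reference is resolved only at this point, so the value obtained
-- should reflect the later assignment
------------------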
The article goes on to explain that the speed of access to items in a
particularly long list can be substantially improved by using a
reference to that list - rather than by referring directly to the
list itself. The precise reasons for the performance characteristics
of references in this context are not generally known. It may have
something to do with the short-circuiting of certain checks that
AppleScript normally makes for circular references - and possibly
with the way in which a list is accessed internally. In any event,
since the real issue is the disproportionately slower (direct) access
to longer lists, referencing in this context could be considered
rather more corrective than accelerative.
It was evidently Serge Belleudy-d'Espinose who discovered that using
a script object's properties to reference a list was not only more
efficient than direct access, but also faster than using "a reference
to". (Global variables or script properties could also be used for
similar referencing, e.g: "item n of my scriptProperty".)
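A minimal sketch of the script object technique, assuming a
hypothetical handler (sum_numbers) and illustrative labels that are
not part of the scripts discussed here. The script object is declared
inside the handler, so that its property can be initialised from the
handler's parameter at run time:
------------------
on sum_numbers(numberList)
    script listHolder
        property heldList : numberList
    end script
    set total to 0
    repeat with n from 1 to count numberList
        -- get each item by way of the script object's property,
        -- rather than directly from the variable numberList
        set total to total + (item n of listHolder's heldList)
    end repeat
    return total
end sum_numbers

sum_numbers({1, 2, 3, 4})
------------------
For a list as short as the one in this call, plain 'item n of
numberList' would do just as well - the technique only starts to pay
off with much longer lists.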
To give you some idea of the performance differences between these
techniques, here are the results of some comparisons (carried out on
one of my machines) to get the contents of every item in a given
list. These clearly show the non-linear nature of direct access:
time index: fastest = 1
-----------------------
1000 item list --> {direct:52, |reference|:7, |script object|:1}
2000 item list --> {direct:204, |reference|:14, |script object|:2}
3000 item list --> {direct:462, |reference|:21, |script object|:3}
4000 item list --> {direct:811, |reference|:28, |script object|:4}
I'm sure you get the picture...
Note that, for shorter lists (containing, say, a dozen or so items),
referencing techniques are really not worth considering. At this
level, the times involved are relatively insignificant - and the time
taken to set up a script object can offset any potential gain.
--------------SECOND PART OF YOUR SCRIPT------------
tell m1's text items to if (count) is 1 then
    set m2's end to i's contents
else
    set AppleScript's text item delimiters to return
    set m1 to beginning & ({""} & rest)
end if
Here you begin with a tell, but there’s no end tell and... well, I
don’t understand it :(
The absence of an "end tell" may not look quite right at first
glance. But let's take a look at a few simple examples that might
help to demonstrate what's actually going on here.
As you may already know, most control statements are compound
statements (blocks) that contain other statements - possibly
including further, nested control statements. While the last line of
a compound statement always starts with the word "end", some control
statements (notably simple 'tell' and 'if' blocks) can optionally be
written as a single statement - obviating the need for a balancing
"end" line:
------------------
tell application "Finder"
    beep
end tell
-- or:
tell application "Finder" to beep
(* no 'end tell' *)
------------------
if true then
    beep
end if
-- or:
if true then beep
(* no 'end if' *)
------------------
This syntactic flexibility also allows the conjoining of opening
lines from successive control statements. As statements are
"combined", the number of balancing "end" lines is reduced accordingly:
------------------
tell application "Finder"
    if true then
        beep
    end if
end tell
-- or:
tell application "Finder"
    if true then beep
    (* no 'end if' *)
end tell
-- or:
tell application "Finder" to if true then
    beep
end if
(* no 'end tell' *)
-- or even:
tell application "Finder" to if true then beep
(* no 'end if' or 'end tell' *)
------------------
Other forms of compound statement may be "appended" in a similar way:
------------------
tell application "Finder" to with timeout of 2 seconds
end timeout
(* no 'end tell' *)
------------------
if true then considering case
end considering
(* no 'end if' *)
------------------
tell application "Finder" to if true then repeat 3 times
end repeat
(* no 'end tell' or 'end if' *)
------------------
tell application "System Events" to if true then tell application "Finder" to ignoring application responses
end ignoring
(* no 'end tell', 'end if' or secondary 'end tell' *)
------------------
This type of syntax (especially extreme forms of it - such as in the
last example) may begin to obscure the general sense of the code.
If this is the case, then explicit nesting can be restored. That last
snippet could therefore be rewritten as:
------------------
tell application "System Events"
    if true then
        tell application "Finder"
            ignoring application responses
                -- do something
            end ignoring
        end tell
    end if
end tell
------------------
Obviously, this is just an example. Perhaps I should add that nested
application tell blocks are generally considered best avoided - so
maybe that should be:
------------------
tell application "System Events"
    set some_condition to true
end tell
if some_condition then
    tell application "Finder"
        ignoring application responses
            -- do something
        end ignoring
    end tell
end if
------------------
I’m sure there’s a more verbose way to do the same thing that would
be clearer to me, but I couldn’t find it. Can you do it for me, and
probably for other beginners who sniff around this list?
Sure thing. The following version of the script includes expanded
variable labels, additional comments and other minor modifications to
(hopefully) clarify the way in which it was intended to work:
------------------
on missing_strings from sourceList1 against sourceList2
    (* carry out an initial check, in case one of the lists is empty *)
    if (count sourceList1) is 0 or (count sourceList2) is 0 then
        -- at least one of the source lists is empty
        return {sourceList2, sourceList1}
        -- if list A is empty, then every item in list B is missing from it
        -- this also means that no items will be missing from list B - so
        -- simply switch the lists around
        -- if both lists are empty then no items are missing from either,
        -- so {{}, {}} is returned anyway
    end if
    (* the following routine adds items to one list and subtracts them
    from the other *)
    set missingList2 to {}
    -- initialise missingList2 as an empty AppleScript list - to which
    -- items may be added
    set originalDelimiters to text item delimiters
    -- store the current value of AppleScript's text item delimiters
    set text item delimiters to return
    -- prepare for a list-to-string coercion (below), which will produce
    -- a return-delimited string
    (* the following statement uses AppleScript's string concatenation
    rules to invoke a list-to-string coercion
    for further information, see: http://tinyurl.com/bfu4y *)
    set missingList1 to return & sourceList2 & return
    -- assign sourceList2's items to missingList1
    -- coerce the list to a string
    -- insert a return character at the beginning and end of the string
    -- missingList1 is now a return-encased, return-delimited text list
    -- from which items may be subtracted
    (* in the following repeat loop, each search item will start and end
    with a return character
    wrapping the list in return characters ensures that its first and
    last items can also be matched correctly *)
    repeat with currentItem in sourceList1
        set text item delimiters to return & currentItem & return
        -- encase the current item in return characters
        -- this ensures that only an exact match will be found in missingList1
        -- (for example, "file 3" will not now return a match for part of
        -- "file 32")
        set currentTextItems to missingList1's text items
        -- store the value of missingList1's text items
        if (count currentTextItems) is 1 then
            -- no match for the current item has been found in missingList1
            -- (formerly sourceList2)
            set missingList2's end to currentItem's contents
            -- since the current item is not in sourceList2, add it to
            -- missingList2
        else
            -- a match for the current item has been found - so it is not
            -- missing from sourceList1
            set text item delimiters to return
            set missingList1 to currentTextItems as string
            -- replace what was <return & currentItem & return> in
            -- missingList1 with return
            -- (this effectively deletes currentItem from missingList1)
        end if
    end repeat
    set text item delimiters to originalDelimiters
    -- restore AppleScript's text item delimiters to their initial value
    -- when the handler was entered
    (* at this point, missingList1 is still a return-encased,
    return-delimited text list
    missingList2 is still a regular AppleScript list *)
    if missingList1 is return then return {{}, missingList2}
    -- missingList1 contains no listed items - so it should correctly be
    -- returned as an empty list
    -- otherwise, the statement below would return it as a list of two
    -- empty strings: {"", ""}
    set missingList1 to missingList1's paragraphs 2 thru -2
    -- convert missingList1 from a text list to an AppleScript list
    -- in addition, remove its empty outer strings
    return {missingList1, missingList2}
end missing_strings

set Modulenames to {"name1", "name1name2", "name2", "name2name3", ¬
    "name3", "name4", "name6"}
set Foldernames to {"name1", "name1name2", "name3", "name4", ¬
    "name4name5", "name5", "name6"}
missing_strings from Modulenames against Foldernames returning ¬
    {missingModules, missingFolders}
--> {{"name4name5", "name5"}, {"name2", "name2name3"}}
------------------
I apologise for the length of this reply, which I hope helps to
answer at least some of your questions.
:-)
---
kai