Re: Best practices for creating and comparing lists of text?


Subject: Re: Best practices for creating and comparing lists of text?
From: kai <email@hidden>
Date: Wed, 21 Dec 2005 23:10:26 +0000


On 17 Dec 2005, at 00:00, CYB wrote:

Hi all, this is a question for Kai who answered with a very interesting solution.

I follow this discussion, and many others, with interest and, as a beginner scripter, with the hope of learning “best practices” in general. Let me tell you, Kai, I’m very impressed with your code; I’m sure you have a lot of experience in this matter. After a month of reading this list I have started to recognize some of you guys, who always write something interesting, and I thank you all for these lessons.

But I also admit that I don’t understand parts of the particular solution that you, Kai, proposed. I spent part of the morning trying to analyze and understand it, but I’m still confused, so as a newbie I would like to ask if you can explain, or make more explicit, parts of your code.

Sorry for the delayed response, Carlos (unavoidable commitments elsewhere, I'm afraid) - although I see that you've already had some helpful responses. My apologies, too, for any confusion that my script(s) may have caused.

Your point about newcomers to AppleScript in general, and to this list in particular, is well taken. After all, one of the benefits of AS is its more human-readable syntax - which can be reduced if its syntactical flexibility is pushed to extremes. I've no doubt that, for some, oblique forms of syntax might even begin to resemble the near-hieroglyphics of certain other languages - which tends to defeat the object here (or at least one significant aspect of it).

So - yes, while not particularly wishing to spark off further debate on the subject, I'd be happy to try and explain my code, to supplement what's already been said.

I should add that my initial suggestions were quick examples, knocked up in a few minutes - since it looked like the OP's post was on the verge of sinking without trace. It's good to see that those initial attempts seem to have attracted some additional interest (perhaps even more than the original question). ;-)

Given a little more information and time, refining an existing script can be a bit easier than starting from scratch. I'd agree, for example, that a line-by-line analysis would generally be preferable to the tid-based experiment that I tried for the first script.

There's also the question of which checks to include within the line-checking repeat loop. Given a high incidence of header lines (those starting with a "#" character), checking for them within the loop makes a lot of sense. With a low incidence (in a long list), a separate extraction method might be worth considering. However, since the OP has kindly confirmed that the original script worked 'out of the box', only an initial test seems necessary in this particular case.

So my own reworking of that original might (without any excessive heroics) go something like this:

---------------------

to parse_lists from t
	-- an empty string yields two empty lists
	if (count t) is 0 then return {{}, {}}
	set c to count t's paragraphs
	if t starts with "#" then
		-- skip the header line; if it is the only line, there is nothing to parse
		if c is 1 then return {{}, {}}
		set c to c - 1
		set p to rest of t's paragraphs
	else
		set p to t's paragraphs
	end if
	-- hold both lists in script object properties, for faster item access
	script o
		property m : p
		property l : p's items -- a separate copy of the list
	end script
	set tid to text item delimiters
	set text item delimiters to "="
	considering case
		repeat with n from 1 to c
			-- split each "left=right" line; lines without "=" stay unchanged in both lists
			tell item n of o's m to if "=" is in it then
				set item n of o's m to text item 1
				set item n of o's l to text item 2
			end if
		end repeat
	end considering
	set text item delimiters to tid
	o's {m, l}
end parse_lists

---------------------

Like the original, this doesn't check for empty paragraphs (although these could be filtered out ahead of the repeat loop using, say, a tid-based method). Nevertheless, on my machine, it offers significant speed improvements over the vanilla approaches discussed so far (between 2.5 and 13 times faster - depending on the structure of the incoming list and which script version is used for comparison).

tell t starts with "#" -- What is the purpose of this? This is the first time I've seen something like it. I understand “starts with” as a containment operator (boolean), as part of some question, but here it forms a tell block (?). I didn't catch it.
	if (count t) is 0 or it and (count t's paragraphs) is 1 then return {{}, {}} -- Here I’m totally lost. When you say “or it and ...”, I’m not sure what “it” represents (is it “t” in this case?) and how it works between “or” and “and”.
	if it then set t to t's text from paragraph 2 to end -- Here I’m assuming that you are asking whether “it” is true?
end tell

On entering the handler, there were a few checks I felt worth making before getting into the main body of the routine. These included whether or not the incoming string was empty, whether the header line (starting with "#") existed - and if it was the only line in the entire string. A tell block (albeit a somewhat quirky one) helped to avoid repeating evaluations and to reduce the code.

To understand better how a tell "something-other-than-an-application" statement works, perhaps I could use the example of a date formatting routine. This might be expressed initially as:

------------------

set currentWeekday to weekday of (current date)
set currentDay to day of (current date)
set currentMonth to month of (current date)
set currentYear to year of (current date)
set dateString to (currentWeekday as string) & ", " & currentMonth & " " & currentDay & ", " & currentYear

------------------

The main problem with this is that it makes 4 separate calls to the Standard Additions scripting addition, where only 1 is really required. One way around this is to assign the value of the date returned to a user-defined variable:

------------------

set currentDate to current date
set currentWeekday to weekday of currentDate
set currentDay to day of currentDate
set currentMonth to month of currentDate
set currentYear to year of currentDate
set dateString to (currentWeekday as string) & ", " & currentMonth & " " & currentDay & ", " & currentYear

------------------

This is more efficient, since it involves only one external call - and any further evaluations can use the date value held in the identifier 'currentDate'.

A variation of this approach involves using a tell block, the target of which is the date value returned by the 'current date' command:

------------------

tell (current date)
	set currentWeekday to its weekday
	set currentDay to day
	set currentMonth to its month
	set currentYear to year
	set dateString to (currentWeekday as string) & ", " & currentMonth & " " & currentDay & ", " & currentYear
end tell

------------------

Note that, within the tell block, the predefined variable 'it' (rather than a user-defined variable) is used to refer to the required value. In some cases, the use of 'it' must be explicit - while in others, it may be implicit (largely depending on whether or not the use of a particular keyword might be ambiguous). The above snippet contains examples of both.

For brevity, and to avoid the setting of additional user-defined variables, this could be written as a single line:

------------------

tell (current date) to set dateString to (its weekday as string) & ", " & its month & " " & day & ", " & year

------------------

So the <t starts with "#"> statement that I used is still very much a boolean containment operator. The code might be easier to follow using a descriptive user-defined variable - rather than the predefined variable 'it' in a tell block:

------------------

set textStartsWithFlag to t starts with "#"
if (count t) is 0 or (textStartsWithFlag and (count t's paragraphs) is 1) then return {{}, {}}
if textStartsWithFlag then set t to t's text from paragraph 2 to end

------------------

The point has already been made, but the time saved in using such techniques can be small - and may be worth serious consideration only in repeat-intensive situations. It's also worth bearing in mind that, within a <tell some_value> block, certain features and functions (particularly those normally available from scripting additions) may be inaccessible without some rephrasing.

For further information about using it, me, and my in tell statements, you might like to check out:

http://developer.apple.com/documentation/AppleScript/Conceptual/AppleScriptLangGuide/AppleScript.b2.html

script o
       property m : t's paragraphs
       property l : m's items
end script
Here, why do you declare properties inside a script object? Isn't it the same to declare them outside it? I have never used script objects and I’m trying to learn about them.

This could be considered a rather incidental use of a script object - and may be best explained by first looking at the "A Reference To Operator" section described in the AppleScript Language Guide (about halfway down the following page, under the heading "NOTES"):

http://developer.apple.com/documentation/AppleScript/Conceptual/AppleScriptLangGuide/AppleScript.99.html

One purpose of "a reference to" in AppleScript is to defer evaluation of the expression until the contents of the reference are required - so the original expression is re-evaluated each time the contents are requested.

The article goes on to explain that the speed of access to items in a particularly long list can be substantially improved by using a reference to that list - rather than by referring directly to the list itself. The precise reasons for the performance characteristics of references in this context are not generally known. It may have something to do with the short-circuiting of certain checks that AppleScript normally makes for circular references - and possibly with the way in which a list is accessed internally. In any event, since the real issue is the disproportionately slower (direct) access to longer lists, referencing in this context could be considered rather more corrective than accelerative.

It was evidently Serge Belleudy-d'Espinose who discovered that using a script object's properties to reference a list was not only more efficient than direct access, but also faster than using "a reference to". (Global variables or script properties could also be used for similar referencing, e.g. "item n of my scriptProperty".)
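
To make the difference between direct access and script object access concrete, here is a minimal sketch. The handler names (sumDirect, sumViaScriptObject) and the 2000-item test list are my own assumptions for illustration, not part of the scripts discussed in this thread. Both handlers return the same total - only the way each item is reached differs:

```applescript
-- Direct access: each 'item n of theList' is resolved against the list itself.
on sumDirect(theList)
	set total to 0
	repeat with n from 1 to (count theList)
		set total to total + (item n of theList)
	end repeat
	return total
end sumDirect

-- Script object access: the list is held in a property and reached via 'o's l'.
on sumViaScriptObject(theList)
	script o
		property l : theList
	end script
	set total to 0
	repeat with n from 1 to (count theList)
		set total to total + (item n of o's l)
	end repeat
	return total
end sumViaScriptObject

-- Build a test list of integers (the size is an arbitrary assumption).
set testList to {}
repeat with n from 1 to 2000
	set end of testList to n
end repeat

-- Both handlers should return the same total.
{sumDirect(testList), sumViaScriptObject(testList)}
```

Timing the two handlers on progressively longer lists is one way to check how the techniques compare on a given machine.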

To give you some idea of the performance differences between these techniques, here are the results of some comparisons (carried out on one of my machines) to get the contents of every item in a given list. These clearly show the non-linear nature of direct access:

time index: fastest = 1
-----------------------

1000 item list --> {direct:52, |reference|:7, |script object|:1}
2000 item list --> {direct:204, |reference|:14, |script object|:2}
3000 item list --> {direct:462, |reference|:21, |script object|:3}
4000 item list --> {direct:811, |reference|:28, |script object|:4}

I'm sure you get the picture...

Note that, for shorter lists (containing, say, a dozen or so items), referencing techniques are really not worth considering. At this level, the times involved are relatively insignificant - and the time taken to set up a script object can offset any potential gain.

--------------SECOND PART OF YOUR SCRIPT------------
tell m1's text items to if (count) is 1 then
	set m2's end to i's contents
else
	set AppleScript's text item delimiters to return
	set m1 to beginning & ({""} & rest)
end if
Here you began with a tell, but there’s no end tell and..., well I don’t understand it :(

The absence of an "end tell" may not look quite right at first glance. But let's take a look at a few simple examples that might help to demonstrate what's actually going on here.

As you may already know, most control statements are compound statements (blocks) that contain other statements - possibly including further, nested control statements. While the last line of a compound statement always starts with the word "end", some control statements (notably simple 'tell' and 'if' blocks) can optionally be written as a single statement - obviating the need for a balancing "end" line:

------------------

tell application "Finder"
	beep
end tell

-- or:

tell application "Finder" to beep
(* no 'end tell' *)

------------------

if true then
	beep
end if

-- or:

if true then beep
(* no 'end if' *)

------------------

This syntactic flexibility also allows the conjoining of opening lines from successive control statements. As statements are "combined", the number of balancing "end" lines is reduced accordingly:

------------------

tell application "Finder"
	if true then
		beep
	end if
end tell

-- or:

tell application "Finder"
	if true then beep
	(* no 'end if' *)
end tell

-- or:

tell application "Finder" to if true then
	beep
end if
(* no 'end tell' *)

-- or even:

tell application "Finder" to if true then beep
(* no 'end if' or 'end tell' *)

------------------
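
The same combining applies when the target of the tell is a value rather than an application, as in the statement asked about. The following is only a sketch: the list sampleItems and the result strings are assumptions made up for the illustration, standing in for the text items used in the original script. Inside the tell, a bare (count) is directed at the target list:

```applescript
-- Assumed sample data, standing in for a list of text items.
set sampleItems to {"alpha", "beta", "gamma"}

-- Combined form: 'tell ... to if ... then' needs only the balancing 'end if'.
tell sampleItems to if (count) is 1 then
	set combinedResult to "single"
else
	set combinedResult to "several"
end if

-- Fully nested form of the same logic, with both 'end' lines restored.
tell sampleItems
	if (count) is 1 then
		set nestedResult to "single"
	else
		set nestedResult to "several"
	end if
end tell

-- Both forms take the same branch.
{combinedResult, nestedResult}
```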

Other forms of compound statement may be "appended" in a similar way:

------------------

tell application "Finder" to with timeout of 2 seconds
end timeout
(* no 'end tell' *)

------------------

if true then considering case
end considering
(* no 'end if' *)

------------------

tell application "Finder" to if true then repeat 3 times
end repeat
(* no 'end tell' or 'end if' *)

------------------

tell application "System Events" to if true then tell application "Finder" to ignoring application responses
end ignoring
(* no 'end tell', 'end if' or secondary 'end tell' *)

------------------

This type of syntax (especially extreme forms of it - such as in the last example) may begin to obscure the general sense of the code. If this is the case, then explicit nesting can be restored. That last snippet could therefore be rewritten as:

------------------

tell application "System Events"
	if true then
		tell application "Finder"
			ignoring application responses
				-- do something
			end ignoring
		end tell
	end if
end tell

------------------

Obviously, this is just an example. Perhaps I should add that nested application tell blocks are generally considered best avoided - so maybe that should be:

------------------

tell application "System Events"
	set some_condition to true
end tell

if some_condition then
	tell application "Finder"
		ignoring application responses
			-- do something
		end ignoring
	end tell
end if

------------------

I’m sure there’s a more verbose way to do the same thing that would be clearer to me, but I didn’t find it. Can you do it for me, and probably for other beginners who sniff around this list?

Sure thing. The following version of the script includes expanded variable labels, additional comments and other minor modifications to (hopefully) clarify the way in which it was intended to work:

------------------

on missing_strings from sourceList1 against sourceList2
	
	(* carry out an initial check, in case one of the lists is empty *)
	
	if (count sourceList1) is 0 or (count sourceList2) is 0 then -- at least one of the source lists is empty
		return {sourceList2, sourceList1}
		-- if list A is empty, then every item in list B is missing from it
		-- this also means that no items will be missing from list B - so simply switch the lists around
		-- if both lists are empty then no items are missing from either, so {{}, {}} is returned anyway
	end if
	
	(* the following routine adds items to one list and subtracts them from the other *)
	
	set missingList2 to {} -- initialise missingList2 as an empty AppleScript list - to which items may be added
	set originalDelimiters to text item delimiters -- store the current value of AppleScript's text item delimiters
	set text item delimiters to return -- prepare for a list-to-string coercion (below), which will produce a return-delimited string
	
	(* the following statement uses AppleScript's string concatenation rules to invoke a list-to-string coercion
	for further information, see: http://tinyurl.com/bfu4y *)
	
	set missingList1 to return & sourceList2 & return
	-- assign sourceList2's items to missingList1
	-- coerce the list to a string
	-- insert a return character at the beginning and end of the string
	-- missingList1 is now a return-encased, return-delimited text list - from which items may be subtracted
	
	(* in the following repeat loop, each search item will start and end with a return character
	wrapping the list in return characters ensures that its first and last items can also be matched correctly *)
	
	repeat with currentItem in sourceList1
		set text item delimiters to return & currentItem & return
		-- encase the current item in return characters
		-- this ensures that only an exact match will be found in missingList1
		-- (for example, "file 3" will not now return a match for part of "file 32")
		set currentTextItems to missingList1's text items -- store the value of missingList1's text items
		if (count currentTextItems) is 1 then -- no match for the current item has been found in missingList1 (formerly sourceList2)
			set missingList2's end to currentItem's contents -- since the current item is not in sourceList2, add it to missingList2
		else -- a match for the current item has been found - so it is not missing from sourceList1
			set text item delimiters to return
			set missingList1 to currentTextItems as string
			-- replace what was <return & currentItem & return> in missingList1 with return
			-- (this effectively deletes currentItem from missingList1)
		end if
	end repeat
	
	set text item delimiters to originalDelimiters -- restore AppleScript's text item delimiters to their initial value when the handler was entered
	
	(* at this point, missingList1 is still a return-encased, return-delimited text list
	missingList2 is still a regular AppleScript list *)
	
	if missingList1 is return then return {{}, missingList2}
	-- missingList1 contains no listed items - so it should correctly be returned as an empty list
	-- otherwise, the statement below would return it as a list of two empty strings: {"", ""}
	
	set missingList1 to missingList1's paragraphs 2 thru -2
	-- convert missingList1 from a text list to an AppleScript list
	-- in addition, remove its empty outer strings
	
	return {missingList1, missingList2}
	
end missing_strings

set Modulenames to {"name1", "name1name2", "name2", "name2name3", "name3", "name4", "name6"}
set Foldernames to {"name1", "name1name2", "name3", "name4", "name4name5", "name5", "name6"}

missing_strings from Modulenames against Foldernames returning {missingModules, missingFolders}

--> {{"name4name5", "name5"}, {"name2", "name2name3"}}

------------------
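
The reason for encasing each search item in return characters can be seen in isolation. This is a small sketch using made-up sample text ("file 32" and "file 4" are assumptions, echoing the example in the comments above): a count of 1 text item means "no match", while 2 or more means the delimiter was found.

```applescript
-- Assumed sample: a return-encased, return-delimited text list.
set wrappedText to return & "file 32" & return & "file 4" & return

set savedDelimiters to AppleScript's text item delimiters

-- Bare search string: "file 3" is found inside "file 32" (a false match).
set AppleScript's text item delimiters to "file 3"
set looseCount to count text items of wrappedText

-- Return-encased search string: "file 3" no longer matches part of "file 32".
set AppleScript's text item delimiters to return & "file 3" & return
set exactCount to count text items of wrappedText

-- Return-encased search string for an item that really is listed.
set AppleScript's text item delimiters to return & "file 4" & return
set listedCount to count text items of wrappedText

set AppleScript's text item delimiters to savedDelimiters

{looseCount, exactCount, listedCount}
--> {2, 1, 2}
```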

I apologise for the length of this reply, which I hope helps to answer at least some of your questions.

:-)

---
kai


Applescript-users mailing list      (email@hidden)