Re: Best practices for creating and comparing lists of text?
- Subject: Re: Best practices for creating and comparing lists of text?
- From: has <email@hidden>
- Date: Sat, 17 Dec 2005 13:22:54 +0000
CYB wrote:
>tell t starts with "#" -- What is the purpose of doing this? This is the first time I've seen something like this.
>I understand "starts" as a (boolean) containment operator used as part of a test, but here it forms a tell block (?). I didn't catch it.
I think Kai is practising for the Obfuscated AppleScript Awards. I had to refactor it into the standard AS idiom before I could read it well enough to figure out how it worked. Apart from the syntactic flummery, it also uses a clever algorithm that's very fast when the source data contains mostly 'modulename' style entries (but much slower when it contains mostly 'modulename=libname' style entries). However, the OP didn't specify how much data they need to crunch, how quickly it needs to be crunched, or the relative frequencies of each type of entry, so this optimisation is somewhat premature.
The ugly script object kludge is a standard workaround for AS's abysmally inefficient list item lookups (the extra referencing tricks the AS interpreter into routing around one dodgy bit of internal code by using another). Unless a list is very small, it's worth using, as AS will crawl otherwise.
Here's a simpler algorithm that should be much easier to understand, modify and troubleshoot. I've used the script object kludge for obvious reasons, and its performance is quite respectable (not much different to the average performance of Kai's more convoluted algorithm, in fact):
to parse_lists(txt)
    script k -- list access speed kludge
        property linesList : paragraphs of txt
        property moduleNames : {}
        property libNames : {}
    end script
    set tid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to "="
    repeat with lineRef in k's linesList
        set theLine to contents of lineRef -- dereference the list item once
        -- ignore blank lines and comments
        if theLine is not "" and theLine does not start with "#" then
            set end of k's moduleNames to text item 1 of theLine
            -- is library name the same as module name?
            if (count text items of theLine) = 1 then
                set end of k's libNames to text item 1 of theLine
            else
                set end of k's libNames to text item 2 of theLine
            end if
        end if
    end repeat
    set AppleScript's text item delimiters to tid
    return {k's moduleNames, k's libNames}
end parse_lists
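For comparison, here is a direct Python transcription of the same line-by-line algorithm. It is a sketch rather than part of the original reply. It skips blank lines and '#' comments and splits each remaining line on "=", just like the handler above. Like the AppleScript, it takes the second "=" field as the library name, so a line such as "Mod=" yields an empty library name.

```python
def parse_lists(txt):
    """Return (module_names, lib_names) parsed from txt, one entry per line.

    Mirrors the AppleScript handler: blank lines and lines starting with '#'
    are skipped, and each remaining line is split on '='.
    """
    module_names = []
    lib_names = []
    for line in txt.splitlines():
        # ignore blank lines and comments
        if line == "" or line.startswith("#"):
            continue
        parts = line.split("=")
        module_names.append(parts[0])
        # is library name the same as module name?
        if len(parts) == 1:
            lib_names.append(parts[0])
        else:
            lib_names.append(parts[1])
    return module_names, lib_names


sample = """# List of module names (and library names if they differ)
Modulename1
Modulename2=Libname2
"""
mods, libs = parse_lists(sample)
print(mods)  # ['Modulename1', 'Modulename2']
print(libs)  # ['Modulename1', 'Libname2']
```

Unlike the regex version, this accepts any non-comment line as an entry, including lines that contain spaces or other non-identifier characters.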
And frankly, if the performance of this code isn't good enough then it'd make far more sense to switch to a better language. AS is just plain SLOW, and resorting to hero programming to squeeze marginally better performance out of a slow language instead of just using a faster one is a fool's game. For example, here's a simple solution in Python - no speed demon itself - that's 50x faster than the AS solution above:
#!/usr/bin/python
import re
txt = """# List of module names (and library names if they differ)
Modulename1
Modulename2=Libname2
Modulename3
Modulename4=Libname4
"""
# this pattern matches identifiers, but can be changed if needed
patt = re.compile('^([a-zA-Z0-9_]+)(?:=([a-zA-Z0-9_]*))?$', re.M)
lst = patt.findall(txt)
# use the module name as the library name when none is given
moduleNames, libNames = zip(*[(s1, s2 or s1) for s1, s2 in lst])
print(moduleNames)
print(libNames)
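One caveat with the script above: if the text contains no matching lines, zip(*[]) produces nothing, and unpacking that into two names raises a ValueError. Below is a minimal sketch that wraps the same regex in a reusable function and returns two empty tuples in that case. The name parse_names is hypothetical and is not from the original post.

```python
import re

# Same pattern as above: an identifier, optionally followed by '=' and a
# (possibly empty) library identifier, anchored to whole lines via re.M.
PATT = re.compile(r'^([a-zA-Z0-9_]+)(?:=([a-zA-Z0-9_]*))?$', re.M)


def parse_names(txt):
    """Hypothetical helper: return (module_names, lib_names) as tuples."""
    # an empty or missing library name falls back to the module name
    pairs = [(mod, lib or mod) for mod, lib in PATT.findall(txt)]
    if not pairs:
        # zip(*[]) yields nothing, so unpacking it into two names would fail
        return (), ()
    module_names, lib_names = zip(*pairs)
    return module_names, lib_names


print(parse_names("Modulename1\nModulename2=Libname2\n"))
print(parse_names("# nothing but a comment\n"))
```

Note that the regex silently skips lines that are not plain identifiers (for example, lines with trailing spaces), so check that it suits the real data.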
HTH
has
--
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." -- Brian Kernighan
_______________________________________________
Applescript-users mailing list (email@hidden)