Re: Capturing parts of text string [Re: Applescript-users Digest, Vol 2, Issue 443]
- Subject: Re: Capturing parts of text string [Re: Applescript-users Digest, Vol 2, Issue 443]
- From: has <email@hidden>
- Date: Wed, 6 Jul 2005 12:25:55 +0100
kai wrote:
>I share your enthusiasm for regular expressions, has - especially when there's some heavy lifting to be done. [...] That said, I can't help thinking it a pity if a vanilla approach couldn't cope with a relatively simple text cleaning exercise.
In most languages, "vanilla" does encompass regular expressions, since most languages include regex support in either their standard library or the core language itself. Sure, you can do the same work with loops and conditionals, but in most cases it'd just be a make-work exercise; i.e. it's not that it can't be done, but that it simply _isn't worth_ the effort. E.g. here's that regex again:
search txt for "^\\s*(.*?)\\s*,\\s*(.*?)\\s+([-0-9]+)\\s*$" with regex and individual line matching
and here's its vanilla equivalent:
-------
property _whitespace : space & tab & (ASCII character 11) -- & return & ASCII character 10
property _zipcode : "0123456789-"
set txt to " Las Vegas , Nevada 89102
Northbrook,IL 60062
Las Cruces ,New Mexico 85123-1234 "
set res to {}
repeat with pararef in txt's paragraphs
    -- get city
    set citystart to 1
    repeat while pararef's item citystart is in _whitespace
        set citystart to citystart + 1
    end repeat
    set delim1 to citystart
    repeat while pararef's item delim1 is not ","
        set delim1 to delim1 + 1
    end repeat
    set cityend to delim1 - 1
    repeat while pararef's item cityend is in _whitespace
        set cityend to cityend - 1
    end repeat
    set cityname to pararef's text citystart thru cityend
    -- get zip
    set zipend to pararef's length
    repeat while pararef's item zipend is in _whitespace
        set zipend to zipend - 1
    end repeat
    set delim2 to zipend
    repeat while pararef's item delim2 is in _zipcode
        set delim2 to delim2 - 1
    end repeat
    set zipstart to delim2 + 1
    set zipcode to pararef's text zipstart thru zipend
    -- get state
    set statestart to delim1 + 1
    repeat while pararef's item statestart is in _whitespace
        set statestart to statestart + 1
    end repeat
    set stateend to delim2
    repeat while pararef's item stateend is in _whitespace
        set stateend to stateend - 1
    end repeat
    set statename to pararef's text statestart thru stateend
    set end of res to {cityname, statename, zipcode}
end repeat
return res
-------
Even though I'd already figured out the desired algorithm, this version still took me ten times as long to plan, write and debug as the regex-based solution. And if I want to adjust its behaviour or grok it in six months' time, that'll take much longer too. If it needs to crunch high volumes of data in limited time, execution speed will be an issue as well. And the more complex the code, the more there is that can potentially go wrong; e.g. even though the above code is straightforward and unsophisticated (it's just simple repetition, as you can see), a couple of sly off-by-one bugs still managed to sneak in initially. So vanilla is not an approach I'd use unless circumstances demanded or forced it (e.g. having to avoid outside osax/application dependencies due to strict portability requirements).
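To illustrate the point about languages that ship regex support as standard, here is a minimal sketch of the same pattern in Python's re module. This is a swapped-in illustration, not the AppleScript/osax call shown earlier. The function name parse_locations and the choice to skip non-matching lines are assumptions made for the example; matching one line at a time stands in for the "individual line matching" option.

```python
import re

# Same pattern as the regex quoted above; the backslashes are not doubled
# because a Python raw string is used instead of an AppleScript string.
LOCATION = re.compile(r"^\s*(.*?)\s*,\s*(.*?)\s+([-0-9]+)\s*$")

def parse_locations(text):
    """Return [city, state, zip] for each line that matches the pattern.

    Hypothetical helper for illustration: lines that do not match
    are silently skipped.
    """
    rows = []
    for line in text.splitlines():  # one line at a time
        m = LOCATION.match(line)
        if m:
            rows.append(list(m.groups()))
    return rows

txt = """  Las Vegas , Nevada 89102
Northbrook,IL 60062
 Las Cruces  ,New Mexico  85123-1234  """

for row in parse_locations(txt):
    print(row)
```

The whole job is one pattern plus a short loop, which is the size difference being argued above; whether it is also faster would need measuring.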
>Perhaps something like:
>
>on locationAsCleanList(l)
>[...]
>end locationAsCleanList
That code's somewhat obfuscated, and I couldn't determine exactly how it operates without sitting down and pulling it apart. It's hard to be confident that code is correct when you can't easily understand it, and even harder to debug or change it. I'd also be leery of using stuff like 'text (word i) thru (word j)' as a poor man's trim(), since it'll also knock off stuff like leading and trailing punctuation. You might get away with that in some cases, but in others it'll bite you. Factors like these also have to be considered; there's more to measuring code's worth than how many lines long it is.
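The trimming pitfall can be sketched in Python, assuming a simplified notion of a "word" (a run of letters, digits or underscores). AppleScript's own word rules are different and more involved, so trim_by_words below is only a rough, hypothetical stand-in for 'text (word 1) thru (word -1)', not a reproduction of it.

```python
import re

def trim_whitespace(s):
    # A real trim: removes only leading and trailing whitespace.
    return s.strip()

def trim_by_words(s):
    # Rough stand-in for trimming via word boundaries: keep everything
    # from the first word character through the last one.
    m = re.search(r"\w(?:.*\w)?", s, re.DOTALL)
    return m.group(0) if m else ""

samples = ["  Las Vegas  ", "  Washington, D.C.  ", "  (Las Vegas)  "]
for s in samples:
    print(repr(trim_whitespace(s)), repr(trim_by_words(s)))
```

The two functions agree on the first sample but not on the other two, where the word-based version also drops the trailing full stop and the surrounding parentheses.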
Good programming is all about choosing your battles wisely - i.e. don't start one unless you have to. Remember, as the great Larry Wall says, Laziness is a Virtue - and you don't get much lazier than a good regex. ;)
HTH
has
--
http://freespace.virgin.net/hamish.sanderson/