Re: Capturing parts of text string [Re: Applescript-users Digest, Vol 2, Issue 443]

  • Subject: Re: Capturing parts of text string [Re: Applescript-users Digest, Vol 2, Issue 443]
  • From: has <email@hidden>
  • Date: Wed, 6 Jul 2005 12:25:55 +0100

kai wrote:

>I share your enthusiasm for regular expressions, has - especially when there's some heavy lifting to be done. [...] That said, I can't help thinking it a pity if a vanilla approach couldn't cope with a relatively simple text cleaning exercise.

In most languages "vanilla" does encompass regular expressions, since most languages include regex support in either their standard library or the core language itself. Sure, you can do the same work with loops and conditionals, but in most cases it'd just be a make-work exercise; i.e. it's not that it can't be done, but that it simply _isn't worth_ the effort. For example, here's that regex again:

search txt for "^\\s*(.*?)\\s*,\\s*(.*?)\\s+([-0-9]+)\\s*$" with regex and individual line matching

and here's its vanilla equivalent:

-------

property _whitespace : space & tab & (ASCII character 11) -- & return & ASCII character 10
property _zipcode : "0123456789-"

set txt to "   Las Vegas   ,   Nevada 89102
Northbrook,IL    60062
	Las Cruces	,New Mexico		85123-1234   "

set res to {}
repeat with pararef in txt's paragraphs
	-- get city
	set citystart to 1
	repeat while pararef's item citystart is in _whitespace
		set citystart to citystart + 1
	end repeat
	set delim1 to citystart
	repeat while pararef's item delim1 is not ","
		set delim1 to delim1 + 1
	end repeat
	set cityend to delim1 - 1
	repeat while pararef's item cityend is in _whitespace
		set cityend to cityend - 1
	end repeat
	set cityname to pararef's text citystart thru cityend
	-- get zip
	set zipend to pararef's length
	repeat while pararef's item zipend is in _whitespace
		set zipend to zipend - 1
	end repeat
	set delim2 to zipend
	repeat while pararef's item delim2 is in _zipcode
		set delim2 to delim2 - 1
	end repeat
	set zipstart to delim2 + 1
	set zipcode to pararef's text zipstart thru zipend
	-- get state
	set statestart to delim1 + 1
	repeat while pararef's item statestart is in _whitespace
		set statestart to statestart + 1
	end repeat
	set stateend to delim2
	repeat while pararef's item stateend is in _whitespace
		set stateend to stateend - 1
	end repeat
	set statename to pararef's text statestart thru stateend
	set end of res to {cityname, statename, zipcode}
end repeat
return res

-------


Even though I'd already figured out the desired algorithm, this version still took me ten times as long to plan, write and debug as the regex-based solution. And if I want to adjust its behaviour or grok it in six months' time then that'll take much longer too. If it needs to crunch high volumes of data in limited time then execution speed will be an issue as well. And the more complex the code, the more there is that can potentially go wrong; e.g. even though the above code is straightforward and unsophisticated (it's just simple repetition, as you can see), there were still a couple of sly off-by-one bugs that initially managed to sneak in. So vanilla is not an approach I'd use unless circumstances demanded or forced it (e.g. having to avoid outside osax/application dependencies due to strict portability requirements).
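
To illustrate the point about regex being "vanilla" elsewhere, here is a rough sketch of the same pattern in Python, whose standard library includes the re module. This is an illustration only, not part of the original exchange: the name parse_locations is made up for the example, and matching each line separately is an assumption about what "individual line matching" does in the search command above.

```python
import re

# Same pattern as in the 'search' command, written as a Python raw string
# so the backslashes do not need doubling.
LOCATION = re.compile(r"^\s*(.*?)\s*,\s*(.*?)\s+([-0-9]+)\s*$")


def parse_locations(txt):
    """Return a [city, state, zip] list for every line that matches.

    Hypothetical helper for illustration; lines that do not match
    the pattern are silently skipped.
    """
    results = []
    for line in txt.splitlines():
        m = LOCATION.match(line)
        if m:
            results.append(list(m.groups()))
    return results


# The same three test lines used in the AppleScript version.
sample = (
    "   Las Vegas   ,   Nevada 89102\n"
    "Northbrook,IL    60062\n"
    "\tLas Cruces\t,New Mexico\t\t85123-1234   "
)

for city, state, zipcode in parse_locations(sample):
    print(city, state, zipcode, sep=" | ")
```

The whole job is one pattern plus a short loop, which is roughly the trade-off being described: the regex carries the logic that the hand-written index juggling has to spell out step by step.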


>Perhaps something like:
>
>on locationAsCleanList(l)
>[...]
>end locationAsCleanList

That code's somewhat obfuscated, and I couldn't determine exactly how it operates without sitting down and pulling it apart. It's hard to be confident that code is correct when you can't easily understand it, and even harder to debug or change it. I'd also be leery of using stuff like 'text (word i) thru (word j)' as a poor man's trim() since it'll also knock off stuff like leading and trailing punctuation. Might get away with that in some cases, but in others it'll bite you. Factors like these also have to be considered; there's more to measuring a piece of code's worth than how many lines long it is.
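
The trimming pitfall is easy to reproduce in other languages too. The sketch below is a Python analogy, not AppleScript: trim_by_words is a hypothetical helper, and the regex \w+ is only a stand-in for AppleScript's notion of a word, whose exact boundaries differ. It slices from the first word to the last word and compares that with a plain whitespace trim.

```python
import re


def trim_by_words(s):
    """Slice from the start of the first word to the end of the last word.

    Hypothetical helper mimicking the 'text (word i) thru (word j)' idiom;
    r"\w+" is an assumed approximation of a "word".
    """
    words = list(re.finditer(r"\w+", s))
    if not words:
        return ""
    return s[words[0].start():words[-1].end()]


raw = "  St. Louis (MO).  "

print(repr(raw.strip()))         # whitespace-only trim keeps the punctuation
print(repr(trim_by_words(raw)))  # word-based trim also drops the trailing ")."
```

Punctuation between words survives because it sits inside the slice, but anything before the first word or after the last one is lost along with the whitespace.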

Good programming is all about choosing your battles wisely - i.e. don't start one unless you have to. Remember, as the great Larry Wall says, Laziness is a Virtue - and you don't get much lazier than a good regex. ;)

HTH

has
--
http://freespace.virgin.net/hamish.sanderson/