Re: Regex pattern to find URLs
- Subject: Re: Regex pattern to find URLs
- From: "b.bum" <email@hidden>
- Date: Sat, 6 Nov 2004 18:06:58 -0800
On Nov 6, 2004, at 5:42 PM, Kevin Ballard wrote:
> I did exactly that, because I didn't think it would work, and it
> didn't. I don't know how it worked for you when it didn't for me,
> especially since that RegEx shouldn't have worked for you (i.e. it's
> not that I just did a typo).
OK -- that is just bizarre. Send me a transcript. I'm using the stock
Python interpreter as installed on Panther.
> That's not true. Regex's can't handle recursion. Sure, you could put a
> case for (http://www.foo.com/bar(blah).html), but what about
> (http://www.foo.com/bar(blah(foo)).html)? Or something weird like,
> say, [http://www.foo.com/bar(]).html] ?
True, regexes can't handle recursion. But that is more an academic
issue than a practical one. You could easily compose or dynamically
generate regular expressions that allow for nesting of parens or
brackets to whatever depth you find reasonable -- 2 or 3 levels should
be more than enough. (Frighteningly, Perl's regular expression engine
*can* do recursive expressions, and there is a proposal to do something
similar in Python.)
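As a rough sketch of the "dynamically generate" idea (an illustration, not
code from this thread), a pattern for a fixed nesting depth can be built in
a loop. The helper name nested_pattern and its parameters are made up here,
and the sketch assumes that whitespace ends a URL:

```python
import re

def nested_pattern(depth, open_ch='(', close_ch=')'):
    """Build a regex fragment that accepts balanced open_ch/close_ch pairs
    nested up to `depth` levels (hypothetical helper, for illustration)."""
    o, c = re.escape(open_ch), re.escape(close_ch)
    # A "plain" character: not whitespace and not one of the delimiters.
    plain = r'[^\s%s%s]' % (o, c)
    # Depth 0: plain characters only, no delimiters allowed.
    pattern = plain + '*'
    for _ in range(depth):
        # Each extra level allows plain characters or a delimited group
        # whose contents match the previous level.
        pattern = r'(?:%s|%s%s%s)*' % (plain, o, pattern, c)
    return pattern

text = '(http://www.foo.com/bar(blah(foo)).html)'

two_levels = re.compile(r'http://' + nested_pattern(2))
print(two_levels.search(text).group(0))
# prints http://www.foo.com/bar(blah(foo)).html

one_level = re.compile(r'http://' + nested_pattern(1))
print(one_level.search(text).group(0))
# prints http://www.foo.com/bar  (nesting is deeper than the pattern allows)
```

The second result shows the trade-off: anything nested deeper than the
chosen depth is silently cut short, so the depth has to be picked from what
actually shows up in real input.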
The weird case isn't that hard:
>>> import re
>>> x = '[http://www.foo.com/bar(]).html]'
>>> r = re.compile(r'^\[(?P<url>http://[^)]*\)[^]]*)')
>>> r.match(x).group('url')
'http://www.foo.com/bar(]).html'
But the weird case does illuminate a flaw with any kind of regular
expression based parsing of arbitrary input. You are going to spend
*a lot* of time dealing with special cases and evaluation ordering
issues.
With any regular expression based parsing of arbitrary input, there
will always be edge cases that your regular expressions cannot handle.
When parsing free-form text, I have found it far more productive to
try to find as much sample input "in the wild" as I can and keep
beating on my regular expressions until they match everything in the
sample input.
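
One way to do that "beating on" step, sketched under assumptions: keep the
samples in a list with the URL expected from each one, and rerun every
candidate pattern against all of them. The SAMPLES list and the failures
helper below are hypothetical names, and the two samples are just the
examples from this thread:

```python
import re

# Hypothetical sample corpus: (input text, URL we expect to extract).
SAMPLES = [
    ('[http://www.foo.com/bar(]).html]', 'http://www.foo.com/bar(]).html'),
    ('(http://www.foo.com/bar(blah).html)', 'http://www.foo.com/bar(blah).html'),
]

def failures(pattern, samples):
    """Return (text, expected, got) for every sample the pattern gets wrong.
    The pattern must define a named group called 'url'."""
    regex = re.compile(pattern)
    bad = []
    for text, expected in samples:
        m = regex.search(text)
        got = m.group('url') if m else None
        if got != expected:
            bad.append((text, expected, got))
    return bad

# The bracket-only pattern from this message handles the first sample
# but not the parenthesized one, so exactly one failure is reported.
for text, expected, got in failures(r'^\[(?P<url>http://[^)]*\)[^]]*)', SAMPLES):
    print('FAILED: %r expected %r, got %r' % (text, expected, got))
```

Each new special case found in the wild becomes another entry in the list,
so a fix for one case cannot quietly break an earlier one.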
b.bum