Re: Regex pattern to find URLs
- Subject: Re: Regex pattern to find URLs
- From: Kevin Ballard <email@hidden>
- Date: Sat, 6 Nov 2004 20:42:31 -0500
On Nov 6, 2004, at 8:37 PM, b.bum wrote:
>> Uhh, that first match won't even work - your regex requires a ( at
>> the beginning of the string.
>
> It works fine. That was copy/pasted directly from a Terminal session
> that demonstrated it working. You could fire up the Python
> interpreter and do the same.

I did exactly that, because I didn't think it would work, and it
didn't. I don't know how it worked for you when it didn't for me,
especially since that regex shouldn't have worked for you (i.e. it's
not that I just made a typo).

>> I do see what you mean: if you use an alternation you could
>> possibly get a URL surrounded by ()'s, but then what about <> and []?
>> And then what about (http://www.foo.com/bar(blah).html)? Humans can
>> tell the inner parens are for the URL but I can't imagine how a regex
>> can.
>
> It is just a matter of stringing together enough regex goop to make it
> work. It only breaks down when there are true ambiguities.
> (http://www.foo.com/bar(blah).html) can be matched. You would need to
> use a subexpression something like this:
>
>     ^\(http:[^(]*\([^)]*\)[^)]*
>
> That is, skip the paren before the http:, consume all characters up to
> an inner (, consume the inner (, consume all characters up to inner ),
> consume inner ), consume all characters up to last ), done.
>
> Regular expressions are just big state maps. If a human can read a
> string and tell, without ambiguity, what is and is not a part of the
> URL, then a human can write a regular expression to do the same.
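
For anyone following along, the quoted subexpression can be checked in the Python interpreter. This is only a minimal sketch: the capture group and the helper name find_parenthesized_url are additions for illustration and are not part of the pattern as posted.

```python
import re

# The subexpression from the quoted message, with one capture group added
# (an assumption for illustration) so the URL can be pulled out without
# the leading paren.
PATTERN = re.compile(r"^\((http:[^(]*\([^)]*\)[^)]*)")

def find_parenthesized_url(text):
    """Return the URL if text starts with '(' followed by an http: URL
    containing exactly one inner (...) pair, else None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

print(find_parenthesized_url("(http://www.foo.com/bar(blah).html)"))
# prints: http://www.foo.com/bar(blah).html

# A bare URL does not match, because the pattern requires a leading "(".
print(find_parenthesized_url("http://www.foo.com/"))
# prints: None
```

Note that the pattern only covers the one case it was written for: a leading paren and a single inner pair.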
That's not true. Regexes can't handle recursion. Sure, you could add a
case for (http://www.foo.com/bar(blah).html), but what about
(http://www.foo.com/bar(blah(foo)).html)? Or something weird like, say,
[http://www.foo.com/bar(]).html]?
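
To illustrate the nesting problem, here is a rough sketch in Python, whose built-in re module has no recursive patterns. The fixed pattern is the one-level subexpression from the thread with a capture group added. The helper url_by_depth is hypothetical: it is a plain depth counter, not something anyone in this thread proposed, and it makes no attempt at the mixed-bracket case.

```python
import re

# One-level pattern from the thread, capture group added for illustration.
FIXED = re.compile(r"^\((http:[^(]*\([^)]*\)[^)]*)")

def url_by_depth(text):
    """Return whatever sits inside the outermost (...) by counting paren
    depth, or None if text does not start with '(' or is unbalanced."""
    if not text.startswith("("):
        return None
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # Matching close for the opening paren at index 0.
                return text[1:i]
    return None  # ran out of input before the parens balanced

nested = "(http://www.foo.com/bar(blah(foo)).html)"

# The fixed pattern stops at the first ")" after the inner "(".
print(FIXED.search(nested).group(1))
# prints: http://www.foo.com/bar(blah(foo)

# Counting depth recovers the whole thing.
print(url_by_depth(nested))
# prints: http://www.foo.com/bar(blah(foo)).html
```

The counter handles any nesting depth, which a fixed pattern can't do without adding another case per level.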
--
Kevin Ballard
email@hidden
http://www.tildesoft.com
http://kevin.sb.org