Re: Regex pattern to find URLs
- Subject: Re: Regex pattern to find URLs
- From: "b.bum" <email@hidden>
- Date: Sat, 6 Nov 2004 09:22:36 -0800
On Nov 5, 2004, at 11:59 PM, Kevin Ballard wrote:
> When I throw at it
>   (http://www.foo.com/foo)test.
> it matches
>   http://www.foo.com/foo)test.
> as the URL, which isn't correct. Like I said, a single regex *cannot*
> deal with this situation.
A single regex can handle that situation; it is just a pain to write.
Using Python's regular expressions as an example (because named
subexpressions are a lot nicer than index-based subexpressions):
>>> import re
>>> r = re.compile(r'^\((?P<u1>http://[^)]*)|(?P<u2>http://.*)')
>>> r.match('(http://foo.com/baz)bar').group('u1')
'http://foo.com/baz'
>>> r.match('http://foo.com/baz/bar').group('u2')
'http://foo.com/baz/bar'
The '|' (the "or" operator) is the key. The order of the alternatives
is equally important: the most specific expression must come first,
because the engine takes the first alternative that matches.
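
To see why the ordering matters, here is a minimal sketch. It uses a
simplified input with no leading parenthesis, so that both alternatives
can match at the same position; the sample string and the helper name
matched_url are assumptions made up for the illustration, not part of
the pattern above.

```python
import re

# Hypothetical sample input: a URL followed by a stray ')' and more text.
text = 'http://foo.com/baz)bar'

# Specific alternative first: u1 stops at the closing parenthesis.
specific_first = re.compile(r'(?P<u1>http://[^)]*)|(?P<u2>http://.*)')

# General alternative first: the greedy .* matches, so u1 is never tried.
general_first = re.compile(r'(?P<u2>http://.*)|(?P<u1>http://[^)]*)')

def matched_url(pattern, s):
    """Return (group name, matched text) for whichever alternative matched."""
    m = pattern.match(s)
    if m is None:
        return None
    # Only one alternative takes part in a match, so lastgroup names it.
    return m.lastgroup, m.group(m.lastgroup)

print(matched_url(specific_first, text))
print(matched_url(general_first, text))
```

With the specific alternative first, the match should stop before the
')'; with the general one first, the trailing ')bar' ends up in the URL.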
Frankly, I would lean toward writing multiple expressions and evaluating
them one after the other, unless performance is critical. In that case
a single expression will be more efficient CPU-wise (but may have a
significant memory impact, as the state map used internally can become
quite large, especially if you are using a Unicode-capable regex
engine).
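
A rough sketch of the multiple-expression approach might look like the
following. The PATTERNS list and the find_url name are hypothetical,
and the two patterns are only adapted from the single regex above; a
real URL matcher would need more cases.

```python
import re

# Patterns are tried in order, most specific first, just as with '|'.
# Both are assumptions adapted from the combined regex, not a complete
# URL grammar.
PATTERNS = [
    re.compile(r'\((http://[^)]*)\)'),  # URL wrapped in parentheses
    re.compile(r'(http://.*)'),         # bare URL, up to the end of the line
]

def find_url(text):
    """Return the URL from the first pattern that matches, or None."""
    for pattern in PATTERNS:
        m = pattern.search(text)
        if m:
            return m.group(1)
    return None

print(find_url('(http://www.foo.com/foo)test.'))
print(find_url('http://foo.com/baz/bar'))
```

Each pattern stays short and can be tested on its own, at the cost of
scanning the input once per pattern.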
b.bum