Re: Regex pattern to find URLs
- Subject: Re: Regex pattern to find URLs
- From: "b.bum" <email@hidden>
- Date: Sat, 6 Nov 2004 19:36:42 -0800
On Nov 6, 2004, at 7:01 PM, Kevin Ballard wrote:
The problem with your weird cases is that they're always individual
regexes and not combined into the über-regex. I'm just wondering if
it's possible to combine all the cases such that it works properly for
everything - you may run into cases that, depending on which way you
order, will cause different URLs to parse incorrectly. But that's just
because there are so many weird ways to do URLs.
Well, yes... but that was mostly because I didn't want to take the time
to write out the full expression and, as indicated, the challenge with
all of this is that ordering is highly critical and a minor change to a
regex can cause vastly different results.
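A minimal sketch of the ordering problem, assuming Python's re module (the thread does not name a regex engine, and example.com is just a placeholder). In engines that try alternatives left to right, swapping two branches changes what gets matched:

```python
import re

text = "https://example.com"

# Alternatives are tried left to right and the first one that succeeds wins,
# so listing the shorter scheme first stops the match one character early.
short_first = re.match(r"http|https", text).group(0)
long_first = re.match(r"https|http", text).group(0)

print(short_first)  # http
print(long_first)   # https
```

The same effect, multiplied across every branch of a combined URL expression, is what makes the ordering so fragile.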
Given that regular expressions are just big state maps, I have always
wanted a tool that visualizes a regex and allows one to see exactly how
a string transitions through the map to eventual acceptance or
rejection.  That'd be cool.
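As a rough illustration of what such a tool would show, here is a hand-built state map for the small pattern https?:// together with a function that records every state a string passes through. The state names and the trace helper are hypothetical, made up for this sketch; a real engine builds its map from the expression itself.

```python
# Hand-written transition table for the pattern  https?://
# Keys are (current state, input character); values are the next state.
TRANSITIONS = {
    ("start", "h"): "h",
    ("h", "t"): "ht",
    ("ht", "t"): "htt",
    ("htt", "p"): "http",
    ("http", "s"): "https",
    ("http", ":"): "colon",
    ("https", ":"): "colon",
    ("colon", "/"): "slash",
    ("slash", "/"): "accept",
}


def trace(s):
    """Return (accepted, path), where path lists every state visited.

    The whole string must be consumed and end in the accept state.
    """
    state = "start"
    path = [state]
    for ch in s:
        # Any character without a listed transition sends us to "reject".
        state = TRANSITIONS.get((state, ch), "reject")
        path.append(state)
        if state == "reject":
            break
    return state == "accept", path


print(trace("http://"))
print(trace("htp://"))
```

The second call stops at the third character, which is exactly the kind of "where did it fall off the map" information that is hard to get from a plain match/no-match result.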
I'm personally a fan of just finding the *beginning* of the URL (like,
say, the http://user:pass@domain bit) and then figuring out the path
based on non-regex code (because you can encode all the logic you want
much more easily that way).
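A minimal sketch of that split, assuming Python and only http/https schemes. The regex does nothing but locate the start; the end is found by ordinary code. The stop characters and the trailing-punctuation rule are assumptions chosen for illustration, not rules taken from any URI specification.

```python
import re

# The regex only has to recognise where a URL begins.
START = re.compile(r"https?://", re.IGNORECASE)

# Assumed heuristics: characters that end a URL outright, and punctuation
# that is treated as sentence punctuation when it trails the URL.
STOP_CHARS = '<>"'
TRAILING = ".,;:!?"


def find_urls(text):
    urls = []
    pos = 0
    while True:
        m = START.search(text, pos)
        if m is None:
            break
        end = m.end()
        # Walk forward until whitespace or a hard stop character.
        while (end < len(text)
               and not text[end].isspace()
               and text[end] not in STOP_CHARS):
            end += 1
        # Back up over trailing sentence punctuation.
        while end > m.end() and text[end - 1] in TRAILING:
            end -= 1
        urls.append(text[m.start():end])
        pos = end
    return urls


print(find_urls("Go to http://example.com/a/b, then https://example.org."))
```

Each heuristic lives in its own plain loop, so adding or reordering a rule does not disturb the others the way editing one large expression can.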
And to those ends, the current working draft of URI syntax is here:
	http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html
Not that it will entirely serve the purposes.  The real URI standard is
something like "generate whatever random garbage for a URI that
targeted browsers accept".   Much like the rest of the web, the
"standard" is more a casual guideline that few follow completely.
Of course, finding the beginning of the URL without taking into account
the characters immediately preceding the URL makes finding the end of
the URL difficult -- as has been demonstrated numerous times in this
thread.
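One way to act on that, sketched in Python under the assumption that an opening delimiter directly before the URL implies its matching closer ends it. The CLOSERS table and both helper names are hypothetical, and nested URLs (one URL inside another's query string) are not handled.

```python
import re

START = re.compile(r"https?://")

# Assumed mapping from the character just before the URL to the
# character that should terminate it.
CLOSERS = {"<": ">", "(": ")", "[": "]", '"': '"'}


def url_end(text, start):
    """Return the index just past the URL that begins at text[start]."""
    before = text[start - 1] if start > 0 else ""
    closer = CLOSERS.get(before)  # None when there is no opening delimiter
    end = start
    while end < len(text) and not text[end].isspace():
        if text[end] == closer:
            break
        end += 1
    return end


def find_delimited_urls(text):
    return [text[m.start():url_end(text, m.start())]
            for m in START.finditer(text)]


print(find_delimited_urls("(http://example.com/a) and http://example.com/b(c)"))
```

In the sample string the first URL stops at the closing parenthesis because it was opened with one, while the second keeps its parentheses because nothing before it suggests they are delimiters.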
Funny -- if you look at all the apps on Mac OS X, you can definitely
see subtle differences in the URL parsing.
b.bum