Re: Regex pattern to find URLs


  • Subject: Re: Regex pattern to find URLs
  • From: "b.bum" <email@hidden>
  • Date: Sat, 6 Nov 2004 19:36:42 -0800

On Nov 6, 2004, at 7:01 PM, Kevin Ballard wrote:
The problem with your weird cases is that they're always individual regexes and not combined into the über-regex. I'm just wondering if it's possible to combine all the cases such that it works properly for everything - you may run into cases that, depending on which way you order them, will cause different URLs to parse incorrectly. But that's just because there are so many weird ways to do URLs.

Well, yes... but that was mostly because I didn't want to take the time to write out the full expression and, as indicated, the challenge with all of this is that ordering is highly critical: a minor change to a regex can cause vastly different results.
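To make the ordering point concrete, here is a small Python sketch (the language and the sample string are my choice, not from the thread). Python's re module, like other Perl-style engines, tries alternatives left to right and takes the first one that lets the overall match succeed, so swapping two alternatives can change what is matched, while trailing context can mask the difference through backtracking:

```python
import re
from typing import Optional

def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the leftmost match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

text = "see https://example.com/page"

# The first alternative that succeeds wins, so "http" matches and the
# "s" is left behind.
print(first_match(r"http|https", text))
# Swapping the order changes the result.
print(first_match(r"https|http", text))
# Requiring "://" after the group forces backtracking into the second
# alternative, so in this pattern the order happens not to matter.
print(first_match(r"(?:http|https)://\S+", text))
```

The last case is the trap: an ordering mistake can stay invisible until some later edit removes the context that was rescuing it.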


Given that regular expressions are just big state maps, I have always wanted a tool that visualizes a regex and allows one to see exactly how a string transitions through the map to eventual acceptance or rejection. That'd be cool.
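As a rough illustration of the "state map" view, here is a hand-built automaton for the tiny pattern https?:// plus a function that records every transition a string takes. The table and function names are hypothetical; Python's re module is a backtracking matcher and exposes no table like this, so a real visualizer would have to compile the regex into an automaton itself.

```python
from typing import Dict, List, Tuple

# Hand-built DFA for "https?://" (illustrative only; not produced by re).
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "h"): 1, (1, "t"): 2, (2, "t"): 3, (3, "p"): 4,
    (4, "s"): 5,              # the optional "s"
    (4, ":"): 6, (5, ":"): 6,
    (6, "/"): 7, (7, "/"): 8,
}
ACCEPTING = {8}

def trace(text: str) -> Tuple[List[Tuple[int, str, int]], bool]:
    """Feed text through the DFA, recording each (state, char, next) step.

    A missing transition means immediate rejection; otherwise the string
    is accepted only if it ends in an accepting state.
    """
    state = 0
    steps: List[Tuple[int, str, int]] = []
    for ch in text:
        nxt = TRANSITIONS.get((state, ch))
        if nxt is None:
            return steps, False
        steps.append((state, ch, nxt))
        state = nxt
    return steps, state in ACCEPTING

steps, ok = trace("https://")
for src, ch, dst in steps:
    print(f"{src} --{ch!r}--> {dst}")
print("accepted" if ok else "rejected")
```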

I'm personally a fan of just finding the *beginning* of the URL (like, say, the http://user:pass@domain bit) and then figuring out the path in non-regex code (because you can encode all the logic you want much more easily that way).
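A minimal sketch of that split, assuming Python: a regex locates only the scheme://[user:pass@]host[:port] head, and plain code decides where the path stops. The names URL_HEAD, trim_tail and find_urls, and the particular trimming rules (drop trailing sentence punctuation, drop unbalanced closing parentheses), are illustrative assumptions, not anything proposed in the thread.

```python
import re
from typing import List

# Regex for the "head" only: scheme://[user[:pass]@]host[:port]
URL_HEAD = re.compile(
    r"[A-Za-z][A-Za-z0-9+.-]*://"         # scheme
    r"(?:[^\s/@]+@)?"                     # optional user:pass@
    r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*"  # host labels; never ends in "."
    r"(?::\d+)?"                          # optional port
)

# Characters that usually end a sentence rather than a URL (an assumption).
TRAILING_PUNCT = ".,;:!?'\""

def trim_tail(text: str, begin: int, end: int, head_end: int) -> int:
    """Back `end` off over trailing punctuation and unbalanced ')'."""
    while end > head_end:
        last = text[end - 1]
        candidate = text[begin:end]
        if last in TRAILING_PUNCT:
            end -= 1
        elif last == ")" and candidate.count(")") > candidate.count("("):
            end -= 1
        else:
            break
    return end

def find_urls(text: str) -> List[str]:
    urls: List[str] = []
    pos = 0
    while True:
        m = URL_HEAD.search(text, pos)
        if m is None:
            return urls
        # Non-regex part: take everything up to whitespace, then trim.
        end = m.end()
        while end < len(text) and not text[end].isspace():
            end += 1
        end = trim_tail(text, m.start(), end, m.end())
        urls.append(text[m.start():end])
        pos = end

print(find_urls("Mail http://user:pass@example.com/x, or see "
                "(http://en.wikipedia.org/wiki/Foo_(bar)). Also http://example.org."))
```

Keeping the path rules in ordinary code means each new heuristic is one more branch in trim_tail rather than another alternation whose position in the über-regex matters.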

And to those ends, the current working draft of URI syntax is here:

	http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html

Not that it will entirely serve the purposes. The real URI standard is something like "generate whatever random garbage for a URI that targeted browsers accept". Much like the rest of the web, the "standard" is more a casual guideline that few follow completely.
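For splitting a URI that has already been isolated, RFC 2396 and the rfc2396bis draft both include a parsing regex in Appendix B; the draft was later published as RFC 3986, which keeps it. The sketch below wraps that regex in a hypothetical split_uri helper. Since every component is optional, it matches any string at all, which rather supports the point above: it decomposes a URI, but it neither validates one nor finds one in running text.

```python
import re
from typing import Dict, Optional

# Parsing regex from RFC 2396 Appendix B (kept in RFC 3986 Appendix B).
# Group 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_REF = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri: str) -> Dict[str, Optional[str]]:
    """Split an already-isolated URI reference into its five components."""
    m = URI_REF.match(uri)
    assert m is not None  # every group is optional, so this cannot fail
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

print(split_uri("http://user:pass@example.com:8080/a/b?x=1#top"))
print(split_uri("not a url at all"))  # also "matches": it all lands in path
```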

Of course, finding the beginning of the URL without taking into account the characters immediately preceding the URL makes finding the end of the URL difficult -- as has been demonstrated numerous times in this thread.
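One way to use the preceding character, sketched in Python under the assumption that an opening <, (, [ or quote right before the scheme means the URL runs to the matching closer. CLOSERS and find_urls_by_delimiter are hypothetical names; with no opener, the sketch falls back to stopping at whitespace.

```python
import re
from typing import List

# Hypothetical pairing table: an opener right before the scheme is taken
# to mean the URL runs until the matching closer.
CLOSERS = {"<": ">", "(": ")", "[": "]", '"': '"', "'": "'"}

URL_START = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://")

def find_urls_by_delimiter(text: str) -> List[str]:
    urls: List[str] = []
    pos = 0
    while True:
        m = URL_START.search(text, pos)
        if m is None:
            return urls
        start = m.start()
        closer = CLOSERS.get(text[start - 1]) if start > 0 else None
        if closer is not None:
            # Delimited: run to the matching closer (or end of text).
            end = text.find(closer, m.end())
            if end == -1:
                end = len(text)
        else:
            # Undelimited: fall back to stopping at whitespace.
            end = m.end()
            while end < len(text) and not text[end].isspace():
                end += 1
        urls.append(text[start:end])
        pos = end

sample = ('Try <http://example.com/a(b)>, (http://example.org/x) '
          'or "http://example.net/y" and http://plain.example/p.')
print(find_urls_by_delimiter(sample))
```

Note that the last URL in the sample comes back with its trailing period attached: with no opener, the sketch cannot tell that the sentence ended there, which is the difficulty described above.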

Funny -- if you look at all the apps on Mac OS X, you can definitely see subtle differences in the URL parsing.

b.bum

_______________________________________________
Cocoa-dev mailing list      (email@hidden)

