Re: Regex pattern to find URLs


  • Subject: Re: Regex pattern to find URLs
  • From: Kevin Ballard <email@hidden>
  • Date: Sat, 6 Nov 2004 20:42:31 -0500


On Nov 6, 2004, at 8:37 PM, b.bum wrote:

>> Uhh, that first match won't even work: your regex requires a ( at the beginning of the string.
>
> It works fine. That was copy/pasted directly from a Terminal session that demonstrated it working. You could fire up the Python interpreter and do the same.

I did exactly that, because I didn't think it would work, and it didn't. I don't know how it worked for you when it didn't for me, especially since that regex shouldn't have worked for you (i.e. it's not that I just made a typo).


>> I do see what you mean: if you use an alternation you could possibly get a URL surrounded by ()'s, but then what about <> and []? And then what about (http://www.foo.com/bar(blah).html)? Humans can tell the inner parens belong to the URL, but I can't imagine how a regex can.
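The alternation idea can be sketched in Python with the standard `re` module. The pattern and the `find_bracketed_urls` helper are hypothetical illustrations, not code from the thread. Each alternative handles one bracket style and forbids that style's delimiters inside the URL, so nested parens are out of reach.

```python
import re

# Hypothetical pattern: one alternative per bracket style. Each alternative
# excludes its own delimiters (and whitespace) from the URL, so it cannot
# handle URLs that themselves contain parens.
BRACKETED_URL = re.compile(
    r'\((http://[^()\s]+)\)'       # (url)
    r'|<(http://[^<>\s]+)>'        # <url>
    r'|\[(http://[^\[\]\s]+)\]'    # [url]
)

def find_bracketed_urls(text):
    """Return the URL from whichever alternative matched, in order."""
    return [next(g for g in m.groups() if g)
            for m in BRACKETED_URL.finditer(text)]

sample = "see (http://www.foo.com/a) or <http://www.foo.com/b> and [http://www.foo.com/c]"
print(find_bracketed_urls(sample))
# ['http://www.foo.com/a', 'http://www.foo.com/b', 'http://www.foo.com/c']

print(find_bracketed_urls("(http://www.foo.com/bar(blah).html)"))
# []
```

The second call finds nothing. That is the case raised in the same paragraph: the inner `(` stops the first alternative before it reaches a closing `)`.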

> It is just a matter of stringing together enough regex goop to make it work. It only breaks down when there are true ambiguities. (http://www.foo.com/bar(blah).html) can be matched. You would need a subexpression like this:
>
>     ^\(http:[^(]*\([^)]*\)[^)]*
>
> That is: skip the paren before the http:, consume all characters up to the inner (, consume the inner (, consume all characters up to the inner ), consume the inner ), then consume all characters up to the last ). Done.
>
> Regular expressions are just big state maps. If a human can read a string and tell, without ambiguity, what is and is not part of the URL, then a human can write a regular expression to do the same.
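The subexpression quoted above can be tried with Python's standard `re` module. This minimal sketch first runs the pattern verbatim against the example string. It then runs a variant with a capture group and a trailing `\)` added, so the URL comes out without its wrapping parens. That variant is an illustrative change, not part of the original message.

```python
import re

# The pattern from the message, verbatim.
PATTERN = re.compile(r'^\(http:[^(]*\([^)]*\)[^)]*')

# Illustrative variant: capture the URL and consume the closing paren.
URL_IN_PARENS = re.compile(r'^\((http:[^(]*\([^)]*\)[^)]*)\)')

text = "(http://www.foo.com/bar(blah).html)"

print(PATTERN.match(text).group(0))        # (http://www.foo.com/bar(blah).html
print(URL_IN_PARENS.match(text).group(1))  # http://www.foo.com/bar(blah).html

# As written, the pattern requires exactly one inner paren pair, so a plain
# parenthesized URL does not match at all.
print(PATTERN.match("(http://www.foo.com/bar.html)"))  # None
```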

That's not true. Regexes can't handle recursion. Sure, you could add a case for (http://www.foo.com/bar(blah).html), but what about (http://www.foo.com/bar(blah(foo)).html)? Or something weird like [http://www.foo.com/bar(]).html]?
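The recursion point can be illustrated with a short Python sketch. It assumes the standard `re` module, which has no recursive patterns, and a hypothetical `unwrap_parens` helper. A pattern written for one level of inner parens still matches the doubly nested example, but it stops at the wrong `)`. A plain depth counter finds the real closing paren at any nesting depth.

```python
import re

# One level of inner parens, as in the earlier pattern (capture group added
# for illustration).
ONE_LEVEL = re.compile(r'^\((http:[^(]*\([^)]*\)[^)]*)\)')

def unwrap_parens(text):
    """Hypothetical helper: return what is inside the outermost (...) pair
    at the start of text, tracking nesting with a depth counter.
    Returns None if text does not start with '(' or the parens never close."""
    if not text.startswith("("):
        return None
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return text[1:i]
    return None

nested = "(http://www.foo.com/bar(blah(foo)).html)"

# The fixed-depth regex still "matches", but ends at the wrong paren.
print(ONE_LEVEL.match(nested).group(1))  # http://www.foo.com/bar(blah(foo)

# The counter follows the nesting to the matching outer paren.
print(unwrap_parens(nested))             # http://www.foo.com/bar(blah(foo)).html
```

The counter is the unbounded memory that a finite-state matcher lacks. Some engines, such as PCRE, add recursive patterns as an extension. The helper only counts `(` and `)`. The mixed-bracket example `[http://www.foo.com/bar(]).html]` is not properly nested, so it needs a policy decision rather than a cleverer matcher.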


--
Kevin Ballard
email@hidden
http://www.tildesoft.com
http://kevin.sb.org

 _______________________________________________
Cocoa-dev mailing list      (email@hidden)