Re: Regex pattern to find URLs


  • Subject: Re: Regex pattern to find URLs
  • From: "b.bum" <email@hidden>
  • Date: Sat, 6 Nov 2004 18:06:58 -0800

On Nov 6, 2004, at 5:42 PM, Kevin Ballard wrote:
> I did exactly that, because I didn't think it would work, and it didn't. I don't know how it worked for you when it didn't for me, especially since that RegEx shouldn't have worked for you (i.e. it's not that I just made a typo).

OK -- that is just bizarre. Send me a transcript. I'm using the stock Python interpreter as installed on Panther.


> That's not true. Regexes can't handle recursion. Sure, you could add a case for (http://www.foo.com/bar(blah).html), but what about (http://www.foo.com/bar(blah(foo)).html)? Or something weird like, say, [http://www.foo.com/bar(]).html]?

True, regular expressions can't handle recursion: a true regular expression cannot match parentheses nested to arbitrary depth. But that is more an academic issue than a practical one. You could easily compose or dynamically generate regular expressions that allow nesting of parens or brackets to whatever depth you find reasonable -- 2 or 3 levels should be more than enough. (Frighteningly, Perl's regular expression engine *can* do recursive expressions, and there is a proposal to do something similar in Python.)
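As a rough sketch of the "dynamically generate" idea, using Python's standard `re` module and a hypothetical helper `url_pattern` (not something from this thread): start with a URL body that contains no parentheses, then wrap it in balanced `(...)` groups one level at a time, so `depth` caps how deeply nested the parentheses may be.

```python
import re

def url_pattern(depth=2):
    """Compile a URL regex whose path may contain balanced parentheses
    nested at most `depth` levels deep (hypothetical helper)."""
    # Level 0: characters that are neither whitespace nor parentheses.
    body = r'[^\s()]*'
    for _ in range(depth):
        # Wrap the previous level: plain characters, optionally followed by
        # any number of "(previous level)" groups, each trailed by plain text.
        body = r'[^\s()]*(?:\(' + body + r'\)[^\s()]*)*'
    return re.compile(r'https?://' + body)

nested = '(http://www.foo.com/bar(blah(foo)).html)'
print(url_pattern(2).search(nested).group())  # http://www.foo.com/bar(blah(foo)).html
print(url_pattern(1).search(nested).group())  # http://www.foo.com/bar
```

With `depth=1` the nested example stops at `bar`, which is exactly the limitation under discussion. Each extra level only adds a fixed amount of pattern text, so 2 or 3 levels stay cheap. The enclosing parenthesis is left out because the outermost level never accepts a bare `)`. This sketch makes no attempt at the unbalanced `[...(]...]` case.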


The weird case isn't that hard:

>>> import re
>>> x = '[http://www.foo.com/bar(]).html]'
>>> r = re.compile(r'^\[(?P<url>http://[^)]*\)[^]]*)')
>>> r.match(x).group('url')
'http://www.foo.com/bar(]).html'

But the weird case does illuminate a flaw in any kind of regular-expression-based parsing of arbitrary input: you are going to spend *a lot* of time dealing with special cases and evaluation-ordering issues.
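To make the special-case point concrete, here is the same pattern from the transcript (written as a raw string). It handles the one weird input it was built for, but it does not match a plain bracketed URL that contains no `)` at all, because it requires a literal `)` inside the URL.

```python
import re

# The pattern from the transcript above, as a raw string.
r = re.compile(r'^\[(?P<url>http://[^)]*\)[^]]*)')

# The weird input it was written for matches...
print(r.match('[http://www.foo.com/bar(]).html]').group('url'))  # http://www.foo.com/bar(]).html
# ...but an ordinary bracketed URL does not match at all.
print(r.match('[http://www.foo.com/bar.html]'))  # None
```

Fixing that means adding another alternative, which raises the question of which alternative to try first, which is the ordering problem in miniature.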

With any regular-expression-based parsing of arbitrary input, there will always be edge cases that your regular expressions cannot handle. When parsing free-form text, I have found it far more productive to find as much sample input "in the wild" as I can and keep beating on my regular expressions until they match everything in the sample input.
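The "keep beating on it" workflow can be made mechanical with a small regression harness. A minimal sketch, assuming a hypothetical hand-collected `SAMPLES` list of (text, expected URL) pairs and a deliberately naive `http://\S+` pattern as the starting point:

```python
import re

# Hypothetical sample corpus: (input text, URL we expect to extract).
# In practice these would be collected from real messages "in the wild".
SAMPLES = [
    ('Go to http://www.foo.com/bar.html now', 'http://www.foo.com/bar.html'),
    ('(see http://www.foo.com/bar(blah).html)', 'http://www.foo.com/bar(blah).html'),
    ('[http://www.foo.com/bar(]).html]', 'http://www.foo.com/bar(]).html'),
]

def failures(pattern, samples):
    """Return (text, expected, got) for every sample the pattern gets wrong."""
    bad = []
    for text, expected in samples:
        m = pattern.search(text)
        got = m.group() if m else None
        if got != expected:
            bad.append((text, expected, got))
    return bad

naive = re.compile(r'http://\S+')
for text, expected, got in failures(naive, SAMPLES):
    print(f'{text!r}: expected {expected!r}, got {got!r}')
```

Every new failure found in real input becomes another entry in `SAMPLES`, so a fix for one case cannot silently break an earlier one.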

b.bum


_______________________________________________
Cocoa-dev mailing list (email@hidden)