Re: Regex pattern to find URLs
- Subject: Re: Regex pattern to find URLs
- From: John Siracusa <email@hidden>
- Date: Sun, 07 Nov 2004 08:47:54 -0500
On 11/6/04 2:59 AM, Kevin Ballard wrote:
> When I throw at it
>
> (http://www.foo.com/foo)test.
>
> it matches
>
> http://www.foo.com/foo)test.
>
> as the URL, which isn't correct.
Actually, I'm pretty sure that it is. From RFC 2396:
    2.3. Unreserved Characters

       Data characters that are allowed in a URI but do not have a reserved
       purpose are called unreserved. These include upper and lower case
       letters, decimal digits, and a limited set of punctuation marks and
       symbols.

          unreserved = alphanum | mark

          mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

       Unreserved characters can be escaped without changing the semantics
       of the URI, but this should not be done unless the URI is being used
       in a context that does not allow the unescaped character to appear.
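To check that claim against the grammar quoted above, here is a minimal Python sketch that builds the RFC 2396 unreserved set (ASCII letters and digits plus the nine "mark" characters). The names MARK, UNRESERVED and is_unreserved are illustrative, not from any library; the sketch only confirms that every character in the disputed tail is a legal unreserved character.

```python
import string

# RFC 2396, section 2.3: unreserved = alphanum | mark
# alphanum is ASCII-only in the RFC grammar.
MARK = "-_.!~*'()"
UNRESERVED = frozenset(string.ascii_letters + string.digits + MARK)


def is_unreserved(ch: str) -> bool:
    """Return True if ch is an RFC 2396 unreserved character."""
    return ch in UNRESERVED


if __name__ == "__main__":
    # The tail that was questioned in the quoted message.
    tail = "foo)test."
    print(all(is_unreserved(c) for c in tail))
```

Both ")" and the trailing "." fall in the mark set, so nothing in the grammar ends the URI at the parenthesis.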
> Like I said, a single regex *cannot* deal with this situation.
There are no arbitrarily nested constructs in URIs, AFAIK, so a single regex
should be able to handle it.
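A minimal sketch of that point in Python, under loud assumptions: it handles only the http scheme, does not validate host syntax, and simply applies the RFC 2396 "uric" classes (reserved, unreserved, or a %XX escape) plus an optional "#fragment". URIC, HTTP_URL and find_urls are hypothetical names for illustration. Because the classes are fixed and nothing nests, a single regular expression covers the grammar.

```python
import re

# One RFC 2396 "uric" unit: a reserved or unreserved character, or a %XX
# escape. Reserved: ; / ? : @ & = + $ ,   Mark: - _ . ! ~ * ' ( )
URIC = r"(?:[A-Za-z0-9;/?:@&=+$,\-_.!~*'()]|%[0-9A-Fa-f]{2})"

# Hypothetical pattern: "http://", one or more uric, optional "#fragment".
# It checks character classes only, not host or port structure.
HTTP_URL = re.compile(r"http://" + URIC + r"+(?:#" + URIC + r"*)?")


def find_urls(text: str) -> list:
    """Return every non-overlapping http URL match in text."""
    # No capturing groups, so findall returns whole matches.
    return HTTP_URL.findall(text)


if __name__ == "__main__":
    print(find_urls("(http://www.foo.com/foo)test."))
    print(find_urls("see <http://example.com/a%20b> now"))
```

On the example from the thread this returns the whole of "http://www.foo.com/foo)test.", matching the RFC reading above; stopping at the ")" would take an extra heuristic about surrounding punctuation, not a more powerful parser.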
-John
_______________________________________________
Cocoa-dev mailing list (email@hidden)