Re: extract URL from general text

Subject: Re: extract URL from general text
From: "Gary (Lists)" <email@hidden>
Date: Tue, 18 Mar 2008 18:37:24 -0400
Thread-topic: extract URL from general text

"Hudson Barton" wrote:

> I need to extract valid "http" URL's from general (non-html) text.  I
> define what is valid as follows:
>
> 1.  begins with "http:"
> 2.  preceded by " "
> 3.  followed by " "
> 4.  containing only valid characters (or validly encoded characters)
> as per RFC1738

You're best off not redefining things like this.

Your 'definition' wouldn't collect the url <http://google.com> in this
message, for instance. [It would fail your (1.) and (2.) and (3.)]
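Taken literally, the four quoted rules might look like this in Python. This is a minimal sketch: the regex, the `strict_urls` name and the reading of RFC 1738's allowed characters (alphanumerics, the "safe", "extra" and reserved sets, plus %XX escapes) are assumptions, not the original poster's code. It illustrates the point above: the angle-bracketed google URL is never matched.

```python
import re

# Characters RFC 1738 allows unescaped: alphanumerics, the "safe" and
# "extra" sets, and the reserved characters. "%" is handled separately
# so that it must introduce a two-digit hex escape.
_RFC1738_CHAR = r"[A-Za-z0-9$\-_.+!*'(),;/?:@=&]"
_ESCAPE = r"%[0-9A-Fa-f]{2}"

# Rules 1-4 taken literally: starts with "http:", has a space on both
# sides, and contains only permitted or %-encoded characters.
STRICT_URL = re.compile(
    r"(?<= )http:(?:" + _RFC1738_CHAR + "|" + _ESCAPE + r")*(?= )"
)


def strict_urls(text: str) -> list[str]:
    """Return every substring that satisfies the four rules exactly."""
    return STRICT_URL.findall(text)


sample = "see <http://google.com> or http://blahblahblah.co.uk and then"
print(strict_urls(sample))  # only the blahblahblah URL is found
```

Note that a trailing comma or period followed by a space would be swallowed into the match, since RFC 1738 allows both characters inside a URL.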

If you want URLs from text, don't worry about your 'definition' at first.
Just get things that are written to LOOK like urls.

Then, if you want, you can verify/validate each one separately, weeding out
the problems.
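A minimal Python sketch of that two-stage approach follows. The candidate pattern, the list of trailing punctuation to trim and the function names are illustrative assumptions, not something from the thread.

```python
import re
from urllib.parse import urlsplit

# Stage 1: grab anything that merely LOOKS like an http(s) URL. Stop at
# whitespace and at characters that commonly wrap URLs in prose (<>"').
CANDIDATE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

# Punctuation that usually ends the surrounding sentence, not the URL.
TRAILING = ".,;:!?)]"


def candidate_urls(text: str) -> list[str]:
    """Stage 1: loose extraction, trimming trailing sentence punctuation."""
    return [m.group(0).rstrip(TRAILING) for m in CANDIDATE.finditer(text)]


def looks_valid(url: str) -> bool:
    """Stage 2: a purely syntactic check on one candidate at a time."""
    parts = urlsplit(url)
    return parts.scheme.lower() in ("http", "https") and bool(parts.hostname)


text = "Try <http://google.com>, or http://blahblahblah.co.uk. Thanks"
urls = [u for u in candidate_urls(text) if looks_valid(u)]
print(urls)
```

Stripping a trailing ")" will damage URLs that legitimately end in one (some Wikipedia links do); that is the usual trade-off of a loose first pass, and the second stage is where stricter rules, such as the RFC 1738 character check, could be applied.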

For instance, if I were to refer to http://blahblahblah.co.uk and then tell
the reader to "put your own domain in there", your 'definition' would accept
that URL, by your rules, but would still ignore my earlier google example.

To me, a valid url is a url that actually points somewhere. Curl it.

http://thisIsAValidDomainStringButNotAValidDomain.com
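The "Curl it" check can be approximated with Python's standard library. The sketch below sends a HEAD request, roughly like `curl -I`, and treats any network failure or 4xx/5xx status as "doesn't point anywhere". The `is_reachable` helper and its five-second timeout are illustrative choices, not anything from the thread.

```python
import http.client
import urllib.request
from urllib.parse import urlsplit


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True only if the URL answers a HEAD request with a 2xx status."""
    try:
        parts = urlsplit(url)
        # Don't touch the network for strings that aren't even well-formed.
        if parts.scheme.lower() not in ("http", "https") or not parts.hostname:
            return False
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            # urlopen follows redirects and raises HTTPError for error
            # statuses, so reaching this line means a success response.
            return True
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses: DNS failure,
        # refused connection, timeout, or a 4xx/5xx status.
        return False
```

Fed the made-up domain above, it should return False once the DNS lookup fails, although some ISP resolvers answer for nonexistent names and would fool it. Servers that refuse HEAD (status 405) would also be reported as unreachable; retrying with a GET is an obvious refinement.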

--
Gary

 _______________________________________________
Do not post admin requests to the list. They will be ignored.
AppleScript-Users mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
Archives: http://lists.apple.com/archives/applescript-users

This email sent to email@hidden

