Re: extract URL from general text
- Subject: Re: extract URL from general text
- From: Rainer Standke <email@hidden>
- Date: Wed, 19 Mar 2008 11:15:28 -0700
Gary,
could you elaborate on the concept of 'curling it', please?
Rainer
On Mar 18, 2008, at 15:37 , Gary (Lists) wrote:
"Hudson Barton" wrote:
I need to extract valid "http" URLs from general (non-HTML) text. I
define "valid" as follows:
1. begins with "http:"
2. preceded by " "
3. followed by " "
4. containing only valid characters (or validly encoded characters) as per RFC 1738
Best you don't go re-defining things like this.
Your 'definition' wouldn't collect the url <http://google.com> in this
message, for instance. [It would fail your (1.) and (2.) and (3.)]
If you want URLs from text, don't worry about your 'definition' at first.
Just get things that are written to LOOK like urls.
Then, if you want, you can verify/validate each one separately, weeding out
the problems.
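
A rough sketch of that "collect first, validate later" idea, in Python rather than AppleScript. The pattern and the function name find_url_candidates are assumptions made for illustration, not anything from this thread. The pattern is deliberately loose: it grabs anything that looks like a URL and makes no attempt to enforce RFC 1738.

```python
import re

# Loose, assumed pattern: "http://" or "https://" followed by a run of
# characters that are not whitespace, angle brackets, or double quotes.
URL_LIKE = re.compile(r'https?://[^\s<>"]+')

def find_url_candidates(text):
    """Return substrings that look like URLs (hypothetical helper).

    Trailing sentence punctuation is trimmed, because prose often
    puts a comma or period directly after a URL.
    """
    return [match.rstrip('.,;:!?)') for match in URL_LIKE.findall(text)]


if __name__ == "__main__":
    sample = ("the url <http://google.com> in this message, "
              "or http://blahblahblah.co.uk, for instance")
    for candidate in find_url_candidates(sample):
        print(candidate)
```

Because the pattern stops at < and >, it picks up the bracketed google example that the space-delimited definition would miss. Each candidate can then be validated separately.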
For instance, if I were to refer to http://blahblahblah.co.uk, and then told
the reader to "put your own domain in there", your 'definition' would find
that domain, by your rules, but would ignore my previous google example.
To me, a valid url is a url that actually points somewhere. Curl it.
http://thisIsAValidDomainStringButNotAValidDomain.com
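
A minimal sketch of what "curl it" could look like in practice, assuming the curl command-line tool is installed and on the PATH. The helper names curl_command and url_resolves are made up for illustration. The idea is to ask the server for headers only and treat a zero exit status from curl as "this URL points somewhere".

```python
import subprocess

def curl_command(url, timeout=10):
    """Build a curl invocation that fetches only the headers of `url`.

    --fail makes curl exit non-zero on HTTP errors (4xx/5xx),
    --location follows redirects, and --max-time bounds the wait.
    """
    return ["curl", "--silent", "--head", "--fail", "--location",
            "--max-time", str(timeout), "--output", "/dev/null", url]

def url_resolves(url, run=subprocess.run):
    """Return True if curl exits with status 0 for `url`.

    `run` is injectable so the check can be exercised without a network;
    by default it is subprocess.run, which raises FileNotFoundError
    if curl is not installed.
    """
    result = run(curl_command(url))
    return result.returncode == 0


if __name__ == "__main__":
    for url in ("http://google.com",
                "http://thisIsAValidDomainStringButNotAValidDomain.com"):
        print(url, url_resolves(url))
```

The same command line can be handed to "do shell script" from AppleScript. Some servers refuse HEAD requests, so a failure here is a hint to look closer rather than proof that the URL is dead.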
--
Gary
_______________________________________________
Do not post admin requests to the list. They will be ignored.
AppleScript-Users mailing list (applescript-
email@hidden)
Help/Unsubscribe/Update your Subscription:
Archives: http://lists.apple.com/archives/applescript-users
This email sent to email@hidden