Re: [OT] URL terminators
- Subject: Re: [OT] URL terminators
- From: "Arthur J. Knapp" <email@hidden>
- Date: Tue, 09 Jul 2002 11:12:03 -0400
> Date: Mon, 08 Jul 2002 12:14:36 -0700
> Subject: [OT] URL terminators
> From: Paul Berkowitz <email@hidden>
>
> Is there somewhere I can go to read about what characters are valid for http
> URLs, and in particular which characters would indicate that an http URL has
> terminated, in HTML source code? E.g., " ", "<BR>", ">" are ones I know
> about, optionally preceded by (return). I'd guess that (tab) would be
> another, but maybe not. I think there's another way of indicating (space),
> when there are several spaces, but I forget what it is.
You may be on the wrong track. When you say "in HTML source code", it
is important to understand that a legitimate browser "link" is always
going to be delimited by quotes, i.e. href="http://...", src="http://...".
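To illustrate the point about quoted attributes, here is a minimal Python sketch (not AppleScript) that collects href and src values using the standard library's html.parser. The names LinkCollector and extract_links are made up for this example; the parser does the quote handling, so no terminator list is needed for this case.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the values of href and src attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def extract_links(html):
    """Return every href/src value found in the given HTML source."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.urls
```

For example, extract_links('<a href="http://example.com/a">x</a>') should return a one-item list holding that URL, whatever characters happen to follow the closing quote.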
It is far more difficult when you need to grab URLs that are not a part
of HTML syntax, as when they occur in an unlinked form in the HTML's body
text, or when they occur anywhere else, as in email, Word documents,
Stickies files, etc.
Here are the two primary documents that defined the URL:
<http://RFC.net/rfc1738.html>
<http://RFC.net/rfc1808.html>
If you are making use of regular expressions, there are lots of Perl
examples out there of how to extract a legitimate URL out of any kind
of text, but I can't seem to find any links at the moment.
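As a rough idea of what such a regular expression looks like, here is a small Python sketch rather than one of the Perl examples. It rests on a simplifying assumption of mine, not on the full RFC 1738 grammar: a URL runs until whitespace, a double quote, or an angle bracket. URL_PATTERN and find_urls are names invented for this example.

```python
import re

# Assumption: a URL continues until whitespace, a double quote,
# or an angle bracket. This is looser than the RFC 1738 grammar.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+', re.IGNORECASE)


def find_urls(text):
    """Return all substrings of text that look like http(s) URLs."""
    return URL_PATTERN.findall(text)
```

This deliberately ignores trailing sentence punctuation, so a URL typed at the end of a sentence will come back with the period attached.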
> I'm having a problem in a script, using a known URL as an AppleScript text
> item delimiter. It gives a "false positive" when the URL in question is the
> first part of a longer URL, so I want to be able to specify that the
> character(s) following it must be one of the complete set of possibilities
> indicating that the URL is not continuing but has terminated.
That will work 99% of the time. Be aware that sometimes people will
type a URL in a "word-processy" sort of way that can trip you up, as when
normal sentence punctuation and whitespace are added.
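The "must be followed by a terminator" check can be sketched like this in Python (again, not AppleScript). The terminator set (whitespace, double quote, angle brackets, end of text) and the list of tolerated trailing punctuation are my assumptions, and find_exact_url is a hypothetical name.

```python
import re


def find_exact_url(text, url):
    """Return the start offsets where url occurs as a complete URL,
    i.e. not as the first part of a longer one."""
    pattern = re.compile(
        re.escape(url)
        + r"[.,;:!?)]*"       # assumption: optional sentence punctuation
        + r'(?=[\s<>"]|$)'    # assumption: terminator or end of text
    )
    return [match.start() for match in pattern.finditer(text)]
```

With this, "http://a.com/x" is found when it is followed by a space, a closing quote, or a sentence-ending period, but not inside "http://a.com/xy" or "http://a.com/x.html".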
I'll try to post a URL extractor of my own sometime later today...
{ Arthur J. Knapp, of <http://www.STELLARViSIONs.com>
a r t h u r @ s t e l l a r v i s i o n s . c o m
}