Re: [OT] URL terminators
- Subject: Re: [OT] URL terminators
- From: Paul Berkowitz <email@hidden>
- Date: Wed, 10 Jul 2002 07:47:19 -0700
On 7/9/02 8:12 AM, "Arthur J. Knapp" <email@hidden> wrote:
>
> From: Paul Berkowitz <email@hidden>
>
> > Is there somewhere I can go to read about what characters are valid for http
> > URLs, and in particular which characters would indicate that an http URL has
> > terminated, in HTML source code? E.g., " ", "<BR>", ">" are ones I know
> > about, optionally preceded by (return). I'd guess that (tab) would be
> > another, but maybe not. I think there's another way of indicating (space)
> > when there are several spaces, but I forget what it is.
>
> You may be on the wrong track. When you say "in HTML source code", it
> is important to understand that a legitimate browser "link" is always
> going to be delimited by quotes, i.e. href="http://...", src="http://...".
>
> It is far more difficult when you need to grab URLs that are not a part
> of HTML syntax, as when they occur in an unlinked form in the HTML's body
> text, or when they occur anywhere else, as in email, Word documents,
> Stickies files, etc.
Yes, that's it. I'm trying to grab URLs in plain email and turn them into
real hyperlinks in HTML. I found another way of solving this particular
problem by forcing longer URLs to be converted first, then checking each
URL to see if it is contained in a previous one in the master list and, if
so, checking whether the continuing text begins with the remainder of the
longer URL. This is context-specific, and the master list will always be very
small in this context. So it works perfectly.
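A minimal Python sketch of the longest-first approach described above, assuming the master list already holds cleaned URLs (no trailing punctuation) that contain no characters needing HTML escaping. The function name `linkify` is hypothetical, and only the case where a shorter URL is a prefix of a longer one is checked.

```python
def linkify(text, master_list):
    """Wrap each URL from a small master list in an <a href> tag."""
    # Convert longer URLs first; empty strings are dropped to avoid looping.
    ordered = sorted({u for u in master_list if u}, key=len, reverse=True)
    converted = []  # URLs already turned into links
    for url in ordered:
        # Already-converted URLs that contain this shorter one.
        longer = [u for u in converted if url in u]
        pieces = []
        pos = 0
        while True:
            hit = text.find(url, pos)
            if hit == -1:
                pieces.append(text[pos:])
                break
            pieces.append(text[pos:hit])
            end = hit + len(url)
            # If the text right after this hit continues with the rest of a
            # longer URL, the hit is part of that URL (or of its link), so
            # leave it alone. Only the prefix case is checked here.
            if any(u.startswith(url) and text.startswith(u[len(url):], end)
                   for u in longer):
                pieces.append(url)
            else:
                pieces.append(f'<a href="{url}">{url}</a>')
            pos = end
        text = "".join(pieces)
        converted.append(url)
    return text
```

For example, `linkify("See http://a.com/x and http://a.com.", ["http://a.com", "http://a.com/x"])` links both URLs without nesting a second link inside the first one's `href` or link text.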
What would be nice is if I could encode stuff the user may enter as a
display name in case s/he enters accents and other non-basic characters and
numbers. If I were in OS 8/9, I know that both Tanaka's and Akua have HTML
encoders. But I'm in the present (aka OS X).
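For comparison, outside AppleScript the encoding itself is a couple of standard-library calls. This Python sketch is not the Tanaka's or Akua encoder and not an OS X osax; the function name `html_encode` is hypothetical. It escapes the HTML-special characters, then turns accented and other non-ASCII characters into numeric character references.

```python
import html

def html_encode(name):
    """Encode a display name for safe inclusion in HTML text."""
    # Escape &, <, >, " and ' as HTML entities/character references.
    escaped = html.escape(name, quote=True)
    # Replace every remaining non-ASCII character (accents etc.) with a
    # decimal numeric character reference such as &#233;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, `html_encode("José & Zoë")` returns `Jos&#233; &amp; Zo&#235;`.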
I _think_ someone here once mentioned an osax that converts between all
character formats, maybe HTML as well, also in OS X? Can anyone remind me of it?
>
> Here are the two primary documents that defined the URL:
>
> <http://RFC.net/rfc1738.html>
> <http://RFC.net/rfc1808.html>
Thank you.
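RFC 1738 notes that "<" and ">" are used as delimiters around URLs in free text and that the double quote delimits URLs in some systems. Building on that, a rough Python sketch of a plain-text extractor could stop a URL at whitespace (which covers space, tab and return), `<`, `>` or `"`, then trim trailing sentence punctuation. The pattern, the `TRAILING` set and the name `extract_urls` are assumptions, not a complete RFC 1738 grammar.

```python
import re

# Assumed stopping rule: a URL ends at whitespace (space, tab, return)
# or at <, > or ", which RFC 1738 describes as URL delimiters.
URL_RE = re.compile(r'https?://[^\s<>"]+')

# Sentence punctuation that often follows a URL in prose (an assumption).
TRAILING = ".,;:!?)'"

def extract_urls(text):
    """Return http/https URLs found in free text, in order."""
    return [m.group(0).rstrip(TRAILING) for m in URL_RE.finditer(text)]
```

With this rule, `http://example.com/a<BR>` yields `http://example.com/a`. Stripping trailing punctuation is a trade-off: it can also remove a legitimate final ")" from a URL.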
>
> If you are making use of regular expressions, there are lots of Perl
> examples out there of how to extract a legitimate URL out of any kind
> of text, but I don't seem able to find any links at the moment.
>
> I'll try to post a URL extractor of my own sometime later today...
I'll look forward to that, however long "today" may take to arrive. Thanks,
Arthur.
--
Paul Berkowitz
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.