Re: [OT] URL terminators
- Subject: Re: [OT] URL terminators
- From: Paul Berkowitz <email@hidden>
- Date: Mon, 08 Jul 2002 23:36:36 -0700
On 7/8/02 11:17 PM, "email@hidden" <email@hidden> wrote:
> Paul,
>
> What I'm about to pop out is not OT specific, but rather
> http specific: aside from the domain name resolution to a
> specific IP address, the rest of the URL is locally
> derived and specified. As a result, any character other
> than enter/return/carriage return/line feed can be used
> within a URL, and often is by various back-end databases.
> Spaces in particular are often generated by such software
> (and by well-intentioned but unknowledgeable beginning HTML
> programmers in saving their files); the standard conversion
> for a space within a URL string would be %20; tabs are %09.
>
> E.g. - http://www.somewhere.net/this and that/whatever.html
> would be fed back into the browser [as string] as:
> http://www.somewhere.net/this%20and%20that/whatever.html
>
> For more information, see:
> http://www.w3.org/Addressing/URL/url-spec.txt

Thanks, Marc. What I was really after (and thus more on-topic) was not if
such characters _could_ be encoded for URLs, but which _have to be_, so that
if they are _not_ so encoded, they must be text following the URL rather than
part of it. I'm trying to find all the different ways that a URL can be
recognized as having terminated - any character that will "break" a URL
would be one of those. Unfortunately, it appears as if sometimes pairs or
combinations of characters will be part of the URL whereas the same (first)
character of the pair on its own wouldn't be. Also, I would have an
extremely large set of combinations to have to look for and exclude.
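
To make the "pairs of characters" problem concrete, here is a rough Python sketch of a scanner that decides where a URL ends. It is only an illustration under stated assumptions: the allowed character set is taken from RFC 3986 (which is later than this thread), the names url_end, URL_CHARS and TRAILING are hypothetical, and the trimming of trailing punctuation is a heuristic rather than anything a specification requires. A "%" counts as part of the URL only when two hex digits follow it, which is exactly the case where a character on its own breaks the URL but the combination does not.

```python
import string

# Characters allowed unescaped in a URI per RFC 3986 (unreserved + reserved).
# "%" is handled separately because it is only valid as "%XX".
URL_CHARS = set(string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=")
HEX_DIGITS = set(string.hexdigits)
# Assumption: punctuation at the very end is sentence text, not URL text.
TRAILING = set(".,;:!?)'")


def url_end(text, start):
    """Return the index just past the URL that begins at text[start]."""
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "%":
            # A lone "%" terminates the URL; "%" plus two hex digits does not.
            pair = text[i + 1:i + 3]
            if len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
                i += 3
                continue
            break
        if ch not in URL_CHARS:
            break
        i += 1
    # Back up over trailing punctuation such as a sentence-ending comma.
    while i > start and text[i - 1] in TRAILING:
        i -= 1
    return i


sample = "See http://www.somewhere.net/this%20and%20that/whatever.html, OK?"
begin = sample.index("http")
found = sample[begin:url_end(sample, begin)]
```

A space, a double quote, or an angle bracket ends the scan immediately, so an unencoded space is treated as text following the URL.
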
In any case, I figured out a less contorted and exhaustive way of solving my
particular problem - URLs which are subsets of longer URLs I'm also looking
for. I can force the longer one to be searched for first, and then exclude
from consideration false "finds" for the shorter one when they are preceded
by the new text ("<A HREF=\"") that the script has added when it found the
longer URL. It seems to work well.
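
For anyone who wants to try the longest-first idea outside AppleScript, a minimal Python sketch follows. It is not the actual script: link_urls is a hypothetical name, and using the URL itself as the link text is an assumption. Because of that assumption the sketch also skips ahead to the closing </A> when a hit is preceded by the inserted <A HREF=" text, so the shorter URL is not re-wrapped inside the link text either.

```python
def link_urls(text, urls):
    """Wrap each known URL in an anchor tag, processing longer URLs first."""
    marker = '<A HREF="'
    closer = "</A>"
    for url in sorted(set(urls), key=len, reverse=True):
        out = []
        pos = 0
        while True:
            hit = text.find(url, pos)
            if hit == -1:
                out.append(text[pos:])
                break
            if text.endswith(marker, 0, hit):
                # False find: this is the start of a longer URL that was
                # already linked. Copy the existing anchor through unchanged.
                end = text.find(closer, hit)
                end = len(text) if end == -1 else end + len(closer)
                out.append(text[pos:end])
                pos = end
                continue
            # Genuine find: copy the text before it, then add the anchor.
            out.append(text[pos:hit])
            out.append(f'{marker}{url}">{url}{closer}')
            pos = hit + len(url)
        text = "".join(out)
    return text


message = "Go to http://a.com/x/y or http://a.com/x now"
linked = link_urls(message, ["http://a.com/x", "http://a.com/x/y"])
```

Sorting by length means the order in which the URLs are supplied does not matter.
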
I'll go read up on it in any case. I should know this stuff. Many thanks
for the link.
Best,
--
Paul Berkowitz