Re: [OT] URL terminators

  • Subject: Re: [OT] URL terminators
  • From: Paul Berkowitz <email@hidden>
  • Date: Wed, 10 Jul 2002 07:47:19 -0700

On 7/9/02 8:12 AM, "Arthur J. Knapp" <email@hidden> wrote:

>> From: Paul Berkowitz <email@hidden>
>
>> Is there somewhere I can go to read about what characters are valid for http
>> URLs, and in particular which characters would indicate that an http URL has
>> terminated, in HTML source code? E.g., " ", "<BR>", "&gt;" are ones I know
>> about, optionally preceded by (return). I'd guess that (tab) would be
>> another, but maybe not. I think there's another way of indicating (space),
>> when there are several spaces, but I forget what it is.
>
> You may be on the wrong track. When you say "in HTML source code", it
> is important to understand that a legitimate browser "link" is always
> going to be delimited by quotes, ie: href="http://...", src="http://...".
>
> It is far more difficult when you need to grab URLs that are not a part
> of HTML syntax, as when they occur in an unlinked form in the HTML's body
> text, or when they occur anywhere else, as in email, Word documents,
> Stickies files, etc.

Yes, that's it. I'm trying to grab URLs in plain email and turn them into
real hyperlinks in HTML. I found another way of solving this particular
problem by forcing longer URLs to be converted first, then checking each
URL to see if it is contained in a previous one in the master list and, if
so, checking if the continuing text begins with the remainder of the longer
URL. This is specific, and the "master list" will always be very small in
this context. So it works perfectly.
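
For illustration only, here is a rough Python sketch of the longest-first
idea. It is not the script described above: it swaps the containment check
for a single regular-expression pass in which longer URLs are tried before
shorter ones. The function name linkify and the sample URLs are
hypothetical.

```python
import html
import re


def linkify(text, urls):
    """Wrap each known URL found in text in an <a> tag, longest URLs first."""
    # Longest first: a short URL that is a prefix of a longer one can never
    # claim the longer URL's text, because the longer alternative is tried
    # first at every position.
    ordered = sorted(set(urls), key=len, reverse=True)
    if not ordered:
        return html.escape(text)
    pattern = re.compile("|".join(re.escape(u) for u in ordered))

    pieces = []
    last = 0
    for m in pattern.finditer(text):
        # Escape the plain text between links so it is safe inside HTML.
        pieces.append(html.escape(text[last:m.start()]))
        url = html.escape(m.group(0), quote=True)
        pieces.append('<a href="%s">%s</a>' % (url, url))
        last = m.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)
```

With a master list of "http://a.com/x" and "http://a.com/x/y", the longer
URL is linked as a whole and the shorter one is linked only where it stands
alone.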

What would be nice is if I could encode stuff the user may enter as a
display name in case s/he enters accents and other non-basic characters and
numbers. If I were in OS 8/9, I know that both Tanaka's and Akua have HTML
encoders. But I'm in the present (aka OS X).

I _think_ someone here once mentioned an osax that converts all character
formats, maybe HTML as well, also in OS X? Can anyone remind me of it?
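
As a point of comparison outside AppleScript, the kind of encoding wanted
here can be sketched in Python with the standard library alone. This is not
the osax being asked about; the function name encode_display_name is
hypothetical, and numeric character references are an assumed choice (named
entities such as &eacute; would work equally well in a browser).

```python
import html


def encode_display_name(name):
    """Return name as ASCII-only text that is safe to embed in HTML."""
    # First escape the characters that are special in HTML: & < > " '
    escaped = html.escape(name, quote=True)
    # Then replace every non-ASCII character (accents and so on) with a
    # numeric character reference such as &#233;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

The order matters: escaping "&" first and adding the numeric references
afterwards keeps the references themselves from being escaped again.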
>
> Here are the two primary documents that defined the URL:
>
> <http://RFC.net/rfc1738.html>
> <http://RFC.net/rfc1808.html>

Thank you.
>
>
> If you are making use of regular expressions, there are lots of perl
> examples out there of how to extract a legitimate URL out of any kind
> of text, but I don't seem able to find any links at the moment.
>
>
>
> I'll try to post a URL extractor of my own sometime later today...

I'll look forward to that, however long "today" may take to arrive. Thanks,
Arthur.
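
As a rough illustration of the regular-expression approach mentioned above
(not Arthur's extractor, which is not shown in this message), a minimal
Python sketch for plain text might look like this. The terminator set is an
assumption: a URL is taken to end at whitespace, an angle bracket, or a
double quote, and trailing sentence punctuation is trimmed afterwards. The
names URL_RE, TRAILING and find_urls are hypothetical.

```python
import re

# Assumption: in plain text, a URL runs until whitespace, "<", ">" or '"'.
URL_RE = re.compile(r'https?://[^\s<>"]+')

# Characters that are legal inside a URL but, at the very end of a match,
# are far more likely to be sentence punctuation than part of the URL.
TRAILING = ".,;:!?)'"


def find_urls(text):
    """Return the http/https URLs found in plain text, in order."""
    urls = []
    for m in URL_RE.finditer(text):
        # Trimming ")" also shortens the rare URL that really ends in one;
        # that trade-off is accepted here for simplicity.
        urls.append(m.group(0).rstrip(TRAILING))
    return urls
```

This handles URLs wrapped in angle brackets, as in the RFC links quoted
above, and URLs followed by a full stop. It does not try to handle HTML
source, where "&gt;" or "<BR>" would also have to be treated as terminators.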


--
Paul Berkowitz
