Extract URLs using TIDs (how to?) - vanilla AS answer
- Subject: Extract URLs using TIDs (how to?) - vanilla AS answer
- From: Charles Arthur <email@hidden>
- Date: Sun, 27 May 2001 11:04:35 +0100
Hi all..
Not long after I posted to the list asking this question I realised the
answer. Perhaps it was something to do with morphic resonance of showing
the problem to many minds. (In the hope that it works more generally, has
anyone seen my Handspring?)
My question was about extracting the /news/story URLs from strings like
thelist below..
set thelist to "<a href=\"/news/story/000000.html>Climber survives not
climbing for long period</a><font>all sort of other things here and lots
more HTML with some <tr><td>sorts of things thrown in and then another
search result which pops up as <a href=\"/news/story/000001.html>Somebody
climbs something, according to report in <a
href=\"http:www.anothersite.com\">Another site</a>."
This didn't quite work ...
> set astid to AppleScript's text item delimiters
> set AppleScript's text item delimiters to "/news/story/"
> --which is a unique delimiter for the stories I want
>
> set thelist to text items 2 thru -1 of thelist
> -- 2 thru -1 because the first item will obviously not include the delimiter
> (* which may, or may not, be a unique delimiter;
> I haven't tested in depth but the HTML results sometimes have
> embedded URLs
> *)
> set AppleScript's text item delimiters to astid
> set thelist to thelist as string
> set AppleScript's text item delimiters to ".html>"
> set thelist to text items of thelist
> repeat with anitem in thelist
> display dialog anitem as string
> end repeat
> set AppleScript's text item delimiters to astid
> --
but I had said that
> If I then cycle through the text items of thelist, the first item is the
> digits pointing to the story (which I can then pass to URL Access
> Scripting).
That makes it obvious - the answer is to cycle through thelist, setting a
temporary variable to each item in turn, then split that variable on the
ending term and keep its first piece.
So...
set thelist to "<a href=\"/news/story/000000.html>Climber survives not
climbing for long period</a><font>all sort of other things here and lots
more HTML with some <tr><td>sorts of things thrown in and then another
search result which pops up as <a href=\"/news/story/000001.html>Somebody
climbs something, according to report in <a
href=\"http:www.anothersite.com\">Another site</a>."
set astid to AppleScript's text item delimiters
set collectedURLlist to {}
set AppleScript's text item delimiters to "/news/story/"
--which is a unique delimiter for the stories I want
set thelist to text items 2 thru -1 of thelist
repeat with anitem from 1 to count items in thelist
    set thetempvar to item anitem of thelist
    set AppleScript's text item delimiters to astid
    set thetempvar to thetempvar as string
    set AppleScript's text item delimiters to ".html>"
    set thetempvar to text items of thetempvar
    set collectedURLlist to collectedURLlist & item 1 of thetempvar -- ta daa!
end repeat
(* following just to show it works *)
repeat with anitem in collectedURLlist
    display dialog anitem as string
end repeat
set AppleScript's text item delimiters to astid
--next line if in a separate handler
return collectedURLlist
--end of sub.
This can be generalised to extract any text that sits between a known start
string and end string in a longer string. Very neat: basically, the first
string you want is always (text item 1 of (text item 2 of thestring with
startstring as TID) with endstring as TID), and later matches come from
text items 3, 4 and so on in the same way.
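As a rough sketch of that generalisation, the loop above could be wrapped in
a handler along these lines. The handler name extractBetween and its
parameter names are my own invention for illustration, not anything from the
original script, and it assumes the start and end strings are plain text
that TIDs can split on.

```applescript
-- Hypothetical handler: name and parameters are illustrative only.
-- Returns a list of every piece of theText that sits between
-- startString and the next endString after it.
on extractBetween(theText, startString, endString)
	set savedTIDs to AppleScript's text item delimiters
	set foundList to {}
	try
		set AppleScript's text item delimiters to startString
		-- Every chunk after the first begins just past a startString.
		set theChunks to text items of theText
		set AppleScript's text item delimiters to endString
		repeat with i from 2 to (count theChunks)
			set pieces to text items of (item i of theChunks)
			-- Keep the chunk only if endString actually occurs in it.
			if (count pieces) > 1 then set end of foundList to item 1 of pieces
		end repeat
	on error errMsg number errNum
		-- Restore the delimiters before passing the error on.
		set AppleScript's text item delimiters to savedTIDs
		error errMsg number errNum
	end try
	set AppleScript's text item delimiters to savedTIDs
	return foundList
end extractBetween

set sample to "<a href=\"/news/story/000000.html>One</a> <a href=\"/news/story/000001.html>Two</a>"
extractBetween(sample, "/news/story/", ".html>")
--> {"000000", "000001"}
```

Restoring the saved delimiters in both the normal and the error path keeps a
failure inside the handler from leaving odd TIDs set for the rest of the
script.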
Thanks to Ricardo Montiel, who suggested using RegEx (which I might
investigate too). I didn't do that because I wanted to investigate using
TIDs, since they're vanilla AS and so portable (surely...) to OS X in the
fullness of time. Though I suspect that regex stuff may be in OS X.
Thanks also to Bill Briggs, whose article on MacCentral about using TIDs to
search and replace ages ago gave me all the clues.
Charles
http://www.ukclimbing.com : 1,000+ British crags, 350+ British climbing walls
- searchable by distance, rock type, etc, with 5-day weather forecasts for
every one - plus maps, articles, news, and the New Routes database. There's
even a cool shop attached...