Re: Suggestions for modifying read_parse
- Subject: Re: Suggestions for modifying read_parse
- From: has <email@hidden>
- Date: Tue, 6 Jul 2004 12:20:02 +0100
Gregory J. Smith wrote:
Apple has an example AppleScript for parsing an HTML file.
<http://www.apple.com/applescript/guidebook/sbrt/pgs/sbrt.04.htm> (see
read_parse(this_file, opening_tag, closing_tag, contents_only)). I
would like to modify it to parse a string rather than a file but
otherwise do the same thing. Could someone on the list provide some
help or suggest an alternative?
Better HTML parser here:
http://applemods.sourceforge.net/mods/Internet/HTMLParser.php
It's modelled after Python's HTMLParser module and provides a basic
SAX-style (event-driven) interface, so it's very flexible: just supply
it with an HTML string and an object that handles the parsing events
you're interested in. Here's a simple example that uses it to extract
the links from <a> tags:
------- BEGIN SCRIPT -------
property _Loader : run application "LoaderServer"
----------------------------------------------------------------------
-- DEPENDENCIES
property _HTMLParser : missing value
on __load__(loader)
	set _HTMLParser to loader's loadLib("HTMLParser")
end __load__
----------------------------------------------------------------------
__load__(_Loader's makeLoader())
-------
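on makeReceiver()
	-- Returns a new receiver object that collects the href value of every <a> tag.
	script
		-- Inherit from the library's base event receiver, so only the
		-- events we care about need handlers here.
		property parent : _HTMLParser's makeEventReceiver()
		property _result : {}
		
		-- Called by the parser for each start tag; attributesList holds
		-- {name, value} pairs for the tag's attributes.
		on handleStartTag(tagName, attributesList)
			if tagName = "a" then
				repeat with attRef in attributesList
					if attRef's item 1 = "href" then
						set _result's end to attRef's item 2
					end if
				end repeat
			end if
		end handleStartTag
		
		-- Returns the list of collected href values.
		on getResult()
			return _result
		end getResult
	end script
end makeReceiver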
set html to "
<html>
<head><title>Hello World</title></head>
<body>
<ul id='navbar'>
<li><a href='/index.html'>Home</a></li>
<li><a href='/products.html'>Products</a></li>
<li><a href='/about.html'>About</a></li>
<li><a href='/contact.html'>Contact</a></li>
</ul>
</body>
</html>"
set rec to makeReceiver()
_HTMLParser's parseHTML(html, rec)
return rec's getResult()
--> {"/index.html", "/products.html", "/about.html", "/contact.html"}
------- END SCRIPT -------
You'll also need to install Loader
<http://applemods.sourceforge.net/getstarted.html> and the EveryItem
library <http://applemods.sourceforge.net/mods/Data/EveryItem.php> if
you don't already have them.
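If you'd rather not install anything and only need what the original
question asked for (read_parse-style extraction applied to a string
instead of a file), AppleScript's text item delimiters can do a rough
job. The following is a minimal sketch, not Apple's code: the handler
name parse_string is hypothetical, and it assumes the tags aren't
nested and that you want a list of every span between opening_tag and
closing_tag, with the tags themselves included when contents_only is
false.

------- BEGIN SCRIPT -------
-- Hypothetical string-based counterpart to read_parse (not Apple's code).
-- Returns a list of every piece of this_text found between opening_tag and
-- closing_tag. Assumes tags are not nested.
on parse_string(this_text, opening_tag, closing_tag, contents_only)
	set found_items to {}
	set old_delims to AppleScript's text item delimiters
	try
		-- Split on the opening tag: every chunk after the first begins
		-- immediately after an occurrence of opening_tag.
		set AppleScript's text item delimiters to opening_tag
		set chunks to text items of this_text
		repeat with i from 2 to count of chunks
			set this_chunk to item i of chunks
			-- Keep the chunk only if a closing tag follows; the text before
			-- the first closing tag is the enclosed content.
			set AppleScript's text item delimiters to closing_tag
			if (count of text items of this_chunk) > 1 then
				set inner_text to text item 1 of this_chunk
				if contents_only then
					set end of found_items to inner_text
				else
					set end of found_items to opening_tag & inner_text & closing_tag
				end if
			end if
		end repeat
		-- Always restore the caller's delimiters.
		set AppleScript's text item delimiters to old_delims
	on error error_message number error_number
		set AppleScript's text item delimiters to old_delims
		error error_message number error_number
	end try
	return found_items
end parse_string

parse_string("<li>Home</li><li>About</li>", "<li>", "</li>", true)
--> {"Home", "About"}
------- END SCRIPT -------

Bear in mind this is plain string matching, so unlike HTMLParser it
won't cope with nested tags, and an opening_tag of "<a>" won't match
"<a href='...'>".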
HTH
has
--
http://freespace.virgin.net/hamish.sanderson/
_______________________________________________
applescript-users mailing list | email@hidden