Re: Suggestions for modifying read_parse

  • Subject: Re: Suggestions for modifying read_parse
  • From: has <email@hidden>
  • Date: Tue, 6 Jul 2004 12:20:02 +0100

Gregory J. Smith wrote:

> Apple has an example applescript for parsing a HTML file.
> <http://www.apple.com/applescript/guidebook/sbrt/pgs/sbrt.04.htm> (see
> read_parse(this_file, opening_tag, closing_tag, contents_only)). I
> would like to modify it to parse a string rather than a file but
> otherwise do the same thing. Could someone on the list provide some
> help or suggest an alternative?


Better HTML parser here:

http://applemods.sourceforge.net/mods/Internet/HTMLParser.php


It's modelled after Python's HTMLParser module and provides a basic SAX-style interface, so it's very flexible: just supply an HTML string and an object that handles the parsing events you're interested in. Here's a simple example that uses it to extract links from <a> tags:


------- BEGIN SCRIPT -------

property _Loader : run application "LoaderServer"

----------------------------------------------------------------------
-- DEPENDENCIES

property _HTMLParser : missing value

on __load__(loader)
    set _HTMLParser to loader's loadLib("HTMLParser")
end __load__

----------------------------------------------------------------------

__load__(_Loader's makeLoader())

-------

on makeReceiver()
    script
        property parent : _HTMLParser's makeEventReceiver()
        property _result : {}

        -- Handle each start tag; each attribute is a {name, value} pair
        on handleStartTag(tagName, attributesList)
            if tagName = "a" then
                repeat with attRef in attributesList
                    if attRef's item 1 = "href" then
                        -- Collect the link target
                        set _result's end to attRef's item 2
                    end if
                end repeat
            end if
        end handleStartTag

        -- Return the hrefs collected so far
        on getResult()
            return _result
        end getResult
    end script
end makeReceiver

set html to "
<html>
<head><title>Hello World</title></head>
<body>
<ul id='navbar'>
<li><a href='/index.html'>Home</a></li>
<li><a href='/products.html'>Products</a></li>
<li><a href='/about.html'>About</a></li>
<li><a href='/contact.html'>Contact</a></li>
</ul>
</body>
</html>"

set rec to makeReceiver()
_HTMLParser's parseHTML(html, rec)
return rec's getResult()
--> {"/index.html", "/products.html", "/about.html", "/contact.html"}

------- END SCRIPT -------
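
For comparison, here's a rough sketch of the same link extraction in Python, using the standard-library parser that the AppleScript library is modelled after. It assumes Python 3, where the module lives at html.parser; the names LinkCollector and extract_links are made up for this illustration and aren't part of either library. The handle_starttag method plays the same role as handleStartTag in the receiver above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Event receiver that collects href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def extract_links(html):
    """Parse an HTML string (not a file) and return the collected hrefs."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


html = """
<html>
<head><title>Hello World</title></head>
<body>
<ul id='navbar'>
<li><a href='/index.html'>Home</a></li>
<li><a href='/products.html'>Products</a></li>
<li><a href='/about.html'>About</a></li>
<li><a href='/contact.html'>Contact</a></li>
</ul>
</body>
</html>"""

print(extract_links(html))
# --> ['/index.html', '/products.html', '/about.html', '/contact.html']
```

As in the AppleScript version, the parser takes a string directly, so there's no file handling to strip out.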


You'll also need to install Loader <http://applemods.sourceforge.net/getstarted.html> and the EveryItem library <http://applemods.sourceforge.net/mods/Data/EveryItem.php> if you don't already have them.

HTH

has
--
http://freespace.virgin.net/hamish.sanderson/
_______________________________________________
applescript-users mailing list | email@hidden