Re: HTML parsing
- Subject: Re: HTML parsing
- From: Reinhold Penner <email@hidden>
- Date: Mon, 2 Sep 2002 10:57:17 -1000
Check out Perl and use regular expressions (that's how I did it in
ChimeraKnight to extract the changed-items text from a fairly
complicated table). Perl and regexes handle that very nicely.
Alternatively, or in addition, there is a nice app called TestXSLT,
which handles XML files:
> Readme for TestXSLT
>
> Written by Marc Liyanage <email@hidden>
>
> Description
>
> TestXSLT is a small tool for experimenting with XSLT in a convenient
> way on Mac OS X. It uses the Sablotron XSL processor from
> http://www.gingerall.com/charlie/ga/xml/p_sab.xml and the Gnome
> libxslt processor from http://xmlsoft.org/XSLT/.
>
> The program takes an .xml and an .xsl file and either displays the
> result of the transformation in an output window or writes it to a
> file.
>
> You can either edit the XML and XSLT code directly in the program or
> you can load them from files on disk. Try drag and drop for the latter.
>
> There are some examples included to get you started. The first is this
> Readme document that you are reading now. It was written and is
> maintained in XML. I have included two stylesheets which convert the
> XML input file into an RTF document and into an HTML page. Study the
> input, the stylesheets and the output files carefully to learn some
> basics. The stylesheets are commented.
>
> There are also two "bird" files which can be used as an example. Just
> open the two, click process and see what happens...
>
> Comments, feedback and feature suggestions are welcome to the address
> above.
>
> The latest version of this software is available at
> http://www.entropy.ch/software/macosx/
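
To make the regular-expression approach mentioned above a bit more concrete, here is a minimal sketch of the two tasks from the original question. It is written in Python rather than Perl, purely for illustration, and it is not the code used in ChimeraKnight. The function names text_between and tag_attributes are hypothetical, and the patterns assume reasonably well-formed markup: an explicit closing tag, no nested tags of the same name, and no ">" inside attribute values.

```python
import re

def text_between(html, tag, start=0):
    """Return the text between the first <tag ...> found at or after
    position `start` and the next </tag>, or None if there is no match.
    (Hypothetical helper; assumes the closing tag is present.)"""
    pattern = re.compile(
        r"<{0}\b[^>]*>(.*?)</{0}\s*>".format(re.escape(tag)),
        re.IGNORECASE | re.DOTALL,  # tags may be upper case and span lines
    )
    match = pattern.search(html, start)
    return match.group(1) if match else None

# One attribute: name = "double quoted", 'single quoted', or a bare value.
ATTR_RE = re.compile(r"""([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""")

def tag_attributes(tag_text):
    """Return a list of (name, value) pairs for a single tag such as
    <meta name="..." content="...">. (Hypothetical helper.)"""
    # Skip the tag name itself so it is never mistaken for an attribute.
    head = re.match(r"<\s*[\w:-]+", tag_text)
    rest = tag_text[head.end():] if head else tag_text
    pairs = []
    for name, dq, sq, bare in ATTR_RE.findall(rest):
        pairs.append((name.lower(), dq or sq or bare))
    return pairs

if __name__ == "__main__":
    page = '<P>first paragraph</P>\n<P class="x">second\nparagraph</P>'
    print(text_between(page, "p"))
    print(tag_attributes('<meta name="FIELDNAME" content="Field data inserted here">'))
```

Real-world HTML often omits the closing </P> or nests tags, and a pattern like this will silently return the wrong span in those cases, so for the "whole document into a tree" bonus a real parser is probably the safer route.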
HTH, -Reinhold
On Monday, Sep 2, 2002, at 10:24 Pacific/Honolulu, Roger Howard wrote:
> I've begun to build fairly function-specific handlers for extracting
> values from discrete HTML tag attributes and I was wondering if anyone
> has or knows of anything a bit more generic and tested. I have two main
> tasks:
>
> 1) Extract data in between a given start tag and an intelligently
> identified end tag. For instance, feed it the position of a <P> and it
> will return all the data between the <P> and the next </P>.
>
> 2) Extract values from specified tags. For instance, feed it a tag
> such as
> <meta name="FIELDNAME" content="Field data inserted here">
> and return the labels and values in the name and content fields as a
> hash array like:
> (("name","FIELDNAME"),("content","Field data inserted here"))
>
> A bonus would be the top-down parsing of an entire HTML document into
> a tree of tags, attributes, and values.
>
> Given AppleScript's ignorance of HTML/XML structures, is there a
> better, more tested way of doing this? I'd hate to get into constant
> revisions of my handlers to suit additional data sets, so I'm hoping
> maybe there's instead either a tried-and-true Scripting Addition or a
> better way such as a shell tool I can trigger from AppleScript.
>
> Any suggestions?
>
> Best,
>
> Roger Howard
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.
References: HTML parsing (From: Roger Howard <email@hidden>)