Re: HTML parsing

Subject: Re: HTML parsing
From: Reinhold Penner <email@hidden>
Date: Mon, 2 Sep 2002 10:57:17 -1000

Check out Perl and use regular expressions (that's how I've done it in
ChimeraKnight to extract the changed-items text from a fairly
complicated table). Perl and regex handle that very nicely. Alternatively,
or in addition, there is a nice app called TestXSLT, which handles XML
files:

> Readme for TestXSLT
>
>
> Written by Marc Liyanage <email@hidden>
>
> Description
> TestXSLT is a small tool for experimenting with XSLT in a convenient
> way on Mac OS X. It uses the Sablotron XSL processor from
> http://www.gingerall.com/charlie/ga/xml/p_sab.xml and the Gnome
> libxslt processor from http://xmlsoft.org/XSLT/.
>
> The program takes an .xml and an .xsl file and either displays the
> result of the transformation in an output window or writes it to a
> file.
>
> You can either edit the XML and XSLT code directly in the program or
> you can load them from files on disk. Try drag and drop for the latter.
>
> There are some examples included to get you started. The first is this
> Readme document that you are reading now. It was written and is
> maintained in XML. I have included two stylesheets which convert the
> XML input file into an RTF document and into an HTML page. Study the
> input, the stylesheets and the output files carefully to learn some
> basics. The stylesheets are commented.
>
> There are also two "bird" files which can be used as an example. Just
> open the two, click process and see what happens...
>
> Comments, feedback and feature suggestions are welcome to the address
> above.
>
> The latest version of this software is available at
> http://www.entropy.ch/software/macosx/

HTH, -Reinhold

On Monday, Sep 2, 2002, at 10:24 Pacific/Honolulu, Roger Howard wrote:

> I've begun to build fairly function-specific handlers for extracting
> values from discrete HTML tag attributes, and I was wondering if anyone
> has or knows of anything a bit more generic and tested. I have two main
> tasks:
>
> 1) Extract data in between a given start tag and an intelligently
> identified end tag. For instance, feed it the position of a <P> and it
> will return all the data between the <P> and the next </P>.
> 2) Extract values from specified tags. For instance, feed it a tag such
> as <meta name="FIELDNAME" content="Field data inserted here"> and
> return the labels and values in the name and content fields as a hash
> array like: (("name","FIELDNAME"),("content","Field data inserted here"))
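A minimal Python sketch of the regex approach, applied to the two tasks
above. Python stands in for Perl here, and the helper names text_between
and tag_attributes are hypothetical, not from this thread. Regexes like
these assume reasonably well-formed markup: they do not cope with nested
tags of the same name, comments, or a ">" inside an attribute value.

```python
import re

# Matches name="value", name='value' or name=value inside a single tag.
ATTR_RE = re.compile(r"""
    ([A-Za-z_:][-\w:.]*)      # attribute name
    \s*=\s*
    (?: "([^"]*)"             # double-quoted value
      | '([^']*)'             # single-quoted value
      | ([^\s>]+)             # unquoted value
    )
""", re.VERBOSE)


def text_between(html, tag, start=0):
    """Task 1 (hypothetical helper): return the text between the first
    <tag ...> found at or after `start` and the next </tag>, else None."""
    open_re = re.compile(r"<%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    close_re = re.compile(r"</%s\s*>" % re.escape(tag), re.IGNORECASE)
    opened = open_re.search(html, start)
    if opened is None:
        return None
    closed = close_re.search(html, opened.end())
    if closed is None:
        return None
    return html[opened.end():closed.start()]


def tag_attributes(tag_text):
    """Task 2 (hypothetical helper): return (name, value) pairs for the
    attributes of one tag. Attributes without a value are skipped."""
    pairs = []
    for match in ATTR_RE.finditer(tag_text):
        # Exactly one of the three value groups is set for each match.
        value = next(g for g in match.group(2, 3, 4) if g is not None)
        pairs.append((match.group(1), value))
    return pairs


if __name__ == "__main__":
    print(text_between("<P>first</P><P>second</P>", "p"))
    print(tag_attributes(
        '<meta name="FIELDNAME" content="Field data inserted here">'))
```

Passing a character offset as `start` plays the role of "feed it the
position of a <P>" in task 1.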
>
> A bonus would be the top-down parsing of an entire HTML document into a
> tree of tags, attributes, and values.
>
> Given AppleScript's ignorance of HTML/XML structures, is there a better,
> more tested way of doing this? I'd hate to get into constant revisions of
> my handlers to suit additional data sets, so I'm hoping maybe there's
> instead either a tried-and-true Scripting Addition or a better way, such
> as a shell tool I can trigger from AppleScript.
>
> Any suggestions?
>
> Best,
>
> Roger Howard
> _______________________________________________
> applescript-users mailing list | email@hidden
> Help/Unsubscribe/Archives:
> http://www.lists.apple.com/mailman/listinfo/applescript-users
> Do not post admin requests to the list. They will be ignored.
