Re: A framework for parsing HTML?
- Subject: Re: A framework for parsing HTML?
- From: Michael Rothwell <email@hidden>
- Date: Tue, 16 Nov 2004 18:33:06 -0500
LibXML (http://xmlsoft.org/) will parse HTML into a regular DOM tree,
as well as write a DOM tree back out to HTML.
LibXML ships with OS X, or you can bundle it with your application as a
framework (http://www.zveno.com/open_source/libxml2xslt.html).
You might also want to use libXSLT (not in OS X, but available as a
framework) to manipulate your HTML DOM after parsing it. That way you
can write an XSLT style sheet to do the work for you.
Alternatively, you can walk the DOM yourself from C code (or Python, or
whatever) to manipulate the data before writing it back out.
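As a rough sketch of that parse / walk / write-back cycle, here is a
small Python example. To keep it runnable with nothing but the standard
library it uses html.parser as a stand-in for LibXML, so the Node and
TreeBuilder classes, the helper names, and the tiny Symbol-font table
are all made up for illustration. With LibXML itself the same steps
would go through htmlReadMemory(), the xmlNode children/next pointers,
and htmlDocDumpMemory().

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag (partial list, enough for a sketch).
VOID = {"br", "hr", "img", "input", "link", "meta"}

# Hypothetical legacy-font table: Latin letters typed in the Symbol font
# mapped to the Greek letters they display as (alpha, beta, gamma only).
LEGACY_FONTS = {"symbol": str.maketrans("abg", "\u03b1\u03b2\u03b3")}


class Node:
    """Minimal tree node; tag is None for text nodes."""

    def __init__(self, tag, attrs=None, text=""):
        self.tag = tag
        self.attrs = attrs or []
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating unclosed tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(Node(None, text=data))


def walk(node):
    """Yield node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def convert_legacy_fonts(root):
    """Translate text inside <font face=...> runs that use a known legacy font."""
    for node in walk(root):
        if node.tag != "font":
            continue
        face = dict(node.attrs).get("face") or ""
        table = LEGACY_FONTS.get(face.lower())
        if table is None:
            continue
        for sub in walk(node):
            if sub.tag is None:
                sub.text = sub.text.translate(table)
        # The text is now real Unicode, so the face attribute is dropped.
        node.attrs = [(k, v) for k, v in node.attrs if k != "face"]


def serialize(node):
    """Write the tree back out as HTML."""
    if node.tag is None:
        return escape(node.text, quote=False)
    inner = "".join(serialize(child) for child in node.children)
    if node.tag == "#root":
        return inner
    attrs = "".join(
        f" {k}" if v is None else f' {k}="{escape(v)}"' for k, v in node.attrs
    )
    if node.tag in VOID:
        return f"<{node.tag}{attrs}>"
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


def rewrite(html):
    """Parse, convert legacy font runs, and serialize again."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    convert_legacy_fonts(builder.root)
    return serialize(builder.root)
```

Calling rewrite('<p>x <font face="Symbol">abg</font> y</p>') replaces
the Symbol-font run with the matching Greek letters and leaves the rest
of the markup alone. This only looks at face attributes on font tags;
fonts assigned through CSS would need separate handling.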
Michael Rothwell
email@hidden
On Nov 15, 2004, at 7:29 PM, Mark Patterson wrote:
Hi,
I am writing an HTML parser as part of a program for making regular
changes to a website. A lot of the work is to parse the HTML. The initial
update is to convert ASCII fonts for different alphabets to Unicode. To
get this right it would have to understand CSS as well, including all
the different ways of importing CSS files. That sounds like a lot of
work, and it has obviously been done in WebCore. However, I haven't
noticed any API that looks like the sort of thing I want: a mutable
array of an object for the tag and following text. Does this exist? If
so, where?
Regards
Mark
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Cocoa-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden