Re: A framework for parsing HTML?

  • Subject: Re: A framework for parsing HTML?
  • From: Michael Rothwell <email@hidden>
  • Date: Tue, 16 Nov 2004 18:33:06 -0500

LibXML (http://xmlsoft.org/) will parse HTML into a regular DOM tree, as well as write a DOM tree back out to HTML.

LibXML is included in OS X, or it can be bundled with your application as a framework (http://www.zveno.com/open_source/libxml2xslt.html).

You might also want to use libXSLT (not in OS X, but available as a framework) to manipulate your HTML DOM after parsing it: you write an XSLT style sheet and let it do the work for you.

Alternatively, you can walk the DOM yourself from C code (or Python, or whatever) to manipulate the data before writing it back out.
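
To give a rough idea of the "walk the DOM yourself" approach, here is a minimal Python sketch. It makes several assumptions. It uses the standard-library xml.dom.minidom as a stand-in for a libxml2 DOM, so it only accepts well-formed XHTML (real-world HTML would need libxml2's HTML parser). It only looks at <font face="..."> elements and ignores CSS entirely. The font name "FakeGreek", its character table, and the function names are all made up for illustration.

```python
from xml.dom import minidom

# Hypothetical legacy-font table: maps the ASCII characters that a
# made-up font called "FakeGreek" displays as Greek letters to the
# corresponding Unicode characters. A real font needs its own table.
LEGACY_FONTS = {
    "FakeGreek": {"a": "\u03b1", "b": "\u03b2", "g": "\u03b3"},
}


def convert_text_nodes(node, table):
    """Recursively rewrite the text nodes under `node` using `table`."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            child.data = "".join(table.get(ch, ch) for ch in child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            # A nested <font face="..."> picks its own font, so leave
            # its text for the outer loop in convert_legacy_fonts().
            if child.tagName == "font" and child.hasAttribute("face"):
                continue
            convert_text_nodes(child, table)


def convert_legacy_fonts(xhtml):
    """Parse well-formed XHTML, convert known legacy fonts, serialize."""
    doc = minidom.parseString(xhtml)
    for font in doc.getElementsByTagName("font"):
        table = LEGACY_FONTS.get(font.getAttribute("face"))
        if table is not None:
            convert_text_nodes(font, table)
            # The text is now Unicode, so the legacy face no longer applies.
            font.removeAttribute("face")
    return doc.documentElement.toxml()


if __name__ == "__main__":
    print(convert_legacy_fonts('<p>x <font face="FakeGreek">abg</font> y</p>'))
```

The same shape (parse, walk, edit text nodes, serialize) should carry over to a libxml2 tree in C, where you would follow each node's children and next pointers instead of childNodes.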

Michael Rothwell
email@hidden

On Nov 15, 2004, at 7:29 PM, Mark Patterson wrote:

Hi,

I am writing an HTML parser as part of a program for making regular changes to a website. A lot of the work is parsing the HTML. The initial update is to convert ASCII fonts for different alphabets to Unicode. To get this right, it would have to understand CSS as well, including all the different ways of importing CSS files. That sounds like a lot of work, and it has obviously been done in WebCore. However, I haven't noticed any API that looks like the sort of thing I want: a mutable array of objects, one for each tag and its following text. Does this exist? If so, where?

Regards

Mark

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Cocoa-dev mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden
