Re: Parsing HTML
- Subject: Re: Parsing HTML
- From: Allen Watson <email@hidden>
- Date: Fri, 02 Nov 2001 08:58:41 -0800
On Fri, 2 Nov 2001 12:06:15 +0000 Steve Thompson <email@hidden> wrote:
> Under 9.1, I had an OSAX that would parse the HTML out of reams of HTML
> for me, just returning the bits I requested. This OSAX doesn't work
> under OS X and I just wanted to know if anyone had come across one that
> does.
For some reason, the migration of osaxen to OS X has been extremely slow.
Check out www.osaxen.com and you'll see there are only a handful so far.
The shareware app TextSoap is available in beta for OS X, and it comes
with an osax that does include a module to strip HTML code. You can find
it on Versiontracker.com.
The specific tool in the package has this in the documentation:
> HTML Text
>
> This cleaner will clean up HTML text. It strips out anything between
> < and >. This can be useful if you have the HTML source, but just want
> the contents (without starting up your browser). It also handles Ampersand
> escape codes ( or Œ). It will remove tab characters and remove multiple
> carriage returns.
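
If no osax turns up, the four steps that documentation describes are easy enough to approximate in a script. Here is a rough sketch in Python. To be clear, this is not TextSoap's code or its API: the function name clean_html_text is made up for illustration, and the tag stripping is a naive pattern match, not a real HTML parser, so it can be fooled by a ">" inside an attribute value or by comments and scripts.

```python
import html
import re


def clean_html_text(source: str) -> str:
    """Approximate the 'HTML Text' cleaner described above (hypothetical helper)."""
    # Drop everything between "<" and ">" (naive tag strip, not a real parser).
    text = re.sub(r"<[^>]*>", "", source)
    # Decode ampersand escape codes such as &amp; into plain characters.
    text = html.unescape(text)
    # Remove tab characters.
    text = text.replace("\t", "")
    # Treat CR, LF and CRLF alike, then collapse repeated line breaks into one.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()


if __name__ == "__main__":
    sample = "<p>Fish &amp; chips</p>\r\r\r<b>\tDone</b>"
    print(clean_html_text(sample))
```

The tags are stripped before the escape codes are decoded, so an escaped &lt;b&gt; in the source survives as the literal text <b> instead of being thrown away as a tag.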