Re: Removing html Tags From Text
- Subject: Re: Removing html Tags From Text
- From: Jeff Ganyard <email@hidden>
- Date: Sun, 28 Oct 2001 18:49:15 -0800
At 2:46 AM -0600 10/28/01, Ehsan Saffari wrote:
> Hi
> When trying to remove HTML tags from text (archived HTML email messages),
> there may be valid "<" and ">" characters in the text that are not part
> of any tag, so removing tags by removing everything between those two
> characters will mangle the text.
> Has anyone come up with better logic for removing HTML from text?
> cheers
> ehsan
Unfortunately, HTML email is rarely well formed; web pages are
much easier to deal with. You could put together a list of
opening tags, just the first part (e.g. "<img"), and for each one
look for the immediately following ">". Then look for "</" and the
">" that follows it. That should be mostly effective.
Tedious to create, but HTML is tedious in sooo many ways. <sigh>
jeff
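
For illustration, here is a rough Python sketch of the tag-list idea described above. It is not code from the thread: the KNOWN_TAGS list is a short, hypothetical sample, and the function name strip_known_tags is made up for this example. It removes a "<...>" run only when it starts with a listed tag name (or "</" plus a listed name), so a stray "<" or ">" in the message text is left alone.

```python
import re

# Hypothetical, deliberately short list of tag names; extend it for real mail.
KNOWN_TAGS = ["html", "head", "title", "body", "p", "br", "div", "span",
              "a", "b", "i", "u", "font", "img", "table", "tr", "td"]

# Longest names first so "body" is tried before "b". The (?=[\s/>]) lookahead
# keeps "<b" from matching the start of an unknown name such as "<bar".
_names = "|".join(sorted(map(re.escape, KNOWN_TAGS), key=len, reverse=True))
TAG_RE = re.compile(r"</?(?:" + _names + r")(?=[\s/>])[^>]*>", re.IGNORECASE)


def strip_known_tags(text):
    """Remove opening and closing tags whose name is in KNOWN_TAGS."""
    return TAG_RE.sub("", text)


sample = '<p>If a < b and b > c, see <a href="x.html">this</a>.<br></p>'
print(strip_known_tags(sample))
```

As the reply says, this is only "mostly effective": plain text that happens to look like a listed tag (for example "<b and c>") would still be removed, and tags missing from the list are left in place.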