Re: Removing html tags
- Subject: Re: Removing html tags
- From: Emmanuel <email@hidden>
- Date: Mon, 28 Feb 2005 12:02:21 +0100
At 11:31 AM +0100 2/28/05, Paff wrote:
> Hi all!
> I want to remove all html tags from a document downloaded with the curl
> utility. I want to remove everything that is between <> characters
> (including the <> chars) so I end up with plain text. Example: if in a
> document there's "<title>BOS Bank</title>" I'd like to get only "BOS
> Bank"; if there's "<TD class="tabelka01" rowspan="2"
> align="center">Kod</TD>" I want to get only the "Kod" string, etc.
> However, I have no idea how to do that. I've searched macscripter.net
> and tried Google but with no luck.
In the simplest cases, you can do that with one regular expression.
Depending on whether your file is ASCII or UTF-8 (or something else),
you would use the Satimage osax (which is free) or the Smile
environment (which is also free).
Assuming your file is UTF-8, for instance, you would do:
-- untested
uchange "<[^>]+>" into "" in the_file with regexp
----------
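For anyone who wants to check what that pattern does before installing
anything, here is a minimal sketch of the same regular expression in
Python. It is only a stand-in, not the Satimage command; the name
strip_tags is made up for the example, and the two inputs are the ones
from the question.

```python
import re

# Same pattern as in the uchange line: "<", then one or more
# characters that are not ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(html: str) -> str:
    """Remove everything between < and >, including the brackets.

    strip_tags is a hypothetical helper name used only for this sketch.
    """
    return TAG_PATTERN.sub("", html)


if __name__ == "__main__":
    print(strip_tags("<title>BOS Bank</title>"))
    # [^>] also matches a line break, so a tag split over two lines is removed too.
    print(strip_tags('<TD class="tabelka01" rowspan="2"\nalign="center">Kod</TD>'))
```

As said, this only covers the simplest cases: a ">" inside an attribute
value, or the contents of script and style blocks, would need more than
one expression.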
Emmanuel