Re: Removing html tags

Subject: Re: Removing html tags
From: Emmanuel <email@hidden>
Date: Mon, 28 Feb 2005 12:02:21 +0100

At 11:31 AM +0100 2/28/05, Paff wrote:

Hi all!
I want to remove all HTML tags from a document downloaded with the curl utility. I want to remove everything that is between <> characters (including the <> characters themselves) so I end up with plain text. Example: if a document contains "<title>BOS Bank</title>" I'd like to get only "BOS Bank"; if it contains "<TD class="tabelka01" rowspan="2" align="center">Kod</TD>" I want to get only the "Kod" string, etc.

However, I have no idea how to do that. I've searched macscripter.net and tried Google, but with no luck.

In the simplest cases, you can do that with one regular expression. Depending on whether your file is ASCII or UTF-8 (or something else), you would use the Satimage osax (which is free) of the Smile environment (which is also free).

Assuming your file is UTF-8, for instance, you would do:

-- untested
uchange "<[^>]+>" into "" in the_file with regexp
----------
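
For anyone who wants to check what that pattern does without the osax installed, here is a minimal sketch of the same regular expression in Python rather than AppleScript. The function name strip_tags and the sample list are only illustrative; the two sample strings are the ones from the question.

```python
import re

# Same pattern as in the uchange line above:
# "<", then one or more characters that are not ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Remove every <...> run from text and keep what lies between the tags."""
    return TAG_PATTERN.sub("", text)


# The two examples given in the original question.
samples = [
    '<title>BOS Bank</title>',
    '<TD class="tabelka01" rowspan="2" align="center">Kod</TD>',
]

for sample in samples:
    print(strip_tags(sample))
```

This only covers the simplest cases: a ">" inside an attribute value, the contents of script or style elements, and character entities such as &amp;amp; are not handled by a single pattern like this one.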

Emmanuel