Re: Removing html tags

  • Subject: Re: Removing html tags
  • From: Emmanuel <email@hidden>
  • Date: Mon, 28 Feb 2005 12:02:21 +0100

At 11:31 AM +0100 2/28/05, Paff wrote:
>Hi all!
>
>I want to remove all html tags from a document downloaded with curl utility. I want to remove everything that is between <> characters (including <> chars) so I end up with plain text. Example: if in a document there's "<title>BOS Bank</title>" I'd like to get only "BOS Bank"; if there's "<TD class="tabelka01" rowspan="2" align="center">Kod</TD>" I want to get only "Kod" string etc.
>
>However, I have no idea how to do that. I've searched macscripter.net and tried Google, but with no luck.

In the simplest cases, you can do that with a single regular expression. Depending on whether your file is ASCII or UTF-8 (or some other encoding), you would use the Satimage osax (which is free) or the Smile environment (which is also free).


Assuming your file is UTF-8, for instance, you would do:

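-- untested
-- the_file: a reference to the downloaded HTML file
-- deletes every "<", followed by one or more non-">" characters, followed by ">"
uchange "<[^>]+>" into "" in the_file with regexp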
----------
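If installing an osax is not an option, the same "delete everything from < to >" idea can be sketched in plain AppleScript using text item delimiters. This is a minimal sketch, not part of the Satimage or Smile tools; the handler name stripTags is hypothetical. Like the regular expression, it assumes no "<" or ">" appears inside attribute values, comments or scripts.

```applescript
-- Hypothetical handler: removes every "<...>" run (brackets included)
-- using only vanilla AppleScript text item delimiters; no osax needed.
on stripTags(theText)
	if theText is "" then return ""
	set oldDelims to AppleScript's text item delimiters
	-- split on "<": every chunk after the first starts inside a tag
	set AppleScript's text item delimiters to "<"
	set chunks to text items of theText
	set AppleScript's text item delimiters to oldDelims
	set plain to item 1 of chunks -- text before the first "<"
	repeat with i from 2 to count of chunks
		set chunk to item i of chunks
		-- split the chunk on ">": part 1 is the tag body, the rest is text
		set AppleScript's text item delimiters to ">"
		set parts to text items of chunk
		if (count of parts) > 1 then
			-- rejoining with ">" (the current delimiter) keeps any later ">" in the text
			set plain to plain & ((items 2 thru -1 of parts) as text)
		end if
		-- a chunk with no ">" is an unclosed tag and is dropped
		set AppleScript's text item delimiters to oldDelims
	end repeat
	return plain
end stripTags
```

For example, stripTags("<title>BOS Bank</title>") returns "BOS Bank". Unlike a line-oriented tool, it also removes tags that span several lines.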

Emmanuel
_______________________________________________
Applescript-users mailing list      (email@hidden)


References: 
 >Removing html tags (From: Paff <email@hidden>)
