Re: Removing html tags
- Subject: Re: Removing html tags
- From: Christian Vinaa <email@hidden>
- Date: Tue, 1 Mar 2005 02:09:08 +0100
At 19:49 -0500 28/02/2005, Marc K. Myers wrote:
On Feb 28, 2005, at 6:22 PM, Christian Vinaa wrote:
At 15:19 -0500 28/02/2005, Marc K. Myers wrote:
On Feb 28, 2005, at 1:34 PM, Paff <email@hidden> wrote:
I want to remove all html tags from a document downloaded with curl
utility. I want to remove everything that is between <> characters
(including <> chars) so I end up with plain text. Example: if in a
document there's "<title>BOS Bank</title>" I'd like to get only "BOS
Bank"; if there's "<TD class="tabelka01" rowspan="2"
align="center">Kod</TD>" I want to get only "Kod" string etc.
set theText to "<tag1>this is some text</tag1>
and then there's this text followed by
<tag2>and <tag3>its</tag3> contents</tag2>"
set {od, AppleScript's text item delimiters} to ¬
    {AppleScript's text item delimiters, "<"}
set theText to text items of theText
set newText to ""
set AppleScript's text item delimiters to ">"
repeat with anItem in theText
    set newList to text items of anItem
    if (count newList) > 1 then
        set newText to newText & text item 2 of newList
    end if
end repeat
set AppleScript's text item delimiters to od
newText
-->"this is some text and then there's this
text followed by and its contents"
I haven't tried it out, but at a quick glance
it doesn't seem to take into consideration, for example,
a tag like
<TD class="tabelka01" rowspan="2" align="center">
only tags like </tag1>.
But PageSpinner has a script that does in fact remove all tags,
large and small :-))
Actually, it deals quite well with that kind of
tag. What it can't handle is text like "If x<3
and y>10, what are the solutions?" I'm not sure
how anything without artificial intelligence
could distinguish text between angle brackets
from tags.
Marc [2/28/05 7:47:51 PM]
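To see the failure Marc describes, here is a minimal sketch that wraps his script in a handler and feeds it that sentence. The handler name stripTags is hypothetical (it is not in the original post), and the sketch reads the second piece with "item 2" rather than "text item 2" because it is reading from a list.

```applescript
-- stripTags is a hypothetical name; the body follows the script quoted above.
on stripTags(theText)
	-- Remember the current delimiters so they can be restored afterwards.
	set od to AppleScript's text item delimiters
	-- Split at every "<": each chunk after the first starts with a tag body.
	set AppleScript's text item delimiters to "<"
	set chunks to text items of theText
	-- Split each chunk at ">" and keep only what follows the first ">".
	set AppleScript's text item delimiters to ">"
	set newText to ""
	repeat with aChunk in chunks
		set parts to text items of (contents of aChunk)
		if (count parts) > 1 then
			set newText to newText & item 2 of parts
		end if
	end repeat
	set AppleScript's text item delimiters to od
	return newText
end stripTags

stripTags("If x<3 and y>10, what are the solutions?")
--> "10, what are the solutions?"
```

Everything from the "<" up to the ">" is treated as a tag, and the text in front of the first "<" is dropped because that chunk contains no ">" at all.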
To make my meaning clearer:
a tag like <TD class="tabelka01" rowspan="2" align="center">
that contains one or more " characters will upset the script!
(sorry)!
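For what it's worth, text item delimiters treat a double quote like any other character, so quotes inside a tag should only matter when the tag is typed into a string literal, where each one has to be escaped as \". A minimal sketch to check this, assuming Marc's script wrapped in a hypothetical handler named stripTags (the name is not from the original post):

```applescript
-- stripTags is a hypothetical name; the body follows Marc's script.
on stripTags(theText)
	set od to AppleScript's text item delimiters
	-- Split at every "<", then keep what follows the first ">" in each chunk.
	set AppleScript's text item delimiters to "<"
	set chunks to text items of theText
	set AppleScript's text item delimiters to ">"
	set newText to ""
	repeat with aChunk in chunks
		set parts to text items of (contents of aChunk)
		if (count parts) > 1 then
			set newText to newText & item 2 of parts
		end if
	end repeat
	-- Restore the delimiters that were in effect before the call.
	set AppleScript's text item delimiters to od
	return newText
end stripTags

-- Quotes are escaped only because the tag is written as a literal here.
stripTags("<TD class=\"tabelka01\" rowspan=\"2\" align=\"center\">Kod</TD>")
--> "Kod"
```

Text read from a file or returned by curl through do shell script needs no escaping, since the quotes never pass through a string literal.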
--
Christian Vinaa
email@hidden
...... Meanwhile, aunt Martha, having taken a tramp in the woods,
is lying in a ditch at the edge of town .........................