Re: Extracting text from html
- Subject: Re: Extracting text from html
- From: Johan Jacobson <email@hidden>
- Date: Tue, 03 Sep 2002 13:24:03 +0200
Hi! I found this in a help file from Apple. It works really well.
Regards!
Johan Jacobson
on read_parse(this_file, opening_tag, closing_tag, contents_only)
    try
        set this_file to this_file as text
        set this_file to open for access file this_file
        set the combined_results to ""
        set the open_tag to ""
        repeat
            read this_file before "<" -- start of a tag
            set this_tag to read this_file until ">" -- end of a tag
            -- to make up for a bug in the "read before" command
            if this_tag does not start with "<" then ¬
                set this_tag to ("<" & this_tag) as string
            -- EXAMINE THE TAG
            if this_tag begins with the opening_tag then
                -- store the complete tag, not just the search string
                set the open_tag to this_tag
                -- check for single tag indicator
                if the closing_tag is "" then
                    if the combined_results is "" then
                        set the combined_results to the combined_results & ¬
                            the open_tag
                    else
                        set the combined_results to the combined_results & ¬
                            return & the open_tag
                    end if
                else
                    -- reset the text buffer
                    set the text_buffer to ""
                    -- extract the contents between the open and close tags
                    repeat
                        set the text_buffer to the text_buffer & ¬
                            (read this_file before "<") -- start of a tag
                        set the tag_buffer to read this_file until ">" -- end of a tag
                        -- to make up for a bug in the "read before" command
                        if the tag_buffer does not start with "<" then ¬
                            set the tag_buffer to ("<" & the tag_buffer) as string
                        -- check for the closing tag
                        if the tag_buffer is the closing_tag then
                            if contents_only is false then
                                set the text_buffer to the open_tag & ¬
                                    the text_buffer & the tag_buffer
                            end if
                            if the combined_results is "" then
                                set the combined_results to the combined_results & ¬
                                    the text_buffer
                            else
                                set the combined_results to the combined_results & ¬
                                    return & the text_buffer
                            end if
                            exit repeat
                        else
                            set the text_buffer to the text_buffer & the tag_buffer
                        end if
                    end repeat
                end if
            end if
        end repeat
        close access this_file
    on error error_msg number error_num
        try
            close access this_file
        end try
        -- error -39 is end of file, which is how the outer repeat normally ends
        if error_num is not -39 then return false
    end try
    return the combined_results
end read_parse
-- Send the file to the parser; xmlCom is the file to read
set quoteTEL to my read_parse(xmlCom, "<B>", "</B>", true)
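The call above uses only one of the modes the handler supports. As a rough sketch of the other two, assuming the read_parse handler is in the same script: the file path and the variable names boldWithTags, imageTags and boldList are made up for illustration, and the "<IMG" search is just an example of a tag without a closing partner.

```applescript
-- Hypothetical colon-style path; the handler does "open for access file this_file",
-- so it expects this kind of path (or an alias), not a POSIX path.
set xmlCom to "Macintosh HD:Users:johan:Desktop:page.html"

-- contents_only set to false: each match keeps its opening and closing tags.
set boldWithTags to my read_parse(xmlCom, "<B>", "</B>", false)

-- Empty closing_tag: collect the complete opening tags themselves,
-- for example every tag that begins with "<IMG".
set imageTags to my read_parse(xmlCom, "<IMG", "", true)

-- Matches come back as one string separated by return characters.
-- The handler returns false on any error other than end of file.
if boldWithTags is not false then
    set boldList to paragraphs of boldWithTags
end if
```

Splitting with "paragraphs of" only gives one item per match as long as the matched text itself contains no line breaks.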
On 02-09-03 13.13, Göran Ehn wrote:
> Can someone please illustrate how to grab the text between some specific HTML
> tags in an HTML file, like:
> <B>I want to copy this string.</B>
> (It's OK if the tags themselves come along, since I can filter them out later
> in my script.)
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.