Re: Help scripting form IE...
- Subject: Re: Help scripting form IE...
- From: "Daniel A. Shockley" <email@hidden>
- Date: Mon, 6 Jan 2003 10:59:27 -0500
on 04/01/2003 07:40, Allen Evans at email@hidden wrote:
> Need help making a script to cut and paste out of HTML in IE to text in
> TextEdit for update through Pod2Go to my iPod.
> Here is what we have figured out so far in the MacAddict Forums (
> http://www.macaddict.com/phpBB2/viewtopic.php?t=3047 ):
> ...
Matthew Smith wrote:
> I don't think it is possible with Internet Explorer.
It is possible with Internet Explorer, just not ideal. You can
control almost anything Internet Explorer does by sending it JavaScript
through "do script" commands. However, where you have the choice, it is
best to avoid scripting an interface.
> Why not have a script that downloads the web page to a file? You can then use
> AppleScript to read the text in the file and extract what you want.
As Matthew suggests, downloading and then processing is best. Here's
the solution I posted on the MacAddict forums (it requires the
Satimage scripting addition, which is available for Mac OS X):
OK. Here's the way to go. Don't script Internet Explorer. It can be
done, but it is a pain to wait for the page to load. I've done this
for my clients on sites that are hard to access, but sugarbush.com is
not one of them. All you need is something that yanks the source of
the page, then reads out the part you want. No need to watch IE
launch, load the page, etc.
There is probably a way to do this all with just what OS X comes
with, but I don't have time to go through some of it. You can
download the page with built-in utilities, but to easily chop up the
HTML, you should download the free AppleScript scripting addition
Satimage
http://www.satimage.fr/software/downloads/Satimage246.dmg.gz
(control-click this link and choose "Save Link target as..." since it
doesn't seem to always type the file properly). Put the OSAX in your
~/Library/ScriptingAdditions folder, creating one if needed. Note
that ScriptingAdditions is ONE word. Also, you may want to check out
Smile, which is Satimage's free script-editor. It is much better than
Apple's, even with the update Apple made. Check out more at
http://www.satimage-software.com/. Oh, I'm not affiliated with
Satimage in any way, I just love their great free software.
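As an aside on doing this with only what OS X comes with: the literal replacements (not the regular-expression ones) could probably be handled by plain AppleScript text item delimiters. The following is a minimal sketch under that assumption. The handler name replaceText is hypothetical and is not part of the script in this message, which uses Satimage throughout.

```applescript
-- replaceText is a hypothetical helper for illustration only.
-- It replaces every literal occurrence of findText in sourceText;
-- it does NOT understand regular expressions such as "<[^>]+>".
on replaceText(sourceText, findText, replaceWith)
    set oldDelims to AppleScript's text item delimiters
    try
        -- split the text wherever findText occurs
        set AppleScript's text item delimiters to findText
        set pieces to text items of sourceText
        -- glue the pieces back together with the replacement
        set AppleScript's text item delimiters to replaceWith
        set resultText to pieces as text
        set AppleScript's text item delimiters to oldDelims
        return resultText
    on error errMsg number errNum
        -- always restore the caller's delimiters before re-raising
        set AppleScript's text item delimiters to oldDelims
        error errMsg number errNum
    end try
end replaceText

-- made-up sample text, not taken from the real page
replaceText("<b>Base:</b> 40", ":</b>", ":")
-- result: "<b>Base: 40"
```

Pattern-based steps, such as stripping all tags or collapsing runs of returns, would still need a regular-expression tool, which is why the Satimage addition is the easier route here.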
So, take the following code, paste it into Script Editor (or Smile)
and save as an application, choosing "Never show startup screen" and
"Stay Open". The "Stay Open" means you can just let it sit around in
the background, and click on it in the Dock to update, rather than
waiting for it to launch. You can also leave that unchecked if you'd
rather it quit each time, or if you decide to save it as a compiled
script and put it in your Scripts folder for Apple's Script Menu
addition.
on run
    getSnowReport() -- call as a handler, so it can run when launched or when reopened
end run

on reopen
    getSnowReport() -- call as a handler, so it can run when launched or when reopened
end reopen

on getSnowReport()
    -- downloads the snow report page from sugarbush.com, and saves relevant parts into text file
    -- download the page itself into a variable
    -- have to pipe the output of curl into the 'strings' command, since the page seems to return
    -- a character that gives AppleScript problems. Most URLs would not require the '|strings' part.
    set snowSource to do shell script "curl http://www.sugarbush.com/snowreport/index.htm | strings"
    -- use a little handler I wrote to start with just the section you want
    set snowSource to getTextBetween(snowSource, "<!-- ATOMZ STARTING -->", "<!-- ATOMZ ENDING -->")
    set snowSource to change "\\r +" into "\\r" in snowSource with regexp -- lines starting with space
    set snowSource to change " +" into " " in snowSource with regexp -- multiple spaces
    set snowSource to change "&nbsp;" into "" in snowSource with regexp -- HTML-entity non-breaking spaces
    set snowSource to change "</b><br><br>" into return in snowSource with regexp -- after the date
    set snowSource to change ":</b>" into ":" in snowSource with regexp -- keep colons where they are
    set snowSource to change "</b>" into ":" in snowSource with regexp -- add colons after bold table cells
    set snowSource to change "<[^>]+>" into "" in snowSource with regexp -- strip out all HTML tags
    set snowSource to change "\\r\\r+" into "\\r" in snowSource with regexp -- turn multiple returns into one
    set snowSource to change "\\r +" into "\\r" in snowSource with regexp -- AGAIN, strip spaces that follow a return
    set snowSource to change ":\\r" into ": " in snowSource with regexp -- remove returns after colons
    set snowSource to change "\\r\\r+" into "\\r" in snowSource with regexp -- AGAIN, turn multiple returns into one
    try
        set snowHandle to open for access file (((path to desktop) as string) & "snowbush.txt") with write permission
        set eof snowHandle to 0 -- empty the file first, so a shorter report leaves no old text at the end
        write snowSource to snowHandle -- write the new report
        close access snowHandle
    on error errMsg number errNum
        try
            close access snowHandle -- if an error, make sure it's closed
        end try
        error errMsg number errNum -- tell us what the error was
    end try
end getSnowReport

on getTextBetween(sourceText, beforeText, afterText)
    -- version 1.1, Daniel A. Shockley, http://www.danshockley.com
    -- gets the text between the first occurrences of beforeText and afterText in sourceText
    try
        set oldDelims to AppleScript's text item delimiters
        set AppleScript's text item delimiters to the beforeText
        set the prefixRemoved to text item 2 of sourceText
        set AppleScript's text item delimiters to afterText
        set the finalResult to text item 1 of prefixRemoved
        set AppleScript's text item delimiters to oldDelims
        return finalResult
    on error errMsg number errNum
        set AppleScript's text item delimiters to {""}
        return "" -- return nothing if the surrounding text is not found
    end try
end getTextBetween
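
To see what getTextBetween does on its own, here is a small sketch that can be pasted into a fresh Script Editor window. It repeats a compact copy of the handler so it runs by itself. The samplePage text is made up for illustration and is not the real sugarbush.com source.

```applescript
-- compact copy of the handler above, included so this runs on its own
on getTextBetween(sourceText, beforeText, afterText)
    try
        set oldDelims to AppleScript's text item delimiters
        set AppleScript's text item delimiters to beforeText
        set prefixRemoved to text item 2 of sourceText
        set AppleScript's text item delimiters to afterText
        set finalResult to text item 1 of prefixRemoved
        set AppleScript's text item delimiters to oldDelims
        return finalResult
    on error
        set AppleScript's text item delimiters to {""}
        return "" -- nothing found
    end try
end getTextBetween

-- samplePage is invented sample text (an assumption, not the real page)
set samplePage to "<html><!-- ATOMZ STARTING --><b>Base depth</b> 40 in<!-- ATOMZ ENDING --></html>"

set reportPart to getTextBetween(samplePage, "<!-- ATOMZ STARTING -->", "<!-- ATOMZ ENDING -->")
-- reportPart is "<b>Base depth</b> 40 in"

set emptyPart to getTextBetween(samplePage, "<!-- NO SUCH MARKER -->", "<!-- ATOMZ ENDING -->")
-- emptyPart is "" because the opening marker never appears

{reportPart, emptyPart}
```

Since the handler returns an empty string when a marker is missing, a caller may want to check for "" before overwriting the text file, for example if the site changes its page layout.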
--
----
Daniel A. Shockley
email@hidden
http://www.danshockley.com
_______________________________________________
applescript-users mailing list | email@hidden