Re: Grabbing info from a webpage

Subject: Re: Grabbing info from a webpage
From: Daniel Jalkut <email@hidden>
Date: Tue, 26 Jul 2005 09:38:02 -0400

On Jul 26, 2005, at 12:33 AM, Patrick Zittle wrote:

> Hey guys,
> I would like to grab some information from a web page. How can I get that information so I can present it? For example, say I wanted to find out what the featured download on the Apple downloads page is. How could I grab that information?
>
> Thanks a lot in advance!

You've got two tasks to perform:

1. Grab the entire web page. (easy)
2. Parse for the info you want. (harder)

How you parse it is going to depend on factors like whether you are OK with having a browser open and showing the action as it happens, or whether you want it to all be done quietly behind the scenes.

I often use Safari's own JavaScript functionality to parse pages on my behalf. This is easier or harder depending on how many "hooks" the page's author has given you. For instance, if the data you're interested in is contained in a div with a specific ID, then you can use the JavaScript "getElementById" function to easily locate it and grab the contents.
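As a rough sketch of the getElementById idea: the element ID "featured" below is made up for illustration, so substitute one that actually appears in the source of the page you are looking at. It also assumes the page is already loaded in Safari's front document, and recent versions of Safari may require enabling "Allow JavaScript from Apple Events" in the Develop menu first.

```applescript
-- Hypothetical element ID; replace it with one that really exists on the page.
set myElementID to "featured"

tell application "Safari"
	-- Runs in the front document, which is assumed to have finished loading.
	-- The value of the last JavaScript expression is what comes back to AppleScript.
	set myContents to do JavaScript "var e = document.getElementById('" & myElementID & "'); e ? e.innerHTML : '';" in document 1
end tell

if myContents is "" then
	display dialog "No element with ID " & myElementID & " was found."
else
	display dialog "Element contents: " & myContents
end if
```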

In the case you mention, there isn't much to go on, but if you look at the source you'll notice that the name of the featured download appears in an h2 tag, immediately after the text "Featured Download" appears alone inside an h1 tag. Using this information, I came up with this (somewhat fragile) script. It asks Safari to go to the Apple downloads page, waits for it to finish loading, and then inspects the content via JavaScript. The JavaScript code looks for an h1 tag with the content "Featured Download", and then assumes that the next tag will contain the application name.
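The Safari script is not reproduced in this message. A minimal sketch of the approach described above might look like the following; it is a reconstruction from the description, not the original script. The readyState polling loop, the 30-second limit, and the variable names are all assumptions, and the page layout it depends on may well have changed.

```applescript
set myTargetURL to "http://www.apple.com/downloads/macosx/"

-- JavaScript that walks the page in document order, finds the h1 whose text is
-- "Featured Download", and returns the text of the first h2 after it ('' if none).
set myJS to "
(function () {
  var nodes = document.getElementsByTagName('*');
  var found = false;
  for (var i = 0; i < nodes.length; i++) {
    var tag = nodes[i].tagName.toLowerCase();
    var text = (nodes[i].textContent || '').replace(/^\\s+|\\s+$/g, '');
    if (tag == 'h1' && text == 'Featured Download') { found = true; }
    else if (found && tag == 'h2') { return text; }
  }
  return '';
})();"

tell application "Safari"
	activate
	make new document with properties {URL:myTargetURL}
	-- Poll until the page reports it has finished loading (give up after about 30 seconds).
	repeat 60 times
		delay 0.5
		try
			if (do JavaScript "document.readyState" in document 1) is "complete" then exit repeat
		end try
	end repeat
	set myDownloadName to do JavaScript myJS in document 1
end tell

if myDownloadName is "" then
	display dialog "Could not find the featured download; the page layout may have changed."
else
	display dialog "The Apple Featured Download is " & myDownloadName
end if
```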

The idea of "web scraping" is similar whether you do it through Safari like this, or with other tools after grabbing the entire content. Since I bet you want this to be done quietly in the background, you might use curl to fetch the page content, then use whatever means are at your disposal to search for the expected text within it. Here is an example that works today, at least:


-- Fetch the page contents
set myTargetURL to "http://www.apple.com/downloads/macosx/"
set myHTML to do shell script "curl -s " & quoted form of myTargetURL

-- Reduce the size of the examined text to only the area immediately
-- near the "Featured Download" text (clamped so we never read past
-- the end of the page)
set fdOffset to offset of "<h1>Featured Download</h1>" in myHTML
set fdEnd to fdOffset + 1000
if fdEnd > (length of myHTML) then set fdEnd to length of myHTML
set shortText to text fdOffset thru fdEnd of myHTML

-- Locate the text of interest by getting the start and stop offsets based on
-- the expected container tags
set startTag to "<h2>"
set startOffset to (offset of startTag in shortText) + (length of startTag)
set endTag to "</h2>"
set endOffset to (offset of endTag in shortText) - 1

-- Now we have the text!
display dialog "The Apple Featured Download is " & (text startOffset thru endOffset of shortText)
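
Because this kind of offset-based scraping breaks as soon as the page changes, it can help to wrap the search in a handler that reports failure instead of throwing an error. The following is only a sketch: the handler name textBetween, its parameters, and the sample HTML (including "Example App") are hypothetical and not part of any library.

```applescript
-- Hypothetical helper: returns the text between the first startTag that follows
-- anchorText and the next endTag, or missing value if any of the three is absent.
on textBetween(theText, anchorText, startTag, endTag)
	set anchorOffset to offset of anchorText in theText
	if anchorOffset is 0 then return missing value
	set restText to text anchorOffset thru -1 of theText

	set startOffset to offset of startTag in restText
	if startOffset is 0 then return missing value
	set contentStart to startOffset + (length of startTag)
	-- Nothing follows the start tag, so there is no content to return.
	if contentStart > (length of restText) then return missing value
	set restText to text contentStart thru -1 of restText

	set endOffset to offset of endTag in restText
	if endOffset is 0 then return missing value
	-- The end tag immediately follows the start tag: empty content.
	if endOffset is 1 then return ""
	return text 1 thru (endOffset - 1) of restText
end textBetween

-- Usage with made-up sample HTML rather than a live page.
set sampleHTML to "<h1>Featured Download</h1><p>intro</p><h2>Example App</h2>"
set featuredName to textBetween(sampleHTML, "<h1>Featured Download</h1>", "<h2>", "</h2>")
if featuredName is missing value then
	display dialog "Could not find the featured download; the page layout may have changed."
else
	display dialog "The Apple Featured Download is " & featuredName
end if
```

With real data, the first argument would be the myHTML value fetched by curl.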
