Re: Url Access Scripting
  • Subject: Re: Url Access Scripting
  • From: Christopher Stone <email@hidden>
  • Date: Wed, 29 Dec 2010 08:29:17 -0600

On Dec 29, 2010, at 03:02, Wayne Melrose wrote:
I've also included a repeat to download the images within the URL you've provided as a sample, this will loop through 1 to 99 (just to make it simpler for me) and download anything that exists within that range .. 
______________________________________________________________________

I'd rather know than guess; this will pull urls to linked images from the front window in Safari:

# ==============================================================================================
# GET LINKS FROM SAFARI USING A REGULAR EXPRESSION
# ==============================================================================================
on SAFARI_LINKS(regexStr, tagName, tagType)
set js to "function in_array (array, item) {
for (var i=0; i < array.length; i++) {
if (array[i] == item) {
return true;}}
return false;}
var a_tags = document.getElementsByTagName('" & tagName & "');
var href_array = new Array();
var reg = new RegExp(" & regexStr & ");
for (var i = a_tags.length - 1; i >= 0; i--) {
var href = a_tags[i]." & tagType & ";
if (reg.test(href)) {
if (!in_array(href_array, href)) {
href_array.push(href);}}}
 href_array;"
try
tell application "Safari"
set linkList to do JavaScript js in document 1
end tell
return linkList
on error
return false
end try
end SAFARI_LINKS

set linkList to SAFARI_LINKS("/.*\\d+.*\\.jpg/i", "a", "href")
set linkList to reverse of linkList

Open 'http://members.aceweb.com/randsautos/photogallery/ferrari/enzo/' in Safari and run the script to see the image urls neatly listed.

There may be a more direct way of pulling the image links out of the DOM, and if anyone wants to share same I'd love to know about it.  (I've only just dabbled with JavaScript.)

You can also curl the source of a remote url and parse it, for which the Satimage.osax's regular expressions are most useful.

Curl will read a list of urls from a file, or you can use a range.  It will also use the remote file names, so you don't have to manufacture local names unless you want to.

on FINDER_DATE() # Example: 2010-11-12_040658
set formattedDate to do shell script "date \"+%Y-%m-%d_%H%M%S\""
return formattedDate
end FINDER_DATE
# ==============================================================================================
set desktopPath to POSIX path of (path to desktop)
set newImageFolder to desktopPath & "Temp_Image_Folder_" & FINDER_DATE()

set linkList to {"http://members.aceweb.com/randsautos/photogallery/ferrari/enzo/Ferrari-Enzo-001.jpg", "http://members.aceweb.com/randsautos/photogallery/ferrari/enzo/Ferrari-Enzo-002.jpg"}
set AppleScript's text item delimiters to " "
set linkList to linkList as string

set cmdStr to {¬
"mkdir " & quoted form of newImageFolder & "; ", ¬
"cd " & quoted form of newImageFolder & "; ", ¬
"curl -f", ¬
"--remote-name-all", ¬
"--location", ¬
"--user-agent 'Opera/9.70 (Linux ppc64 ; U; en) Presto/2.2.1'", ¬
linkList}

set cmdStr to cmdStr as string
do shell script cmdStr

Changing linkList to the following is quite a bit neater:

set linkList to "http://members.aceweb.com/randsautos/photogallery/ferrari/enzo/Ferrari-Enzo-[001-002].jpg"

This switch will let curl read input from a file:

  -K/--config <config file>

There are examples of the file format in the curl man page.
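
As a rough sketch of what such a file can look like: the snippet below writes one `url = "..."` line per link, which is the form the curl man page shows for config files. The helper name make_curl_config and the scratch file location are my own inventions for illustration, and the curl call itself is left as a comment so that nothing is downloaded when the snippet runs.

```shell
#!/bin/sh
# Hypothetical helper: print one `url = "..."` line per argument,
# the line format curl's -K/--config option reads.
make_curl_config() {
    for u in "$@"; do
        printf 'url = "%s"\n' "$u"
    done
}

base='http://members.aceweb.com/randsautos/photogallery/ferrari/enzo'
config_file="${TMPDIR:-/tmp}/curl_links.$$"   # assumed scratch location

make_curl_config \
    "$base/Ferrari-Enzo-001.jpg" \
    "$base/Ferrari-Enzo-002.jpg" > "$config_file"

# Show what was written.
cat "$config_file"

# To fetch the files (not run here):
#   curl -f --location --remote-name-all -K "$config_file"
```

In AppleScript the same file could be written with a do shell script call, then handed to curl with -K in place of the long list of urls on the command line.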

Here's a quick and dirty way to parse out linked images:

set yourURL to quoted form of "http://members.aceweb.com/randsautos/photogallery/ferrari/enzo/"
set cmdStr to "curl -L --user-agent 'Opera/9.70 (Linux ppc64 ; U; en) Presto/2.2.1' " & yourURL & " | egrep --ignore-case --only-matching \"<[aA] [hH][rR][eE][fF][^>]+\\.[jJ][pP][gG][^>]*>\""
do shell script cmdStr

** For some reason the 'ignore-case' switch fails to work when the 'only-matching' switch is used.  Does anyone know why?  Is it a bug or a feature?
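
One way to check how the grep on a given machine behaves with the two switches together is to feed it a small known sample instead of a live page. This is only a test harness, not an explanation of the behaviour; the sample markup is made up for the purpose.

```shell
#!/bin/sh
# Made-up sample: one upper-case anchor, one lower-case anchor, one non-image link.
sample='<A HREF="Ferrari-Enzo-001.JPG">one</A>
<a href="Ferrari-Enzo-002.jpg">two</a>
<a href="index.html">home</a>'

# -E: extended regex, -i: ignore case, -o: print only the matching part.
# With -i the character classes for each letter are no longer needed.
matches=$(printf '%s\n' "$sample" | grep -E -i -o '<a href[^>]+\.jpg[^>]*>')

printf '%s\n' "$matches"
```

A grep that honours both switches at once prints the two image anchors and skips the index.html link. If only the lower-case anchor comes back, the combination is misbehaving on that machine and the bracketed pattern is the safer choice.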

Of course you still have clean-up to do and have to attach a base url to these.  One of the nice things about using JavaScript and the DOM is that relative links get expanded out for you and are devoid of detritus.
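
A minimal sketch of that clean-up step, assuming the anchors arrive one per line and use double-quoted href values. The function name anchors_to_urls is hypothetical, and only plain relative links are handled: anything starting with '/' or '../' would need more work.

```shell
#!/bin/sh
base_url='http://members.aceweb.com/randsautos/photogallery/ferrari/enzo/'

# Hypothetical helper: read anchor tags on stdin, print absolute urls.
# $1 is the base url to prepend to relative links.
anchors_to_urls() {
    # Keep only the text between the double quotes after href=.
    sed -n -E 's/.*[hH][rR][eE][fF] *= *"([^"]*)".*/\1/p' |
    while IFS= read -r link; do
        case $link in
            http://*|https://*) printf '%s\n' "$link" ;;    # already absolute
            *)                  printf '%s%s\n' "$1" "$link" ;;  # relative to base
        esac
    done
}

printf '%s\n' \
    '<a href="Ferrari-Enzo-001.jpg">' \
    '<A HREF="http://example.com/x.JPG">' |
    anchors_to_urls "$base_url"
```

The output of the egrep command above could be piped straight into the function, and the resulting list handed to curl.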

--
Chris