Re: HTML parsing
- Subject: Re: HTML parsing
- From: Deivy Petrescu <email@hidden>
- Date: Thu, 09 Jun 2016 16:32:33 -0400
> On Jun 9, 2016, at 14:33 , Stockly, Ed <email@hidden> wrote:
>
> Something like this will get you the dates. Is that what you're trying to
> extract?
>
> -----
> set AppleScript's text item delimiters to {"<A href=\"#\"onclick=\""}
> -- Assumes the raw html text is in the variable "myText"
> set myText to the rest of text items of myText
> set listOfResults to {}
> repeat with thisItem in myText
> set AppleScript's text item delimiters to {"</a>"}
>
> set insideTag to text item 1 of thisItem as text
> set AppleScript's text item delimiters to {">"}
> set the end of listOfResults to the last text item of insideTag
>
> end repeat
>
> return listOfResults
>
> On 6/9/16, 4:14 AM, William Dockery <email@hidden> wrote:
>
>>
>> Hello, I am an intermediate AppleScripter, and I am starting to use
>> JavaScript to scrape websites for my personal use. I have learned how to
>> select elements by element ID. But I have not yet learned how to select
>> a list of HTML snippets that share a certain pattern of tags or tag
>> values.
>>
William, if I understood you correctly, I am not sure I can come up with a way of retrieving the dates in AS by way of JavaScript.
However, you can log them to Safari's console window.
The problem is that you have to use "do JavaScript", which runs inside Safari, and I can't get the result back to Script Editor.
You can only see it in Safari.
If you want the dates themselves, though, you can use AppleScript to extract them from the page source.
Script 1: getting the dates in Safari console (adjust it to reflect your case)
<script>
set jscrpt to "var q= document.getElementsByTagName('a'); for (var t=0; t<q.length; t++){console.log(q[t].innerText);}"
tell application "Safari"
activate
tell document 1
do JavaScript jscrpt
end tell
end tell
</script>
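If the page has many links and only some of them hold dates, the logging loop in Script 1 could be narrowed down. What follows is only a sketch in JavaScript, not something tested against your page. The name collectDateTexts is made up for illustration, and it assumes the dates are written in a form that Date.parse understands, which varies from browser to browser (some engines also accept odd strings such as a word followed by a number).

```javascript
// Sketch: return the text of every <a> element whose text parses as a date.
// `doc` is anything with getElementsByTagName, normally the page's `document`.
// collectDateTexts is a hypothetical helper name, not part of any library.
function collectDateTexts(doc) {
  var links = doc.getElementsByTagName('a');
  var found = [];
  for (var i = 0; i < links.length; i++) {
    // innerText is what Script 1 logs; fall back to textContent if it is missing.
    var text = (links[i].innerText || links[i].textContent || '').trim();
    // Date.parse returns NaN for text it cannot read as a date.
    if (text !== '' && !isNaN(Date.parse(text))) {
      found.push(text);
    }
  }
  return found;
}

// Only runs inside a browser page, where `document` exists.
if (typeof document !== 'undefined') {
  collectDateTexts(document).forEach(function (t) { console.log(t); });
}
```

Pasted into Safari's console, this should log only the link texts that look like dates instead of every link on the page.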
Script 2: getting the dates from the source of the current Safari document (adjust it to reflect your case)
<script>
tell application "Safari" to tell document 1 to set t to source
tid(">")
set t to rest of text items of t
set l to {}
tid("<")
repeat with j in t
try
set end of l to date (text item 1 of j)
end try
end repeat
l
on tid(x)
set AppleScript's text item delimiters to x
end tid
</script>
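Since you mentioned you are working in JavaScript, the same split-on-delimiters idea as Script 2 can be sketched there too. This is an illustration only: datesFromSource is a hypothetical name, html stands for the page source as a string, and Date.parse is a stand-in for AppleScript's date coercion, so the two will not accept exactly the same date formats.

```javascript
// Sketch of Script 2's approach in JavaScript (datesFromSource is a made-up name).
// Split the source on ">", drop the first piece, then keep whatever comes
// before the next "<" in each piece if it can be read as a date.
function datesFromSource(html) {
  var pieces = html.split('>').slice(1); // like "rest of text items" with ">" as delimiter
  var dates = [];
  for (var i = 0; i < pieces.length; i++) {
    // like "text item 1" with "<" as delimiter
    var candidate = pieces[i].split('<')[0].trim();
    // Skip empty pieces and anything Date.parse cannot read (it returns NaN).
    if (candidate !== '' && !isNaN(Date.parse(candidate))) {
      dates.push(candidate);
    }
  }
  return dates;
}
```

As with the AppleScript version, this treats the page as plain text, so it will also pick up date-like text that sits outside of links.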
Deivy Petrescu
email@hidden