Re: Regex with Satimage
- Subject: Re: Regex with Satimage
- From: has <email@hidden>
- Date: Tue, 7 Jun 2005 19:04:44 +0100
Michael Ghilissen wrote:
>I hope to read the pairs <title> and <description> from a web page,
>extract the text between the XML tags and set the text to 'references'
>\2 and \6, using Satimage's Find Text with regex.
>[...]
>set theResult to find text "(<title>)(.*)(<)(.*)(<description>)(.*)(<)" ¬
>    in theText using {"Title: \\2 ", "Description: \\6"} ¬
>    with regexp, all occurrences and string result
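To do it in a one-liner you need non-greedy matching (.*?), which Satimage doesn't have: a greedy .* grabs as much text as it can, so it runs on to the last matching tag in the page instead of stopping at the first closing tag. If you're just looking to do a quick-n-dirty scrape, try TextCommands' [1] 'search' command: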
tell application "TextCommands"
search theText for "<title>(.*?)</title>.*?<description>(.*?)</description>" with regex
end tell
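To see why the non-greedy .*? matters, here is the same pattern run through Python's re module, with and without the '?' qualifiers. This is only a Python illustration of regex semantics, not TextCommands or Satimage; the two-item feed fragment is made-up sample data.

```python
import re

# Hypothetical sample input: a fragment of an RSS feed with two items.
text = (
    "<item><title>First</title><description>One</description></item>"
    "<item><title>Second</title><description>Two</description></item>"
)

# Greedy: each .* expands as far as it can, so a single match
# swallows everything from the first <title> to the last </description>.
greedy = re.findall(r"<title>(.*)</title>.*<description>(.*)</description>", text)

# Non-greedy: each .*? stops at the first closing tag that lets the match succeed,
# so every item yields its own (title, description) pair.
lazy = re.findall(r"<title>(.*?)</title>.*?<description>(.*?)</description>", text)

print(greedy)
print(lazy)
```

Note that '.' doesn't match newlines by default, so on a real, line-wrapped page you would also need the re.DOTALL flag (or an engine's equivalent) for the match to cross line breaks.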
The 'search' approach still leaves you to clean up whitespace and decode entities (which you can also do with TextCommands), of course, and since it's a very dirty solution there are still all sorts of things that can go wrong with it.
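For illustration, the clean-up step amounts to something like the following Python sketch. It uses the standard library's html.unescape and re.sub rather than TextCommands' own commands, and the clean() helper and sample string are hypothetical.

```python
import html
import re

def clean(fragment: str) -> str:
    """Decode HTML/XML character entities and collapse runs of whitespace."""
    # html.unescape handles named (&amp;), decimal (&#233;) and hex (&#xE9;) references.
    decoded = html.unescape(fragment)
    # Collapse any run of spaces, tabs or newlines to one space and trim the ends.
    return re.sub(r"\s+", " ", decoded).strip()

# Hypothetical scraped description with stray whitespace and entities.
raw = "\n   Fish &amp; chips\n\tat the Caf&#233; &lt;cheap&gt;   "
print(clean(raw))
```

Bear in mind that feed descriptions often carry entity-escaped HTML, so decoding can expose markup (like the <cheap> above) that you may still want to strip.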
If you want a more robust solution, I've wrapped the feedparser module for Python as a scriptable FBA (faceless background application), ParserTools [2]. Pass it a feed URL or XML data and it'll convert it to nested key-value lists that you can recursively search for the information you want.
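The "nested key-value lists plus recursive search" idea can be sketched in Python as below. This is a stand-in using the standard library's xml.etree.ElementTree, not feedparser and not ParserTools' actual output format; the to_nested() and find_all() helpers and the sample feed are hypothetical. Unlike feedparser, it does no normalisation of RSS/Atom variants and won't cope with malformed feeds.

```python
import xml.etree.ElementTree as ET

def to_nested(elem):
    """Convert an element to a (tag, value) pair; value is text or a list of child pairs."""
    children = list(elem)
    if not children:
        return (elem.tag, (elem.text or "").strip())
    return (elem.tag, [to_nested(child) for child in children])

def find_all(node, key):
    """Depth-first search collecting every value stored under `key`."""
    tag, value = node
    found = [value] if tag == key else []
    if isinstance(value, list):
        for child in value:
            found.extend(find_all(child, key))
    return found

# Hypothetical sample feed.
feed = """<rss><channel>
  <title>Example feed</title>
  <item><title>First</title><description>One</description></item>
  <item><title>Second</title><description>Two</description></item>
</channel></rss>"""

tree = to_nested(ET.fromstring(feed))
print(find_all(tree, "title"))        # channel title, then each item title
print(find_all(tree, "description"))
```

Because the search matches on key names rather than on raw text, it is unaffected by whitespace or attribute order in the markup, which is where the regex approach tends to break.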
HTH
has
[1] http://freespace.virgin.net/hamish.sanderson/index.html#textcommands
[2] http://freespace.virgin.net/hamish.sanderson/index.html#parsertools
--
http://freespace.virgin.net/hamish.sanderson/