Re: Regex with Satimage
- Subject: Re: Regex with Satimage
- From: Michael Ghilissen <email@hidden>
- Date: Wed, 8 Jun 2005 03:06:39 -0400
Thanks Has,
That's exactly it; beautifully simple. I thought I had gone mad.
Meanwhile, I have learned all about regex!
Where were you when I needed you? LOL
Michael Ghilissen
On Jun 7, 2005, at 2:04 PM, has wrote:
Michael Ghilissen wrote:
I hope to read the pairs <title> and <description> from a web page,
extract the text between the XML tags and set the text to 'references'
\2 and \6, using Satimage's Find Text with regex.
[...]
set theResult to find text "(<title>)(.*)(<)(.*)(<description>)(.*)(<)" ¬
	in theText using {"Title: \\2 ", "Description: \\6"} ¬
	with regexp, all occurrences and string result
To do it in a one-liner you need non-greedy matching, which Satimage
doesn't have. If you're just looking to do a quick-n-dirty scrape, try
TextCommands' [1] 'search' command:
tell application "TextCommands"
	search theText for "<title>(.*?)</title>.*?<description>(.*?)</description>" with regex
end tell
This still leaves you to clean up whitespace and decode entities
(which you can also do with TextCommands), of course, and, being a very
dirty solution, there are still all sorts of things that can go wrong
with it.
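For anyone who wants to see the greedy versus non-greedy difference outside AppleScript, here is a minimal sketch using Python's standard re module. It is not how TextCommands works internally; the sample text and the names SAMPLE, LAZY, GREEDY and extract_pairs are made up for illustration.

```python
import re
from html import unescape

# Hypothetical sample text; a real page would be fetched separately.
SAMPLE = """<item><title>First &amp; foremost</title>
<description> One </description></item>
<item><title>Second</title>
<description>Two</description></item>"""

# Non-greedy (.*?) stops at the nearest closing tag.
# re.DOTALL lets '.' match newlines between the tags.
LAZY = re.compile(
    r"<title>(.*?)</title>.*?<description>(.*?)</description>", re.DOTALL
)

# Greedy (.*) runs on to the last closing tag, swallowing every pair in between.
GREEDY = re.compile(
    r"<title>(.*)</title>.*<description>(.*)</description>", re.DOTALL
)


def extract_pairs(text):
    """Return (title, description) tuples, entity-decoded and whitespace-trimmed."""
    return [
        (unescape(title).strip(), unescape(desc).strip())
        for title, desc in LAZY.findall(text)
    ]
```

On this sample the non-greedy pattern yields one tuple per title/description pair, while the greedy pattern produces a single match whose first group runs from the first title through to the last one. The same caveats apply: it is a quick scrape, not a parser.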
If you want a more robust solution, I've wrapped the feedparser module
for Python as a scriptable FBA, ParserTools [2]. Pass it a feed URL or
XML data and it'll convert it to nested key-value lists that you can
recursively search for the information you want.
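The "nested key-value lists" idea can be sketched roughly as follows. This is not ParserTools or feedparser output; it swaps in Python's standard xml.etree.ElementTree, and the names to_pairs, find_all and FEED are hypothetical. It assumes well-formed XML, which real-world feeds often are not.

```python
import xml.etree.ElementTree as ET


def to_pairs(element):
    """Convert an element to a [tag, value] pair.

    The value is the trimmed text for a leaf element,
    or a list of child pairs for an element with children.
    """
    children = list(element)
    if children:
        return [element.tag, [to_pairs(child) for child in children]]
    return [element.tag, (element.text or "").strip()]


def find_all(pair, key):
    """Recursively collect every value stored under `key`, in document order."""
    tag, value = pair
    found = [value] if tag == key else []
    if isinstance(value, list):
        for child in value:
            found.extend(find_all(child, key))
    return found


# Hypothetical feed data for illustration only.
FEED = """<rss><channel><title>Example feed</title>
<item><title>First</title><description>One</description></item>
<item><title>Second</title><description>Two</description></item>
</channel></rss>"""

tree = to_pairs(ET.fromstring(FEED))
```

Calling find_all(tree, "description") then walks the nested lists and returns the description text of each item, without any regex involved.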
HTH
has
[1] http://freespace.virgin.net/hamish.sanderson/index.html#textcommands
[2] http://freespace.virgin.net/hamish.sanderson/index.html#parsertools
--
http://freespace.virgin.net/hamish.sanderson/
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Applescript-users mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
email@hidden
This email sent to email@hidden