Re: Regex with Satimage

  • Subject: Re: Regex with Satimage
  • From: has <email@hidden>
  • Date: Tue, 7 Jun 2005 19:04:44 +0100

Michael Ghilissen wrote:

>I hope to read the pairs <title> and <description> from a web page,
>extract the text between the XML tags and set the text to 'references'
>\2 and \6, using Satimage's Find Text with regex.
>[...]
>set theResult to find text "(<title>)(.*)(<)(.*)(<description>)(.*)(<)"
>in theText using {"Title: \\2 ", "Description: \\6"} with regexp, all
>occurrences and string result

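To do it in a one-liner you need non-greedy matching, which Satimage doesn't have: a greedy (.*) grabs as much text as it can while still letting the rest of the pattern match, so it runs on past the first closing tag. If you're just looking to do a quick-n-dirty scrape, try TextCommands' [1] 'search' command: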

tell application "TextCommands"
    -- each (.*?) is non-greedy, so it stops at the first closing tag it meets
    search theText for "<title>(.*?)</title>.*?<description>(.*?)</description>" with regex
end tell

This still leaves you to clean up whitespace and decode entities (which you can also do with TextCommands), of course, and being a very dirty solution there are still all sorts of things that can go wrong with it: attributes on the tags, CDATA sections or nested markup will all trip it up.
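
For comparison, here is the same quick-n-dirty scrape as a rough Python sketch, including the whitespace and entity cleanup. It uses only the standard re and html modules; the sample text and the names SAMPLE, PAIR, GREEDY, tidy and scrape are made up for illustration and have nothing to do with TextCommands or Satimage.

```python
import html
import re

# Hypothetical sample input; real pages will be messier.
SAMPLE = """<item><title> Fish &amp; Chips </title>
<description>Crispy,
  salty</description></item>
<item><title>Tea</title><description>Hot &lt;drink&gt;</description></item>"""

# (.*?) is non-greedy: each group stops at the first closing tag it meets.
# re.DOTALL lets '.' cross line breaks between and inside elements.
PAIR = re.compile(
    r"<title>(.*?)</title>.*?<description>(.*?)</description>", re.DOTALL
)

# The greedy form, for contrast: (.*) runs on to the last </title> it can reach.
GREEDY = re.compile(r"<title>(.*)</title>", re.DOTALL)


def tidy(text):
    """Decode HTML entities and collapse runs of whitespace."""
    return " ".join(html.unescape(text).split())


def scrape(text):
    """Return a list of (title, description) tuples found in text."""
    return [(tidy(t), tidy(d)) for t, d in PAIR.findall(text)]


if __name__ == "__main__":
    for title, description in scrape(SAMPLE):
        print("Title:", title, "Description:", description)
```

Swapping PAIR for a greedy pattern such as GREEDY shows the original problem: the first group swallows everything up to the last closing tag in the text.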

If you want a more robust solution, I've wrapped the feedparser module for Python as a scriptable FBA (faceless background application), ParserTools [2]. Pass it a feed URL or XML data and it'll convert it to nested key-value lists that you can recursively search for the information you want.
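
To give a feel for what "nested key-value lists that you can recursively search" means, here is a small Python sketch. It is not ParserTools or feedparser: it stands in the standard library's xml.etree.ElementTree for the parsing step, and the sample feed and the names FEED, to_nested and find_all are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample feed.
FEED = (
    "<rss><channel><title>News</title>"
    "<item><title>A</title><description>First</description></item>"
    "<item><title>B</title><description>Second</description></item>"
    "</channel></rss>"
)


def to_nested(element):
    """Convert an element to nested data: a dict of tag -> list of children,
    or the stripped text for an element with no children."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    nested = {}
    for child in children:
        nested.setdefault(child.tag, []).append(to_nested(child))
    return nested


def find_all(node, key):
    """Recursively collect every value stored under key, in document order."""
    found = []
    if isinstance(node, dict):
        for tag, values in node.items():
            if tag == key:
                found.extend(values)
            for value in values:
                found.extend(find_all(value, key))
    return found


if __name__ == "__main__":
    data = to_nested(ET.fromstring(FEED))
    print(find_all(data, "title"))
    print(find_all(data, "description"))
```

Unlike the regex scrape, a real parser copes with attributes, entities and line breaks on its own, so the search only has to walk the resulting structure.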

HTH

has

[1] http://freespace.virgin.net/hamish.sanderson/index.html#textcommands
[2] http://freespace.virgin.net/hamish.sanderson/index.html#parsertools
--
http://freespace.virgin.net/hamish.sanderson/