Re: Regex with Satimage


  • Subject: Re: Regex with Satimage
  • From: Michael Ghilissen <email@hidden>
  • Date: Wed, 8 Jun 2005 03:06:39 -0400

Thanks Has,

That's exactly it; beautifully simple. I thought I had gone mad. Meanwhile, I have learned all about regex!

Where were you when I needed you?  LOL

Michael Ghilissen

On Jun 7, 2005, at 2:04 PM, has wrote:

Michael Ghilissen wrote:

I hope to read the pairs <title> and <description> from a web page,
extract the text between the XML tags and set the text to 'references'
\2 and \6, using Satimage's Find Text with regex.
[...]
set theResult to find text "(<title>)(.*)(<)(.*)(<description>)(.*)(<)" ¬
    in theText using {"Title: \\2 ", "Description: \\6"} ¬
    with regexp, all occurrences and string result

To do it in a one-liner you need non-greedy matching, which Satimage doesn't have. If you're just looking to do a quick-n-dirty scrape, try TextCommands' [1] 'search' command:


tell application "TextCommands"
search theText for "<title>(.*?)</title>.*?<description>(.*?)</description>" with regex
end tell


This still leaves you to clean up whitespace and decode entities (which you can also do with TextCommands), of course, and since it is a very dirty solution there are still all sorts of things that can go wrong with it.
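
For readers without TextCommands, the same quick-n-dirty scrape can be sketched in Python, whose standard `re` module supports non-greedy matching. This is only an illustration under assumptions: the `scrape_pairs` helper and the sample text are hypothetical, not part of TextCommands or Satimage, and the sketch also performs the whitespace cleanup and entity decoding mentioned above.

```python
import html
import re

# Non-greedy groups (.*?) stop at the first closing tag instead of the last.
# re.DOTALL lets "." cross line breaks between the two tags.
PAIR = re.compile(
    r"<title>(.*?)</title>.*?<description>(.*?)</description>",
    re.DOTALL,
)


def scrape_pairs(text):
    """Return (title, description) tuples, entity-decoded and whitespace-collapsed."""
    pairs = []
    for title, description in PAIR.findall(text):
        cleaned = tuple(
            " ".join(html.unescape(field).split())
            for field in (title, description)
        )
        pairs.append(cleaned)
    return pairs


# Hypothetical sample input, made up for this sketch.
sample = """
<item><title> First &amp; foremost </title>
<description>An   intro</description></item>
<item><title>Second</title><description>More &lt;text&gt;</description></item>
"""

pairs = scrape_pairs(sample)
for title, description in pairs:
    print(f"Title: {title} Description: {description}")
```

As with the AppleScript version, this remains a dirty solution: it knows nothing about XML structure, so nested or malformed tags can still trip it up.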

If you want a more robust solution, I've wrapped the feedparser module for Python as a scriptable faceless background application (FBA), ParserTools [2]. Pass it a feed URL or XML data and it'll convert it to nested key-value lists that you can recursively search for the information you want.
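
The parser-based approach can be sketched without ParserTools or feedparser. The example below swaps in Python's standard `xml.etree.ElementTree` rather than feedparser itself, so it is not how ParserTools works internally; the `feed_items` helper and the sample feed are hypothetical, and the input is assumed to be well-formed XML with `<item>` elements.

```python
import xml.etree.ElementTree as ET


def feed_items(xml_text):
    """Parse feed XML and return one key-value record per <item> element."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when the child tag is missing.
            "title": (item.findtext("title") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


# Hypothetical sample feed, made up for this sketch.
feed = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First &amp; foremost</title><description>An intro</description></item>
<item><title>Second</title><description>More text</description></item>
</channel></rss>"""

items = feed_items(feed)
for record in items:
    print(record["title"], "|", record["description"])
```

Because only the children of each `<item>` are read, the channel-level `<title>` is skipped, and entities are decoded by the parser rather than by hand.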

HTH

has

[1] http://freespace.virgin.net/hamish.sanderson/index.html#textcommands
[2] http://freespace.virgin.net/hamish.sanderson/index.html#parsertools
-- http://freespace.virgin.net/hamish.sanderson/
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Applescript-users mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
email@hidden

