Regex -- TextCommands vs. Satimage


  • Subject: Regex -- TextCommands vs. Satimage
  • From: "John R." <email@hidden>
  • Date: Mon, 21 Nov 2005 14:16:17 -0500

Michael Ghilissen wrote:
I can't resolve this. I need to extract the text between ">Cover<"
and the first "<br><br>" in a text string that contains several
"<br><br>".

I understand that has's TextCommands can do "lazy" regex searches, whereas Satimage cannot. As a (very appreciative!) user of both (with no axe to grind...), I wanted to compare the two.


First, it seems that Michael's original problem can NOT be solved with Satimage if the terminator MUST be the full multi-character string "<br><br>" rather than the single character "<". Does anyone disagree?

Using Satimage has forced me to think of ways to convert a lazy regex, which is fast and easy for humans to dream up, into a greedy one, which is ugly but does the job. Michael's simple example can indeed be handled with a harder-to-visualize greedy regex built on a negated character class: ">Cover<([^<]*)". However, that captures only up to the next "<" of any tag, not up to the first "<br><br>", so I suspect it does NOT solve Michael's actual application need.
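
For what it's worth, if both markers are fixed strings, the extraction may not need a regex at all. Here is a minimal sketch that swaps in plain AppleScript text item delimiters instead of either TextCommands or Satimage. textBetween is a hypothetical helper name, and the sample string is made up for illustration:

```applescript
-- Hypothetical helper (not from TextCommands or Satimage): returns the text
-- between startMarker and the first endMarker that follows it, or "" if
-- either marker is missing.
on textBetween(theText, startMarker, endMarker)
    set savedDelims to AppleScript's text item delimiters
    set foundText to ""
    try
        set AppleScript's text item delimiters to startMarker
        set pieces to text items of theText
        if (count of pieces) > 1 then
            -- Rejoin everything after the first start marker.
            set tailText to (items 2 thru -1 of pieces) as text
            set AppleScript's text item delimiters to endMarker
            set tailPieces to text items of tailText
            -- Only accept the result if the end marker was actually found.
            if (count of tailPieces) > 1 then set foundText to item 1 of tailPieces
        end if
    end try
    set AppleScript's text item delimiters to savedDelims
    return foundText
end textBetween

textBetween("<td>Cover</td><i>First</i> edition<br><br>More text<br><br>", ">Cover<", "<br><br>")
--> "/td><i>First</i> edition"
```

Note that inner tags such as "<i>" survive, which is exactly where the ">Cover<([^<]*)" workaround stops short.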

I suspect that Satimage made a design decision NOT to support lazy searches in favor of speed. Here is a website reference where I learned (what little I know) about why "lazy" is much slower than "greedy". My intuition was the opposite, at first. (see slide #27):
http://perl.plover.com/yak/regex/


So I tested speed with a "lazy" regex search that ends on a single trailing character, "<a (.*?)>", against its greedy equivalent, "<a ([^>]*)>". Both find matches on most web pages, which carry a lot of extraneous text and so make for a good speed comparison.

Below is the testing code, with results from running it against the source of a Google search results page.
There may be problems with my test:
(1) Satimage is a scripting addition, while TextCommands is a background application. I suspect scripting additions are faster, but also more intrusive (reserved words, etc.). I don't think this is really a flaw in the test, though: users must choose between fast and convenient with ALL of the trade-offs included, assuming scripting additions actually do have a speed advantage.
(2) I don't do these sorts of tests much, and I don't know how to get ticks; I only get whole seconds via the current date.
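
One possible way around the whole-second limit in (2) is to ask the shell for a finer clock. This is only a sketch: it assumes perl with the Time::HiRes module is available to do shell script, and nowInMilliseconds and TimeThisFine are hypothetical names, not part of either package. Each shell call has its own overhead, so the clock is read only twice, outside the loop:

```applescript
-- Sketch only: wall-clock time in milliseconds, returned as a number.
-- Assumes perl and its Time::HiRes module are installed.
on nowInMilliseconds()
    set stamp to do shell script "perl -MTime::HiRes=time -e 'printf q(%.0f), time() * 1000'"
    return stamp as number
end nowInMilliseconds

-- Same loop as a whole-second timer would use, but with the finer clock.
on TimeThisFine(myScript, myHTML)
    set t1 to nowInMilliseconds()
    repeat 100 times
        myScript's DoThis(myHTML)
    end repeat
    set t2 to nowInMilliseconds()
    return (t2 - t1) / 1000 -- elapsed seconds, with a fractional part
end TimeThisFine
```

Raising the repeat count is the other obvious way to make a one-second clock matter less.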


Results, with variations (elapsed seconds for 100 iterations):
(a) Satimage 2.5x faster than TextCommands, tested exactly as below: 10 vs. 4
=> Satimage beats TextCommands in the basic test.
(b) Satimage about 1.5x faster than TextCommands for the same regex "<a ([^>]*)>": 10 vs. 7
=> scripting additions have a slight speed advantage, but not much. This is a surprise to me.
(c) TextCommands "<a ([^>]*)>" about the same speed as TextCommands "<a (.*?)>": 10 vs. 9
=> greedy is not much faster than lazy; a one-second difference is within the resolution of the current date. Also a surprise to me, but maybe an implementation issue...


------------------------------
-- Handler for Looping and Timing
-- Returns elapsed whole seconds for 100 calls of myScript's DoThis
------------------------------
on TimeThis(myScript, myHTML)
    set x to current date
    repeat 100 times
        myScript's DoThis(myHTML)
    end repeat
    set x2 to current date
    return (x2 - x)
end TimeThis
------------------------------
-- Script for Single "lazy" search using TextCommands
------------------------------
script UsingTextCommands
    on DoThis(myHTML)
        tell application "TextCommands"
            return (first item of (search myHTML for "<a (.*?)>" expanding to "\\1" with regex))
        end tell
    end DoThis
end script
------------------------------
-- Script for the equivalent greedy (negated-class) search using Satimage
------------------------------
script UsingSatimage
    on DoThis(myHTML)
        return (find text "<a ([^>]*)>" in myHTML using "\\1" with regexp and string result)
    end DoThis
end script
------------------------------
-- "Functional-Programming" style Main Routine
------------------------------
tell application "Safari"
    set myHTML to source of document 1
end tell
set TextCommandsResult to my TimeThis(UsingTextCommands, myHTML)
set SatimageResult to my TimeThis(UsingSatimage, myHTML)
return "TextCommandsResult: " & TextCommandsResult & ", SatimageResult: " & SatimageResult


--> Result: "TextCommandsResult: 10, SatimageResult: 4"
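
For variations (b) and (c), a third script object along these lines can be added to the listing and passed to TimeThis in the same way. It reuses the TextCommands call from the test code unchanged apart from the pattern; UsingTextCommandsGreedy and GreedyResult are names made up here. One caveat, hedged because it depends on the engine: in many regex engines "." does not match a newline by default while "[^>]" does, so the two patterns may not capture the same text for tags that wrap across lines.

```applescript
------------------------------
-- Script for the greedy (negated-class) search using TextCommands
-- Assumes the same search command form as UsingTextCommands
------------------------------
script UsingTextCommandsGreedy
    on DoThis(myHTML)
        tell application "TextCommands"
            return (first item of (search myHTML for "<a ([^>]*)>" expanding to "\\1" with regex))
        end tell
    end DoThis
end script

-- Timed exactly like the other two, inside the main routine:
set GreedyResult to my TimeThis(UsingTextCommandsGreedy, myHTML)
```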


