Re: Using perl to extract tagged text - again
- Subject: Re: Using perl to extract tagged text - again
- From: "Mark J. Reed" <email@hidden>
- Date: Wed, 3 May 2006 12:53:56 -0400
Grep (and your perl -n one-liners, which read input one line at a time) are limited to single lines. In general,
a <pre> and </pre> pair need not be on the same line as each other or
the text in between, in which case all such line-oriented solutions
will fail.
Grep in particular is not the tool here, since it's designed for
selecting lines, not extracting text from lines. GNU grep does have
the -o option, which outputs only the part of the line that matches,
but that's no help here without lookaround.
However, if you know going in that it's all on a single line, you can
use sed, e.g.
sed -ne 's,<pre>\(.*\)</pre>,\1,p'
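One caveat with that substitution: it replaces only the matched part, so any text before `<pre>` or after `</pre>` on the same line is printed too. A minimal variant, assuming at most one pair per line, uses `.*` on both sides so the whole line is replaced by the captured text. `extract_with_sed` is only an illustrative name.

```shell
# Replace the entire line with the captured text.
# Assumes at most one <pre>...</pre> pair per line.
extract_with_sed() {
  sed -ne 's,.*<pre>\(.*\)</pre>.*,\1,p'
}

echo 'before <pre>sFSwacJX</pre> after' | extract_with_sed
```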
but with Perl you can eliminate the single-line requirement:
perl -ne 'BEGIN { undef $/ } print "$1\n" while m|<pre>(.*?)</pre>|gs'
In either case, you don't need lookaround: you *match* the whole
thing, including the tags, and then *print* only the stuff
between them.
On 5/3/06, Eric Geoffroy <email@hidden> wrote:
This is one of those things that keeps coming back to the list.
First I searched the archives and found some solutions, including the
subroutines on Apple's site. I'm probably going to use the subroutine,
but for my own education, why isn't my Perl regex working?
I just want to extract any text between these particular tags,
<pre></pre>, and nothing else. Seems like a simple job for grep! But woe
is me: grep doesn't have lookaround. Perl does, so I
tried this:
(pretend the echo is really curl piped into perl)
My Attempts:
(1)
echo "<pre>sFSwacJX</pre>" | perl -nle 'print if /(?<=<pre>).+(?=</pre>)/'
(2)
echo "<pre>sFSwacJX</pre>" | perl -nle m/(?<=<pre>).+(?=</pre>)/
(3)
Adam W. once posted something similar to this:
echo "<pre>sFSwacJX</pre>" | perl -ne 'print \"$1\" if m|<pre>(.*?)</pre>|'"
--> Can't find string terminator '"' anywhere before EOF at -e line 1.
Note: the lookahead and lookbehind work just fine in BBEdit:
(?<=<pre>).+(?=</pre>)
Thx,
Eric
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Applescript-users mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden
--
Mark J. Reed <email@hidden>