Re: Using perl to extract tagged text - again

  • Subject: Re: Using perl to extract tagged text - again
  • From: "Mark J. Reed" <email@hidden>
  • Date: Wed, 3 May 2006 12:53:56 -0400

Grep (and your Perl regex, as run with -n) work on one line at a time.
In general, a <pre> and </pre> pair need not be on the same line as each
other or the text in between, in which case all such line-oriented
solutions will fail.
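
To illustrate the point about line-oriented tools, here is a small sketch. The three-line sample input is made up for the demonstration; grep tests each line on its own, so no line ever contains both tags.

```shell
# Hypothetical sample: the tags and the text sit on three separate lines.
sample='<pre>
sFSwacJX
</pre>'

# grep -c counts matching lines. It exits non-zero when nothing matches,
# hence the "|| true" to keep the script going.
count=$(printf '%s\n' "$sample" | grep -c '<pre>.*</pre>' || true)
echo "matching lines: $count"
```

The count comes out as 0 even though the input clearly holds a <pre> block.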

Grep in particular is not the tool here, since it's designed for
selecting lines, not extracting text from lines.  GNU grep does have
the -o option, which outputs only the part of the line that matches,
but that's no help here without lookaround.
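
As a quick sketch of why -o alone falls short (assuming a grep that supports -o, such as GNU grep), the matched text it prints still includes the tags:

```shell
# -o prints only the matched part of the line, but the match itself
# has to include the tags, so they end up in the output too.
out=$(echo '<pre>sFSwacJX</pre>' | grep -o '<pre>.*</pre>')
echo "$out"
```

This prints `<pre>sFSwacJX</pre>`, so the tags would still have to be stripped in a second step.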

However, if you know going in that it's all on a single line, you can
use sed, e.g.

sed -ne 's,<pre>\(.*\)</pre>,\1,p'

but with Perl you can eliminate the single-line requirement by reading
the whole input at once (undef $/) and matching repeatedly with /g:

perl -ne 'BEGIN {undef $/;} print $1 while m|<pre>(.*?)</pre>|gms'

In either case, you don't need lookaround; you *match* the whole
thing, including the tags, and then *print* only the stuff between
them.


On 5/3/06, Eric Geoffroy <email@hidden> wrote:
This is one of those things that keeps coming back to the list.
First I searched the archives. Found some solutions. Found the
subroutines on Apple's site. I'm probably going to use the subroutine,
but for my own education, why isn't my Perl regex working?

I just want to extract any text in between these particular tags-
<pre></pre> nothing else. Seems like a simple job for grep! But woe
is me to find that grep doesn't have lookaround. Perl does. So I
tried this stuff-
(pretend the echo is really curl piped into perl)

My Attempts:
(1)
echo "<pre>sFSwacJX</pre>" | perl -nle 'print if /(?<=<pre>).+(?=</pre>)/'

(2)
echo "<pre>sFSwacJX</pre>" | perl -nle m/(?<=<pre>).+(?=</pre>)/

(3)
Adam W. once posted something similar to this:
echo "<pre>sFSwacJX</pre>" | perl -ne 'print \"$1\" if m|<pre>(.*?)</pre>|'"
--> Can't find string terminator '"' anywhere before EOF at -e line 1.

Note: the lookahead and lookbehind work just super in BBEdit.
(?<=<pre>).+(?=</pre>)

Thx,

Eric



--
Mark J. Reed <email@hidden>