Re: Using perl to extract tagged text - again

  • Subject: Re: Using perl to extract tagged text - again
  • From: Eric Geoffroy <email@hidden>
  • Date: Wed, 03 May 2006 11:06:07 -0700

Ooh, thanks!

The explanation is as helpful as the solution. I hate not knowing WHY some things work.

I stumbled onto this solution, but then saw your reply and realized it will break when the text spans multiple lines.

perl -nle 'print $1 while /\<a href=\"?([^\>"]*)/g'

Too bad lookaround is so tricky to use. I bet it could be really useful.

thx again,
Eric

On May 3, 2006, at 9:53 AM, Mark J. Reed wrote:

Grep (and your Perl regex) are limited to single lines. In general,
a <pre> and </pre> pair need not be on the same line as each other or
as the text in between, in which case all such line-oriented solutions
will fail.
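A small demonstration of that point, using made-up input in which the tags sit on their own lines. With `-n`, Perl runs the program once per input line, so no single line ever contains both `<pre>` and `</pre>` and nothing is printed:

```shell
# Hypothetical sample: one <pre> element spread over three lines.
sample='<pre>
sFSwacJX
</pre>'

# Line-oriented match: each line is tested on its own, so this prints nothing.
printf '%s\n' "$sample" | perl -nle 'print $1 if m|<pre>(.*?)</pre>|'
```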

Grep in particular is not the tool here, since it's designed for
selecting lines, not extracting text from lines.  GNU grep does have
the -o option, which outputs only the part of the line that matches,
but that's no help here without lookaround.
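To illustrate: with `-o`, grep prints only the matched text, but without lookaround the match has to include the tags, so they are printed too and must be stripped in a second step (sed is used here as one of several possible ways):

```shell
# -o prints only the matching text, but the match includes the tags.
echo "<pre>sFSwacJX</pre>" | grep -o '<pre>[^<]*</pre>'
# prints: <pre>sFSwacJX</pre>

# A second pass removes both the opening and the closing tag.
echo "<pre>sFSwacJX</pre>" | grep -o '<pre>[^<]*</pre>' | sed -e 's,</*pre>,,g'
# prints: sFSwacJX
```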

However, if you know going in that it's all on a single line, you can
use sed, e.g.

sed -ne 's,<pre>\(.*\)</pre>,\1,p'

but with Perl you can eliminate the single-line requirement:

perl -ne 'BEGIN {undef $/;} print "$1\n" while m|<pre>(.*?)</pre>|gms'

In either case, you don't need lookaround; you *match* the whole
thing, including the tags, and then *print* only the stuff between
them.
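The same match-then-print idea drives the sed command. Running it on a few made-up lines shows two caveats: text outside the tags on the same line is kept unless the pattern consumes it too, and because `.*` is greedy, two `<pre>` pairs on one line come out as a single merged result. sed's basic regular expressions have no non-greedy `.*?`, so that last case is easier to handle in Perl.

```shell
# One pair alone on a line: prints just the contents.
echo "<pre>sFSwacJX</pre>" | sed -ne 's,<pre>\(.*\)</pre>,\1,p'
# prints: sFSwacJX

# Text around the tags survives the substitution...
echo "code: <pre>sFSwacJX</pre> end" | sed -ne 's,<pre>\(.*\)</pre>,\1,p'
# prints: code: sFSwacJX end

# ...unless leading and trailing .* consume it as well.
echo "code: <pre>sFSwacJX</pre> end" | sed -ne 's,.*<pre>\(.*\)</pre>.*,\1,p'
# prints: sFSwacJX

# Greedy .* runs to the LAST </pre>, merging two elements into one result.
echo "<pre>a</pre> and <pre>b</pre>" | sed -ne 's,<pre>\(.*\)</pre>,\1,p'
# prints: a</pre> and <pre>b
```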


On 5/3/06, Eric Geoffroy <email@hidden> wrote:
This is one of those things that keeps coming back to the list.
First I searched the archives and found some solutions, as well as the
subroutines on Apple's site. I'm probably going to use the subroutine,
but for my own education, why isn't my Perl regex working?

I just want to extract any text between these particular tags,
<pre></pre>, and nothing else. Seems like a simple job for grep! But woe
is me: grep doesn't have lookaround. Perl does, so I tried this:
(pretend the echo is really curl piped into perl)

My Attempts:
(1)
echo "<pre>sFSwacJX</pre>" | perl -nle 'print if /(?<=<pre>).+(?=</pre>)/'

(2)
echo "<pre>sFSwacJX</pre>" | perl -nle m/(?<=<pre>).+(?=</pre>)/

(3)
Adam W. once posted something similar to this:
echo "<pre>sFSwacJX</pre>" | perl -ne 'print \"$1\" if m|<pre>(.*?)</pre>|'"
--> Can't find string terminator '"' anywhere before EOF at -e line 1.


Note: the lookahead and lookbehind work just super in BBEdit:
(?<=<pre>).+(?=</pre>)

Thx,

Eric

_______________________________________________
Applescript-users mailing list (email@hidden)