Re: Using 'sed' to alter file

Subject: Re: Using 'sed' to alter file
From: "Mark J. Reed" <email@hidden>
Date: Thu, 6 Apr 2006 23:05:17 -0400

The problem with sed is that it can't deal with "everything between 'A' and 'B'" if 'A' and 'B' aren't either (1) both on the same line, or (2) each on a line by itself. And you have to know whether it's (1) or (2) ahead of time and write the sed script differently for each case.
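To make the two cases concrete, here is a minimal sketch using literal markers "A" and "B" (stand-ins, not the strings from the original question) and two made-up function names. Case (1) is handled with a substitution on one line, case (2) with a line-range address.

```shell
# Case (1): both markers are on the same line, so one substitution
# can remove everything from A through B.
delete_within_line() {
    sed 's/A.*B//'
}

# Case (2): each marker is on a line by itself, so a line-range
# address can delete the marker lines and everything between them.
delete_between_marker_lines() {
    sed '/^A$/,/^B$/d'
}

printf 'xAfooBy\n' | delete_within_line
printf 'keep 1\nA\ndrop me\nB\nkeep 2\n' | delete_between_marker_lines
```

Neither script handles the other layout, which is the point being made above.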

You can run Perl from AppleScript with "do shell script" and put the whole thing on a command line; I just broke it out into full form for (relative!) clarity.

As to the repetition thing, this should do the trick:

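# Assumes the text being edited is in $_.
while (/ ( StringC [^\(]* \( ) ( [^\)]* [^0-9a-zA-Z-\)] [^\)]* ) /gmsx)
{
    # Save the match variables before the s/// below overwrites them.
    my ($all, $one, $two) = ($&, $1, $2);
    # Copy the parenthesized text, turning every non-alphanumeric into a hyphen.
    (my $new = $two) =~ s/[^0-9a-zA-Z]/-/g;
    # Replace the matched text literally (\Q...\E) with the prefix plus the cleaned copy.
    s/\Q$all\E/$one$new/sm;
}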

Once upon a time, using $& anywhere in a Perl script automatically introduced a performance penalty; I'm not sure if that's still the case or not. If it's a concern, you can just put an extra set of parentheses around the whole regex (that is, just inside the slashes) and use $1, $2, $3 instead of $&, $1, $2.

The regex says "Look for and remember any string that consists of StringC followed by any amount of stuff that doesn't include an opening parenthesis, followed by an opening parenthesis. If the stuff after that but before any closing parenthesis includes anything that's not a letter, digit, or hyphen, remember it for as long as you can go without hitting a closing paren." The /g says "and the next time through, remember where you left off and pick up there looking for another match." The /m and /s change how newlines are treated: /m lets ^ and $ match at line boundaries inside the string, and /s lets . match a newline. This pattern doesn't use any of those, so the two flags are just insurance; the negated character classes already match across newlines. And the /x means that spaces are ignored instead of being matched literally, which let me space things out for clarity's sake.

Right after a match, a bunch of "magic" variables are set to the strings that matched parts of the pattern. In this case, $& is everything that matched, $1 is the stuff from "StringC" up to the opening parenthesis, and $2 is the stuff between the parentheses. We copy those values into other variables because the replacement on the next line is going to nuke the magic ones.

Then we make a copy of $two (the stuff between the parentheses) and then replace everything in it that's not a digit or letter with a hyphen.

Then in the original string, we replace everything that originally matched (without interpreting it as a regex, thus the \Q...\E) with the same first part (up to the opening parenthesis) followed by the modified part.

And then repeat for as long as we find more instances in the string.

--
Mark J. Reed <email@hidden>