Re: RegEx question
- Subject: Re: RegEx question
- From: Christopher Nebel <email@hidden>
- Date: Thu, 19 Feb 2004 11:39:54 -0800
On Feb 19, 2004, at 4:00 AM, Walter Ian Kaye wrote:
At 12:03p +0100 02/19/2004, Wim Melis didst inscribe upon an
electronic papyrus:
I'd like to build a regular expression search (using the Satimage osax)
that finds all greater-than and less-than characters that are NOT
part of a matching pair of HTML tags, and replaces them with the
appropriate HTML codes.
Any idea how to accomplish this with regular expressions? I'd love to
avoid a complex and slow parsing script.
A less-than sign would be easy to find: "< " (one followed by a space).
While this would ensure that you don't match any tag-openers, there is
no guarantee that "<" in body text will be followed by a space. No
banana.
I'm not sure that you can do this in any single expression. Perl could
match isolated "<"s using a negative lookahead assertion (like this:
s/<(?!\/?[a-z]+[^>]*>)/&lt;/g), but Satimage doesn't have those (at
least, I didn't see them in a quick scan of the documentation), and I
can't think of any pattern that would match all the isolated ">"s:
deciding whether a ">" closes a tag means looking back an arbitrary
distance, and Perl only supports lookbehind patterns of a fixed width.
(I can think of a pattern that would find *one*, but it wouldn't work
if you applied it globally -- you'd have to keep replacing from the
beginning in a loop until no more turned up.)
My thought was to break the text into tag and non-tag chunks, which is
not too hard, and then do a simple replacement on the non-tag chunks.
Emmanuel's hide-the-tag approach would also work.
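The chunking approach can be sketched in Python under a rough, assumed definition of a tag ("<", an optional "/", a letter, then everything up to the next ">"; it will not cope with a ">" inside a quoted attribute value, for example). Splitting on a pattern with a capturing group keeps the tags in the result list, so the non-tag chunks are simply the even-numbered entries. `escape_outside_tags` is a hypothetical name, and Emmanuel's hide-the-tag variant is not shown.

```python
import re

# Assumed tag shape: "<", optional "/", a letter, then up to the next ">".
# The capturing group makes re.split return the tags too, so the list
# alternates: text, tag, text, tag, ..., text (text chunks may be empty).
TAG = re.compile(r'(</?[a-zA-Z][^>]*>)')


def escape_outside_tags(html: str) -> str:
    """Escape "<" and ">" only in the chunks that are not tags."""
    chunks = TAG.split(html)
    for i in range(0, len(chunks), 2):  # even indexes are the non-tag chunks
        chunks[i] = chunks[i].replace('<', '&lt;').replace('>', '&gt;')
    return ''.join(chunks)
```

Once the text is split, each plain-text chunk needs only an ordinary string replacement with no lookaround at all, which is why this sidesteps the fixed-width lookbehind problem entirely.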
--Chris Nebel
AppleScript Engineering
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.