Re: RegEx question

Subject: Re: RegEx question
From: Christopher Nebel <email@hidden>
Date: Thu, 19 Feb 2004 11:39:54 -0800

On Feb 19, 2004, at 4:00 AM, Walter Ian Kaye wrote:

> At 12:03p +0100 02/19/2004, Wim Melis didst inscribe upon an electronic papyrus:
>
>> I'd like to build a regular expression search (using Satimage osax) that finds all greater-than and less-than characters that are NOT part of a matching pair of HTML tags, and replace them with the appropriate HTML entities.
>>
>> Any idea how to accomplish this with regular expressions? I'd love to avoid a complex and slow parsing script.
>
> Less than would be easy to find: "< " (followed by a space).

While this would ensure that you don't match any tag-openers, there is no guarantee that "<" in body text will be followed by a space. No banana.

I'm not sure that you can do this in any single expression. Perl could match isolated "<"s using a negative lookahead assertion (like this: s/<(?!\/?[a-z]+[^>]*>)/&lt;/g), but Satimage doesn't have those (at least, I didn't see them in a quick scan of the documentation). I also can't think of any pattern that would match all the isolated ">"s, since Perl only supports lookbehind patterns of a fixed width. (I can think of a pattern that would find *one*, but it wouldn't work if you tried to apply it globally -- you'd have to keep replacing from the beginning in a loop until no more turned up.)
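
For illustration, here is a minimal sketch of the negative-lookahead idea for "<", written with Python's re module rather than Perl or the Satimage osax. The function name escape_isolated_lt and the case-insensitive flag are assumptions added for the example; the pattern itself follows the Perl substitution above. It only handles "<", and a stray "<" followed by letters and a later ">" would still be mistaken for a tag.

```python
import re

# A "<" counts as isolated when it is NOT followed by something shaped like
# a tag: an optional "/", one or more letters, then anything up to a ">".
# IGNORECASE is an assumption added here so that <B> is treated like <b>.
ISOLATED_LT = re.compile(r"<(?!/?[a-z]+[^>]*>)", re.IGNORECASE)


def escape_isolated_lt(text: str) -> str:
    """Replace each isolated "<" with the entity "&lt;" (hypothetical helper)."""
    return ISOLATED_LT.sub("&lt;", text)


if __name__ == "__main__":
    print(escape_isolated_lt("a < b and <b>bold</b> 1<2"))
```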

My thought was to break the text into tag and non-tag chunks, which is not too hard, and then do a simple replacement on the non-tag chunks. Emmanuel's hide-the-tag approach would also work.
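
The chunking approach might look roughly like the sketch below, again in Python's re module as a stand-in for whatever the Satimage osax offers. The tag pattern, the name escape_outside_tags, and the rule that a tag may not contain another "<" are all assumptions made for the example, not something taken from the Satimage documentation.

```python
import re

# Assumed definition of a tag: "<", optional "/", a letter, then any run of
# characters other than "<" or ">", then ">". The capturing group makes
# re.split keep the tags in the result list.
TAG = re.compile(r"(</?[a-z][^<>]*>)", re.IGNORECASE)


def escape_outside_tags(text: str) -> str:
    """Escape "<" and ">" only in the non-tag chunks (hypothetical helper)."""
    chunks = TAG.split(text)
    result = []
    for index, chunk in enumerate(chunks):
        if index % 2 == 1:
            # With one capturing group, odd positions hold the tags themselves.
            result.append(chunk)
        else:
            # Even positions hold body text: a simple replacement is enough.
            result.append(chunk.replace("<", "&lt;").replace(">", "&gt;"))
    return "".join(result)


if __name__ == "__main__":
    print(escape_outside_tags("if a < b then <em>x > y</em>"))
```

Because the isolated ">" characters are handled by a plain string replacement on the non-tag chunks, no lookbehind is needed at all.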

--Chris Nebel
AppleScript Engineering