Re: RegEx question


  • Subject: Re: RegEx question
  • From: Christopher Nebel <email@hidden>
  • Date: Thu, 19 Feb 2004 11:39:54 -0800

On Feb 19, 2004, at 4:00 AM, Walter Ian Kaye wrote:

At 12:03p +0100 02/19/2004, Wim Melis didst inscribe upon an electronic papyrus:

I'd like to build a regular expression search (using Satimage osax) that finds all greater-than and less-than characters that are NOT part of a matching pair of HTML tags, and replace them with the appropriate HTML codes.

Any idea how to accomplish this with regular expressions? I'd love to avoid a complex and slow parsing script.

Less than would be easy to find: "< " (followed by a space).

While this would ensure that you don't match any tag-openers, there is no guarantee that "<" in body text will be followed by a space. No banana.

I'm not sure that you can do this in any single expression. Perl could match isolated "<"s using a negative lookahead assertion (like this: s/<(?!\/?[a-z]+[^>]*>)/&lt;/g), but Satimage doesn't have those (at least, I didn't see them in a quick scan of the documentation). I also can't think of any pattern that would match all the isolated ">"s, since Perl only supports lookbehind patterns of a fixed width. (I can think of a pattern that would find *one*, but it wouldn't work if you tried to apply it globally: you'd have to keep replacing from the beginning in a loop until no more turned up.)
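For anyone who wants to try the lookahead idea outside Perl, here is a minimal sketch in Python, whose re module also supports negative lookahead (and, like Perl, only fixed-width lookbehind). The function name escape_isolated_lt and the assumed tag shape (an optional "/", a letter-led name, then anything up to the next ">") are illustrative choices, not anything Satimage provides.

```python
import re

# Assumption: a "tag" is "<", an optional "/", one or more letters,
# then anything up to the next ">". Real HTML is messier than this.
ISOLATED_LT = re.compile(r"<(?!/?[a-z]+[^>]*>)", re.IGNORECASE)

def escape_isolated_lt(text: str) -> str:
    """Replace every "<" that does not start something tag-shaped."""
    return ISOLATED_LT.sub("&lt;", text)
```

Note that this handles only "<". A "<" followed directly by a letter is still taken for a tag opener whenever any ">" appears later in the text, so it is a heuristic rather than a parser.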

My thought was to break the text into tag and non-tag chunks, which is not too hard, and then do a simple replacement on the non-tag chunks. Emmanuel's hide-the-tag approach would also work.
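The chunking idea could be sketched roughly like this in Python. The names TAG and escape_outside_tags are hypothetical, and the tag pattern assumes tags never contain "<" or ">" inside attribute values.

```python
import re

# Assumption: a tag is "<", an optional "/", a letter, then any run of
# characters that contains neither "<" nor ">", closed by ">".
TAG = re.compile(r"(</?[a-zA-Z][^<>]*>)")

def escape_outside_tags(text: str) -> str:
    """Split into tag and non-tag chunks, escape only the non-tag ones."""
    # With one capturing group, re.split keeps the matched tags in the
    # result: even indices are non-tag text, odd indices are tags.
    parts = TAG.split(text)
    for i in range(0, len(parts), 2):
        parts[i] = parts[i].replace("<", "&lt;").replace(">", "&gt;")
    return "".join(parts)
```

Because the replacement runs only on the non-tag chunks, both "<" and ">" can be handled with plain string replacement, and no lookbehind is needed.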


--Chris Nebel
AppleScript Engineering
