Re: Removing html tags


  • Subject: Re: Removing html tags
  • From: has <email@hidden>
  • Date: Tue, 1 Mar 2005 12:01:17 +0000

Marc K. Myers wrote:

> What it can't handle is text like "If x<3 and y>10, what are the solutions?"

That's invalid HTML - the < and > symbols should be escaped as &lt; and &gt; - though it's not an uncommon mistake. A forgiving browser will simply check the '3' against its list of known HTML element names, and when it doesn't find a match it'll assume the < and > symbols are intended as content, not tags, and escape them itself. Your average real-world web browser is filled with code to deal with goofy, malformed and deeply broken HTML.
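
To make the escaping point concrete, here is a minimal sketch in Python (chosen because its standard library includes the HTMLParser module mentioned further down). The names TextCollector and visible_text are made up for this illustration. It shows the correctly escaped form of the sentence, and that Python's parser also happens to be forgiving about the unescaped form: it only treats '<' as the start of a tag when a letter follows. That is Python's rule, not necessarily what any particular browser does.

```python
from html import escape
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers everything the parser reports as text."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode &lt;, &gt;, &amp; etc.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(markup):
    """Return the character data of `markup` with tags removed."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flush anything still buffered
    return "".join(collector.parts)


raw = "If x<3 and y>10, what are the solutions?"

# What the author of the HTML should have written.
escaped = escape(raw, quote=False)
print(escaped)

# Both the valid and the invalid form come back as the same text.
print(visible_text(escaped))
print(visible_text(raw))
```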


BTW, if anyone really is mad enough to write their own HTML parser from scratch, this will get you started:

http://applemods.sourceforge.net/mods/Internet/HTMLParser.php

It's a simple SAX-style parser, basically a vanilla AppleScript port of Python's HTMLParser module, and much smarter than your average naive regex or TID-based [non-]solution. To build a tag stripper you'll need to provide your own HTML entity decoding and whitespace handling, plus some sort of state machine to make sense of the various significant tags (mostly block-level tags like <head>, <title>, <p>, <li>, <hr>, etc. and the odd inline one like <br>).

All quite doable - I once wrote a very basic pretty-printed plain-text renderer just for kicks. But it takes a fair bit of knowledge of program design and HTML, plus lots and lots of code and lookup tables, to pull off, so it's both mind-numbingly boring and ultimately pointless when there are already third-party solutions that have solved this problem properly.
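
For anyone curious what that involves, here is a rough sketch of such a tag stripper. It is written against Python's own HTMLParser rather than the AppleScript port, so the port's handler names may differ. TagStripper, strip_tags and the two tag tables are assumptions made up for this example, and the tables are nowhere near complete. Entity decoding is left to the parser's convert_charrefs option, the "state machine" is just a counter for skipped elements, and whitespace handling collapses runs of spaces and blank lines.

```python
from html.parser import HTMLParser

# Assumed, incomplete lookup tables; a real renderer needs far more entries.
SKIP_TAGS = {"head", "script", "style"}  # content is never shown
BREAK_TAGS = {"p", "li", "hr", "br", "div", "tr",
              "h1", "h2", "h3", "h4", "h5", "h6"}  # force a line break


class TagStripper(HTMLParser):
    """Hypothetical SAX-style handler that keeps text and drops markup."""

    def __init__(self):
        # Let the parser decode entities such as &lt; and &amp; for us.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside a SKIP_TAGS element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BREAK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BREAK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse whitespace within each line and drop empty lines.
        lines = (" ".join(line.split())
                 for line in "".join(self._chunks).split("\n"))
        return "\n".join(line for line in lines if line)


def strip_tags(markup):
    """Return a plain-text rendering of `markup`."""
    parser = TagStripper()
    parser.feed(markup)
    parser.close()
    return parser.text()


sample = ("<html><head><title>Quiz</title></head><body>"
          "<p>If x&lt;3 &amp; y&gt;10</p>"
          "<ul><li>one</li><li>two<br>three</li></ul></body></html>")
print(strip_tags(sample))
```

Even this toy version needs two lookup tables and per-tag decisions, and it still ignores lists, tables, <pre> blocks and character encodings, which is rather the point about the amount of work involved.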

HTH

has
--
http://freespace.virgin.net/hamish.sanderson/