Re: Removing html tags

  • Subject: Re: Removing html tags
  • From: Neil Faiman <email@hidden>
  • Date: Mon, 28 Feb 2005 18:59:36 -0500

On Feb 28, 2005, at 9:59 AM, has wrote:

> Getting plain text from an HTML document is one of those problems that looks simple enough on the surface but turns out to be horrendously complicated in practice. By far the best and simplest solution is to use a scriptable web browser, HTML editing/processing tool, high-quality 3rd-party library or system API that already knows how to deal with real-world HTML and can retrieve an HTML document's content in plain-text format, e.g.:

... [Safari solution deleted]

> Naive approaches such as simple regexes or that crappy guidebook remove_markup() handler won't handle stuff like whitespace and character entities in a sensible fashion [1] and can easily mess up on <head> content, comments, poorly-formed markup, etc., making them far more trouble than they're worth.


Also, has's suggestion plays to AppleScript's strengths rather than its weaknesses. Trying to solve almost any non-trivial problem in AppleScript itself is a losing proposition. AS's strength is as a scripting language, not a programming language. Generally, the right question when attacking a problem from AppleScript is, "What tool do I already have on my system that knows how to solve this problem for me, and how can I tell it to do that from AppleScript?" So, rather than trying to code up a semi-adequate HTML remover yourself in AppleScript, you find a scriptable application that knows how to do it well. The Safari suggestion is one good one. If you have BBEdit on your system, you could look into its "translate html to text" command. In any case, the idea is to take advantage of the work that someone else has already done.
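
To make the contrast concrete, here is a minimal sketch in Python rather than AppleScript (an assumption made purely for illustration; nothing in this thread uses Python). It compares a naive tag-stripping regex with a small extractor built on the standard library's html.parser module, which already knows about comments and character entities. The names naive_strip, TextExtractor, html_to_text and the sample markup are all hypothetical, and the extractor is still far from a full real-world HTML-to-text converter.

```python
import re
from html.parser import HTMLParser


def naive_strip(html: str) -> str:
    """Naive approach: delete anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", html)


class TextExtractor(HTMLParser):
    """Collects text content, skipping <head>, <script> and <style>."""

    SKIP = {"head", "script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in text.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach this method; the parser routes them elsewhere.
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace, roughly as a browser would when rendering.
    return " ".join("".join(parser.parts).split())


# Hypothetical sample: <head> content, a comment containing ">", an entity,
# and a line break inside a paragraph.
sample = (
    "<html><head><title>Ignore me</title></head>"
    "<body><!-- a > b --><p>Fish &amp;\n   chips</p></body></html>"
)

print(repr(naive_strip(sample)))   # 'Ignore me b -->Fish &amp;\n   chips'
print(repr(html_to_text(sample)))  # 'Fish & chips'
```

The regex leaks the title and part of the comment, and leaves the entity and the stray whitespace untouched, while the parser-based version avoids all four problems only because someone else has already done that work.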


Regards,

	Neil Faiman


