Re: Convert HTML Text to Plain Text
- Subject: Re: Convert HTML Text to Plain Text
- From: Jim Underwood <email@hidden>
- Date: Thu, 30 Jun 2016 00:56:49 +0000
- Thread-topic: Convert HTML Text to Plain Text
Shane,
Thank you very much for this script.
Your ASObjC handler is 2.03 times as fast as the shell handler, as tested using real-world data from Evernote. Fantastic!
I do have one question. You posted a very similar script at MacScripters.net, which I just happened to run across.
What is the purpose of the differences between it and your script here?
use framework "Foundation"
use framework "AppKit"

-- classes, constants, and enums used
property NSUTF8StringEncoding : a reference to 4
property NSAttributedString : a reference to current application's NSAttributedString
property NSCharacterEncodingDocumentOption : a reference to current application's NSCharacterEncodingDocumentOption
property NSDictionary : a reference to current application's NSDictionary
property NSString : a reference to current application's NSString

set HTMLString to "Power #2 &" & "#8211; Lawyerin&" & "#8217;" -- HTML split for posting.
set theString to NSString's stringWithString:HTMLString
set dataStr to theString's dataUsingEncoding:NSUTF8StringEncoding
set options to NSDictionary's dictionaryWithObject:NSUTF8StringEncoding forKey:(NSCharacterEncodingDocumentOption)
set attStr to NSAttributedString's alloc()'s initWithHTML:dataStr options:options documentAttributes:(missing value)
set outputStr to attStr's |string|()
return outputStr as text
The key difference seems to be the use of options:
set options to NSDictionary's dictionaryWithObject:NSUTF8StringEncoding forKey:(NSCharacterEncodingDocumentOption)
set attStr to NSAttributedString's alloc()'s initWithHTML:dataStr options:options documentAttributes:(missing value)
May I ask, for my (and others') edification, what effect the options parameter has on extracting plain text from HTML text?
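To see the effect for myself, a side-by-side comparison along these lines could be run. This is a minimal, untested sketch; the test string with a literal "é" is my own assumption and is not from either of your scripts. Entities such as &#8211; are plain ASCII, so the encoding setting shouldn't matter for them; my guess is that any difference would show up only with literal non-ASCII characters in HTML that has no charset declaration.

```applescript
use framework "Foundation"
use framework "AppKit"

-- Hypothetical test input: a literal (non-entity) "é" plus an entity,
-- split the same way as above so the list doesn't render it.
set HTMLString to "Café &" & "#8211; test"
set theString to current application's NSString's stringWithString:HTMLString
set theData to theString's dataUsingEncoding:(current application's NSUTF8StringEncoding)

-- 1. No options: AppKit has to work out the text encoding on its own.
set attStr1 to current application's NSAttributedString's alloc()'s initWithHTML:theData documentAttributes:(missing value)

-- 2. With options: the data is explicitly declared to be UTF-8.
set theOptions to current application's NSDictionary's dictionaryWithObject:(current application's NSUTF8StringEncoding) forKey:(current application's NSCharacterEncodingDocumentOption)
set attStr2 to current application's NSAttributedString's alloc()'s initWithHTML:theData options:theOptions documentAttributes:(missing value)

-- Extract the plain text from each attributed string for comparison.
set plainWithoutOptions to (attStr1's |string|()) as text
set plainWithOptions to (attStr2's |string|()) as text
{plainWithoutOptions, plainWithOptions}
```

Running it in Script Editor and comparing the two items of the result should show whether the options dictionary changes anything for a given input.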
Also, what is the benefit of using property statements to declare the ASObjC classes and constants?
Thanks.
Jim Underwood
aka JMichaelTX
On 17 Jun 2016, at 11:12 AM, Jim Underwood <email@hidden> wrote:
I'm looking for a faster, better method of converting HTML text to plain text.
I'm hoping ASObjC can come to the rescue. 😄
I have an Evernote script that might process thousands of Notes, and for each Note I need the plain text.
Any ideas/suggestions?
I don't know how much faster this will be; it uses the same process in the end:
_______________________________________________
AppleScript-Users mailing list (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users