Re: Convert HTML to plain text
- Subject: Re: Convert HTML to plain text
- From: Lorenzo Thurman <email@hidden>
- Date: Wed, 18 Apr 2007 17:05:55 -0500
Try this:

// I snarfed this snippet from:
// http://cocoa.karelia.com/Foundation_Categories/NSString/_Flatten__a_string_.m
if (! [html isEqualToString:@""]) // if empty string, don't do this! You get junk.
{
    int encoding = ([html length] > 3) ? NSUnicodeStringEncoding : NSMacOSRomanStringEncoding;
    NSAttributedString *attrString;
    NSData *theData = [html dataUsingEncoding:encoding];
    if (nil != theData) // this returned nil once; not sure why; so handle this case.
    {
        NSDictionary *encodingDict = [NSDictionary dictionaryWithObject:
            [NSNumber numberWithInt:encoding] forKey:@"CharacterEncoding"];
        // Pass the encoding through options:, since documentAttributes: is an
        // out-parameter rather than an input.
        attrString = [[NSAttributedString alloc] initWithHTML:theData
            options:encodingDict documentAttributes:NULL];
        [self setCleanedString:[attrString string]]; // keep only this
        [attrString release]; // don't do autorelease since this is so deep down.
    }
}
On Apr 18, 2007, at 12:00 p, email@hidden wrote:
NSError *theError = nil;
NSXMLDocument *theDoc = [[[NSXMLDocument alloc]
initWithXMLString:yourString
options:NSXMLDocumentTidyHTML error:&theError] autorelease];
NSString *theXSLT = @"<?xml version='1.0' encoding='utf-8'?>\
<xsl:stylesheet version='1.0' \
xmlns:xsl='http://www.w3.org/1999/XSL/Transform' \
xmlns:xhtml='http://www.w3.org/1999/xhtml'>\
<xsl:output method='text'/>\
<xsl:template match='xhtml:head'></xsl:template>\
<xsl:template match='xhtml:script'></xsl:template>\
</xsl:stylesheet>";
NSData *theData = [theDoc objectByApplyingXSLTString:theXSLT
arguments:nil
error:&theError];
NSString *theString = [[[NSString alloc] initWithData:theData
encoding:NSUTF8StringEncoding] autorelease];
printf( "%s", [theString UTF8String]);
Jim
On 4/18/07 3:43 AM, "David Brennan" <email@hidden> wrote:
Hi,
I'm working on a feed reader. Some RSS items have a description that
contains HTML. From what I can see, the HTML that comes in these RSS
items is not a full HTML page but only the HTML between the <body>
tags.
I need these descriptions in plain text. Some are plain text and some
are HTML. How can I convert an NSString that contains HTML to just the
text?
Kind regards,
Dave.
"My Break-Dancing days are over, but there's always the Funky Chicken"
--The Full Monty
_______________________________________________
Cocoa-dev mailing list (email@hidden)