Re: Convert HTML to plain text
  • Subject: Re: Convert HTML to plain text
  • From: Lorenzo Thurman <email@hidden>
  • Date: Wed, 18 Apr 2007 17:05:55 -0500

Try this:
// I snarfed this snippet from: http://cocoa.karelia.com/Foundation_Categories/NSString/_Flatten__a_string_.m
if (![html isEqualToString:@""]) // If the string is empty, don't do this! You get junk.
{
    NSStringEncoding encoding = ([html length] > 3) ? NSUnicodeStringEncoding : NSMacOSRomanStringEncoding;
    NSData *theData = [html dataUsingEncoding:encoding];
    if (nil != theData) // This returned nil once; not sure why, so handle this case.
    {
        // Tell the HTML importer which encoding the data uses.
        NSDictionary *encodingDict = [NSDictionary dictionaryWithObject:[NSNumber numberWithUnsignedInteger:encoding] forKey:@"CharacterEncoding"];
        // Import options belong in options:. documentAttributes: is an out-parameter, so NULL is fine here.
        NSAttributedString *attrString = [[NSAttributedString alloc] initWithHTML:theData options:encodingDict documentAttributes:NULL];
        [self setCleanedString:[attrString string]]; // Keep only the plain text.
        [attrString release]; // Don't autorelease, since this is so deep down.
    }
}
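
For anyone who wants the same idea as a standalone function, here is a minimal sketch. It is not from the Karelia snippet: the function name PlainTextFromHTML is made up for illustration, and it assumes ARC, UTF-8 data, and a call on the main thread (the AppKit HTML importer may depend on WebKit). Treat it as a starting point, not a definitive implementation.

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical helper (not part of the original post): returns the plain-text
// content of an HTML fragment, or nil if the conversion fails.
// Assumes ARC and that it is called on the main thread.
static NSString *PlainTextFromHTML(NSString *html)
{
    if ([html length] == 0) {
        return @""; // Nothing to convert (this also covers nil).
    }

    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    if (data == nil) {
        return nil;
    }

    // Tell the importer that the bytes are UTF-8.
    NSDictionary *options = @{ NSCharacterEncodingDocumentOption : @(NSUTF8StringEncoding) };
    NSAttributedString *attributed = [[NSAttributedString alloc] initWithHTML:data
                                                                     options:options
                                                          documentAttributes:NULL];
    return [attributed string]; // nil if the importer could not parse the data
}

int main(void)
{
    @autoreleasepool {
        NSString *text = PlainTextFromHTML(@"<p>Hello <b>world</b></p>");
        NSLog(@"%@", text);
    }
    return 0;
}
```

Descriptions that are already plain text can go through the same function, with the caveat that literal "<" and "&" characters in them will be interpreted as markup.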


On Apr 18, 2007, at 12:00 p, email@hidden wrote:


NSError *theError = nil;
NSXMLDocument *theDoc = [[[NSXMLDocument alloc] initWithXMLString:yourString
                                                          options:NSXMLDocumentTidyHTML
                                                            error:&theError] autorelease];


NSString *theXSLT = @"<?xml version='1.0' encoding='utf-8'?>\
<xsl:stylesheet version='1.0' \
xmlns:xsl='http://www.w3.org/1999/XSL/Transform' \
xmlns:xhtml='http://www.w3.org/1999/xhtml'>\
<xsl:output method='text'/>\
<xsl:template match='xhtml:head'></xsl:template>\
<xsl:template match='xhtml:script'></xsl:template>\
</xsl:stylesheet>";

NSData *theData = [theDoc objectByApplyingXSLTString:theXSLT arguments:nil
                                               error:&theError];
if (theData != nil)
{
    NSString *theString = [[[NSString alloc] initWithData:theData
                                                 encoding:NSUTF8StringEncoding] autorelease];
    printf("%s", [theString UTF8String]);
}
else // parsing or the transform failed; theError says why
{
    NSLog(@"%@", theError);
}



Jim

On 4/18/07 3:43 AM, "David Brennan" <email@hidden> wrote:

Hi,

I'm working on a feed reader. Some RSS items have a description that
contains HTML. From what I can see, the HTML that comes in these RSS
items is not a full HTML page but only the HTML between the <body>
tags.

I need these descriptions in plain text. Some are plain text and some
are HTML. How can I convert an NSString that contains HTML to just the
text?


Kind regards,
Dave.

"My Break-Dancing days are over, but there's always the Funky Chicken" --The Full Monty


_______________________________________________

Cocoa-dev mailing list (email@hidden)