Re: NSXMLParser and character entities?
- Subject: Re: NSXMLParser and character entities?
- From: Nathan Kinsinger <email@hidden>
- Date: Sun, 14 Sep 2008 02:45:41 -0600
On Sep 12, 2008, at 3:56 PM, Kai wrote:
When NSXMLParser hits a character entity like &auml; (-> German
umlaut 'ä'), it sends parser:resolveExternalEntityName:systemID: to
its delegate and if this is not implemented or returns nil,
parser:parseErrorOccurred: is called with
NSXMLParserUndeclaredEntityError.
Am I supposed to resolve all these character entities myself? And if
so, what should the NSData object returned by
parser:resolveExternalEntityName:systemID: contain? Unicode? Which
Unicode encoding?
But this can’t be, can it? I must be missing something simple.
Thanks for any hints
Kai
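The main problem is that entities like &auml; are defined by HTML and
have nothing to do with XML or NSXMLParser. XML itself predefines only
&lt;, &gt;, &amp;, &apos; and &quot;; anything else has to be declared
in a DTD.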
I haven't dealt with this problem myself but I was curious so I tried
a few things.
My first attempt was using NSAttributedString to convert the HTML
entity to a UTF8 string.
- (NSData *)parser:(NSXMLParser *)parser
    resolveExternalEntityName:(NSString *)entityName
    systemID:(NSString *)systemID
{
    // Rebuild the entity reference (e.g. "&auml;") and let the HTML
    // importer of NSAttributedString decode it.
    NSData *html = [[NSString stringWithFormat:@"&%@;", entityName]
        dataUsingEncoding:NSUTF8StringEncoding];
    NSAttributedString *entityString = [[[NSAttributedString alloc]
        initWithHTML:html documentAttributes:NULL] autorelease];
    NSLog(@"resolved entity name: %@", [entityString string]);
    // Hand the decoded character back to the parser as UTF-8 bytes.
    return [[entityString string] dataUsingEncoding:NSUTF8StringEncoding];
}
This works: parser:foundCharacters: gets the ä, but for some reason
parser:parseErrorOccurred: is still being called with the same error
you received (NSXMLParserUndeclaredEntityError): "Operation could not
be completed. (NSXMLParserErrorDomain error 26.)"
The parser does continue and parse the file correctly (with the ä), it
just makes it hard to tell when you have real errors. I'm really
curious as to why this doesn't work (running 10.5.4 on Intel). And the
fact that the parser keeps parsing after the error, when the
documentation says it will stop, is odd too.
Another option is to add an XHTML DOCTYPE declaration to the file and
call setShouldResolveExternalEntities:YES on the parser (the default is
NO). This works with no errors because the DTD defines the entities.
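A minimal sketch of that parser setup, assuming the file already carries the DOCTYPE. The function name ParseWithExternalEntities and the delegate argument are placeholders of mine, and the code uses manual reference counting like the snippet above (drop the release call under ARC):

```objc
#import <Foundation/Foundation.h>

// Parse the file at `path` with external entity resolution turned on.
// `delegate` is whatever object already implements the parser callbacks.
static BOOL ParseWithExternalEntities(NSString *path, id delegate)
{
    NSXMLParser *parser = [[NSXMLParser alloc]
        initWithContentsOfURL:[NSURL fileURLWithPath:path]];
    [parser setDelegate:delegate];
    [parser setShouldResolveExternalEntities:YES];  // default is NO

    BOOL ok = [parser parse];
    if (!ok) {
        // parserError describes why parsing stopped.
        NSLog(@"parse failed: %@", [parser parserError]);
    }
    [parser release];  // manual reference counting; remove under ARC
    return ok;
}
```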
However, NSXMLParser will download the DTD over the network every time
you parse a file, so you probably want to keep a local copy of one of
the DTDs (say http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd). Although I
didn't try it, you could copy just the entity definitions into your own
DTD to make the file smaller and parsing faster.
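An untested sketch of what such a trimmed-down DTD could look like. The file name entities.dtd is made up, and only four of the Latin-1 entities from the XHTML entity sets are shown; you would add whichever ones your files actually use:

```xml
<!-- entities.dtd (hypothetical file name): only the entities the files use -->
<!ENTITY auml  "&#228;"> <!-- ä -->
<!ENTITY ouml  "&#246;"> <!-- ö -->
<!ENTITY uuml  "&#252;"> <!-- ü -->
<!ENTITY szlig "&#223;"> <!-- ß -->
```

The XML file would then point at it with something like <!DOCTYPE root SYSTEM "entities.dtd">, where root stands for the name of the file's root element.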
Of course, if the content really is XHTML, you should be using an
HTML parser and not an XML one.
--Nathan