Re: NSXMLDocument unable to parse valid HTML with scripts in the body
- Subject: Re: NSXMLDocument unable to parse valid HTML with scripts in the body
- From: "Diez B. Roggisch" <email@hidden>
- Date: Wed, 06 Feb 2008 20:43:08 +0100
Marcus S. Zarra wrote:
Greetings List,
I have been trying to solve this issue for a while. Nothing is coming
up on the lists, google, cocoadev, etc. that is similar to the issue
that I am having.
The code to reproduce this behavior is simple:
NSError *error = nil;
NSURL *url = [NSURL URLWithString:@"http://web.mac.com/mzarra/Test/Original.html"];
NSXMLDocument *document = [[NSXMLDocument alloc] initWithContentsOfURL:url
    options:NSXMLDocumentTidyHTML error:&error];
NSAssert(error == nil, ([NSString stringWithFormat:@"Error reading file: %@", error]));
If you run that code (the html page is safe for work), NSXMLDocument
will give an error back of:
Exception raised during posting of notification. Ignored. exception:
'Error reading file: Error Domain=NSXMLParserErrorDomain Code=23
UserInfo=0x1f1550 "Line 140: EntityRef: expecting ';'
And will return a nil NSXMLDocument. The line that it is complaining
about is:
<div class="CounterDivClass"><script type="text/javascript"
src="http://web.mac.com/i/chp/NGHitCounter.js"></script>
Which, as far as I can tell, is perfectly valid html.
I have tried every input option available for loading the document but
none of them change the error. Even more interesting, if I just
initialize an NSXMLParser prior to loading the document then the
document will load but it will mutilate the tree and actually make the
document invalid!
To duplicate this add the line:
[[[NSXMLParser alloc] initWithData:data] autorelease];
Just before the NSXMLDocument *document... line above and rerun the
test. The document will pass but a large chunk (starting after line
140) will no longer be in the document.
So the questions that I am hoping to get resolved are:
1. Why is this throwing an error?
2. How can I get past it to properly load the XML document (preferably
without having to build my own tree)?
FYI, This html code is generated by iWeb...
Thanks for any and all help, suggestions, etc.
Marcus
For me, the line in question looks like this:
<div class="CounterDivClass"><script type="text/javascript"
src="http://web.mac.com/i/chp/NGHitCounter.js"></script><script
type="text/javascript"
src="http://web.mac.com/mzarra/.Counters/856B9B8D-11A7-46F6-869D-21AC046402D3?webdav-method=propget&counter"></script><div
id="CounterDiv"><img src="http://web.mac.com/i/chp/1/spacer.gif" alt=""
/></div></div><a href="http://www.mac.com"
title="http://www.mac.com"><img src="Welcome_files/mwmac.png" alt="Made
on a Mac" style="border: none; height: 50px; left: 24px; opacity: 0.55;
position: absolute; top: 29px; width: 139px; z-index: 1; " id="id2" />
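Look at the &counter in the second script tag. The ampersand is not
escaped, so the parser treats "&counter" as the start of an entity
reference and then complains that the terminating ';' is missing.
You should use an &amp; there I guess.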
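
If the page itself cannot be changed (it is generated by iWeb), one possible workaround is to escape the bare ampersands in the downloaded text before handing it to NSXMLDocument. The following is only a minimal sketch under a few assumptions: the HTML is available as a NUL-terminated UTF-8 C string, the unescaped '&' really is the cause of the error, and the helper names escape_bare_ampersands and entity_length are made up for this example. It is plain C, so it can sit in the same file as the Objective-C code. It does not know about CDATA sections or comments and will escape ampersands inside them as well.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* s points at '&'. Returns the length of a well-formed reference
   (&name; &#123; or &#x1F;) starting there, or 0 if the '&' is bare. */
static size_t entity_length(const char *s)
{
    size_t i = 1;
    if (s[i] == '#') {
        size_t start;
        i++;
        if (s[i] == 'x' || s[i] == 'X') {
            start = ++i;
            while (isxdigit((unsigned char)s[i])) i++;
        } else {
            start = i;
            while (isdigit((unsigned char)s[i])) i++;
        }
        if (i == start) return 0;          /* "&#;" or "&#x;" */
    } else {
        if (!isalpha((unsigned char)s[i])) return 0;
        while (isalnum((unsigned char)s[i])) i++;
    }
    return s[i] == ';' ? i + 1 : 0;
}

/* Hypothetical helper: returns a malloc'ed copy of html in which every
   bare '&' is replaced by "&amp;". The caller frees the result.
   Returns NULL if the allocation fails. */
char *escape_bare_ampersands(const char *html)
{
    /* Worst case: every byte is a bare '&' and grows to five bytes. */
    char *out = malloc(strlen(html) * 5 + 1);
    char *p = out;
    const char *s = html;

    if (out == NULL) return NULL;
    while (*s != '\0') {
        size_t n = (*s == '&') ? entity_length(s) : 1;
        if (n == 0) {                      /* bare ampersand: escape it */
            memcpy(p, "&amp;", 5);
            p += 5;
            s += 1;
        } else {                           /* ordinary byte or valid reference */
            memcpy(p, s, n);
            p += n;
            s += n;
        }
    }
    *p = '\0';
    return out;
}
```

The escaped text could then be wrapped with [NSString stringWithUTF8String:] and passed to -[NSXMLDocument initWithXMLString:options:error:] with the same NSXMLDocumentTidyHTML option as in the original code. I have not tried this against the page in question.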
Diez
_______________________________________________
Cocoa-dev mailing list (email@hidden)