Re: NSXMLDocument unable to parse valid HTML with scripts in the body [SOLVED]
- Subject: Re: NSXMLDocument unable to parse valid HTML with scripts in the body [SOLVED]
- From: "Marcus S. Zarra" <email@hidden>
- Date: Wed, 6 Feb 2008 12:56:17 -0700
Diez,
Yes, that was it exactly. Thank you for taking a look at it.
Marcus
On Feb 6, 2008, at 12:43 PM, Diez B. Roggisch wrote:
Marcus S. Zarra wrote:
Greetings List,
I have been trying to solve this issue for a while. Nothing similar to
the issue I am having is coming up on the lists, Google, CocoaDev, etc.
The code to reproduce this behavior is simple:
NSError *error = nil;
NSURL *url = [NSURL URLWithString:@"http://web.mac.com/mzarra/Test/Original.html"];
NSXMLDocument *document = [[NSXMLDocument alloc]
    initWithContentsOfURL:url options:NSXMLDocumentTidyHTML
    error:&error];
NSAssert(error == nil, ([NSString stringWithFormat:@"Error reading file: %@", error]));
If you run that code (the HTML page is safe for work),
NSXMLDocument gives back this error:
Exception raised during posting of notification. Ignored.
exception: 'Error reading file: Error Domain=NSXMLParserErrorDomain
Code=23 UserInfo=0x1f1550 "Line 140: EntityRef: expecting ';'
It also returns a nil NSXMLDocument. The line it is
complaining about is:
<div class="CounterDivClass"><script type="text/javascript" src="http://web.mac.com/i/chp/NGHitCounter.js"></script>
As far as I can tell, that is perfectly valid HTML.
I have tried every input option available for loading the document,
but none of them changes the error. Even more interesting, if I
just initialize an NSXMLParser before loading the document, the
document loads, but the tree is mutilated and the document actually
becomes invalid!
To reproduce this, add the line:
[[[NSXMLParser alloc] initWithData:data] autorelease];
just before the NSXMLDocument *document... line above and rerun the
test. The document will load, but a large chunk (starting after
line 140) will no longer be in it.
So the questions I am hoping to resolve are:
1. Why is this throwing an error?
2. How can I get past it and properly load the XML document
(preferably without having to build my own tree)?
FYI, this HTML is generated by iWeb.
Thanks for any and all help, suggestions, etc.
Marcus
For me, the line in question looks like this:
<div class="CounterDivClass"><script type="text/javascript" src="http://web.mac.com/i/chp/NGHitCounter.js"></script><script type="text/javascript" src="http://web.mac.com/mzarra/.Counters/856B9B8D-11A7-46F6-869D-21AC046402D3?webdav-method=propget&counter"></script><div id="CounterDiv"><img src="http://web.mac.com/i/chp/1/spacer.gif" alt="" /></div></div><a href="http://www.mac.com" title="http://www.mac.com"><img src="Welcome_files/mwmac.png" alt="Made on a Mac" style="border: none; height: 50px; left: 24px; opacity: 0.55; position: absolute; top: 29px; width: 139px; z-index: 1; " id="id2" />
Look at the &counter in the second script tag. You should use
&amp; there, I guess.
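Some background on Diez's diagnosis: XML requires a literal `&` to be written as `&amp;`. Browsers tolerate the bare form, but an XML parser reads `&counter` as the start of an entity reference and fails when no `;` follows, which matches the "EntityRef: expecting ';'" message. The real fix is to correct the generated markup. As a stopgap, here is a minimal sketch in plain C (which also compiles as Objective-C) that rewrites every `&` not already beginning a `&name;` or `&#...;` reference. The function names `escape_bare_ampersands` and `starts_entity_ref` are hypothetical, and the check is a simplification: it does not verify that a named entity is actually defined, and it would also rewrite `&` inside inline script bodies, changing their meaning.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if s (which points at '&') begins a
   well-formed reference such as &amp;, &#38; or &#x26;. Simplified: any
   name followed by ';' is accepted, defined entity or not. */
static int starts_entity_ref(const char *s)
{
    const char *p = s + 1;                       /* skip the '&' */
    if (*p == '#') {                             /* numeric character reference */
        p++;
        int hex = (*p == 'x' || *p == 'X');
        if (hex)
            p++;
        const char *digits = p;
        while (hex ? isxdigit((unsigned char)*p) : isdigit((unsigned char)*p))
            p++;
        return p > digits && *p == ';';          /* need at least one digit, then ';' */
    }
    if (!isalpha((unsigned char)*p) && *p != '_' && *p != ':')
        return 0;                                /* not a valid name start */
    p++;
    while (isalnum((unsigned char)*p) || *p == '.' || *p == '-' || *p == '_' || *p == ':')
        p++;
    return *p == ';';                            /* named reference must end in ';' */
}

/* Hypothetical helper: returns a newly allocated copy of html in which every
   bare '&' is replaced by "&amp;". Returns NULL if allocation fails;
   the caller must free() the result. */
char *escape_bare_ampersands(const char *html)
{
    assert(html != NULL);
    size_t extra = 0;
    for (const char *p = html; *p; p++)          /* first pass: size the output */
        if (*p == '&' && !starts_entity_ref(p))
            extra += 4;                          /* "&" -> "&amp;" adds 4 bytes */
    char *out = malloc(strlen(html) + extra + 1);
    if (out == NULL)
        return NULL;
    char *q = out;
    for (const char *p = html; *p; p++) {        /* second pass: copy and escape */
        if (*p == '&' && !starts_entity_ref(p)) {
            memcpy(q, "&amp;", 5);
            q += 5;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

For example, `escape_bare_ampersands("propget&counter")` returns `"propget&amp;counter"`, while existing references such as `&lt;` or `&#38;` pass through unchanged. The escaped text could then be wrapped in an NSString and handed to NSXMLDocument's `initWithXMLString:options:error:` instead of loading straight from the URL.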
Diez
_______________________________________________
Cocoa-dev mailing list (email@hidden)