Re: NSXMLDocument unable to parse valid HTML with scripts in the body


  • Subject: Re: NSXMLDocument unable to parse valid HTML with scripts in the body
  • From: "Diez B. Roggisch" <email@hidden>
  • Date: Wed, 06 Feb 2008 20:43:08 +0100

Marcus S. Zarra schrieb:
Greetings List,

I have been trying to solve this issue for a while, but I can't find anything similar to it on the lists, Google, CocoaDev, etc.

The code to reproduce this behavior is simple:

NSError *error = nil;
NSURL *url = [NSURL URLWithString:@"http://web.mac.com/mzarra/Test/Original.html"];
// NSXMLDocumentTidyHTML asks Cocoa to tidy the HTML into well-formed XHTML before parsing
NSXMLDocument *document = [[NSXMLDocument alloc] initWithContentsOfURL:url options:NSXMLDocumentTidyHTML error:&error];
NSAssert(error == nil, ([NSString stringWithFormat:@"Error reading file: %@", error]));


If you run that code (the HTML page is safe for work), NSXMLDocument reports this error:

Exception raised during posting of notification. Ignored. exception: 'Error reading file: Error Domain=NSXMLParserErrorDomain Code=23 UserInfo=0x1f1550 "Line 140: EntityRef: expecting ';'

It also returns a nil NSXMLDocument. The line it complains about is:

<div class="CounterDivClass"><script type="text/javascript" src="http://web.mac.com/i/chp/NGHitCounter.js"></script>

As far as I can tell, that is perfectly valid HTML.

I have tried every input option available for loading the document, but none of them changes the error. Even more interesting, if I just initialize an NSXMLParser before loading the document, the document loads, but the tree is mutilated and the document actually becomes invalid!

To reproduce this, add the line:

[[[NSXMLParser alloc] initWithData:data] autorelease];

Just before the NSXMLDocument *document... line above, then rerun the test. The document now loads without an error, but a large chunk (starting after line 140) is missing from it.

So the questions that I am hoping to get resolved are:

1. Why is this throwing an error?
2. How can I get past it to properly load the XML Document (preferably without having to build my own tree).


FYI, this HTML is generated by iWeb.

Thanks for any and all help, suggestions, etc.

Marcus

For me, the line in question looks like this:

<div class="CounterDivClass"><script type="text/javascript" src="http://web.mac.com/i/chp/NGHitCounter.js"></script><script type="text/javascript" src="http://web.mac.com/mzarra/.Counters/856B9B8D-11A7-46F6-869D-21AC046402D3?webdav-method=propget&counter"></script><div id="CounterDiv"><img src="http://web.mac.com/i/chp/1/spacer.gif" alt="" /></div></div><a href="http://www.mac.com" title="http://www.mac.com"><img src="Welcome_files/mwmac.png" alt="Made on a Mac" style="border: none; height: 50px; left: 24px; opacity: 0.55; position: absolute; top: 29px; width: 139px; z-index: 1; " id="id2" />


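Look at the &counter in the second script tag. A bare & starts an entity reference, so the parser reads &counter as one and then complains because no ';' follows it. You should use &amp; there, I guess.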
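If the generated HTML can't be fixed at the source, one workaround is to escape stray ampersands before the bytes reach the parser. Below is a minimal sketch in plain C (which also compiles as Objective-C), assuming the page is already in memory as a NUL-terminated UTF-8 string. The names escape_bare_ampersands and reference_length are hypothetical helpers, not Cocoa API. An '&' is kept when it starts a well-formed reference (&name;, &#38; or &#x26;) and is rewritten to &amp; otherwise; entity names are checked against an ASCII-only approximation of XML's name rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Length of the well-formed reference that starts at s[0] == '&'
 * ("&name;", "&#38;" or "&#x26;"), or 0 if this '&' is bare.
 * Assumption: entity names are matched with an ASCII-only
 * approximation of XML's Name production. */
static size_t reference_length(const char *s)
{
    size_t i = 1;

    if (s[i] == '#') {                      /* numeric character reference */
        i++;
        int hex = (s[i] == 'x' || s[i] == 'X');
        if (hex)
            i++;
        size_t digits_start = i;
        while (hex ? isxdigit((unsigned char)s[i]) : isdigit((unsigned char)s[i]))
            i++;
        if (i == digits_start)              /* "&#;" or "&#x;" */
            return 0;
    } else {                                /* named entity reference */
        if (!(isalpha((unsigned char)s[i]) || s[i] == '_' || s[i] == ':'))
            return 0;
        i++;
        while (isalnum((unsigned char)s[i]) || s[i] == '_' || s[i] == ':' ||
               s[i] == '-' || s[i] == '.')
            i++;
    }
    return s[i] == ';' ? i + 1 : 0;         /* must be terminated by ';' */
}

/* Hypothetical helper: returns a malloc'd copy of html in which every
 * bare '&' is replaced by "&amp;". Existing references are left alone,
 * so applying it twice changes nothing. The caller must free() the
 * result; returns NULL if allocation fails. */
char *escape_bare_ampersands(const char *html)
{
    assert(html != NULL);

    /* First pass: count bare ampersands to size the output buffer. */
    size_t bare = 0;
    for (const char *p = html; *p != '\0'; p++)
        if (*p == '&' && reference_length(p) == 0)
            bare++;

    /* Each "&" -> "&amp;" adds four bytes. */
    char *out = malloc(strlen(html) + 4 * bare + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, expanding only the bare ampersands. */
    char *q = out;
    for (const char *p = html; *p != '\0'; p++) {
        if (*p == '&' && reference_length(p) == 0) {
            memcpy(q, "&amp;", 5);
            q += 5;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

For example, "propget&counter" becomes "propget&amp;counter", while an existing &amp; or &#38; is left untouched. Scanning byte by byte is safe for UTF-8 because '&' never occurs inside a multi-byte sequence. The escaped bytes could then be wrapped in an NSData and handed to -[NSXMLDocument initWithData:options:error:]. This is a blind text pass: it knows nothing about script bodies, comments or CDATA sections, so a && inside inline JavaScript would be rewritten as well.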


Diez
_______________________________________________

Cocoa-dev mailing list (email@hidden)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden


  • Follow-Ups:
    • Re: NSXMLDocument unable to parse valid HTML with scripts in the body [SOLVED]
      • From: "Marcus S. Zarra" <email@hidden>
References: 
 >NSXMLDocument unable to parse valid HTML with scripts in the body (From: "Marcus S. Zarra" <email@hidden>)
