Re: NSXLMDocument and malformed XML
- Subject: Re: NSXLMDocument and malformed XML
- From: Greg Herlihy <email@hidden>
- Date: Thu, 25 May 2006 17:31:58 -0700
- Thread-topic: NSXLMDocument and malformed XML
An XML parser is not allowed to perform any type of error recovery after
detecting that the XML document it is parsing is malformed. Rather, the
parser must notify the application of the error and must stop normal
processing of the document at that point (though the parser is allowed to
search the remainder of the document for additional errors and report them
as well).
So the rule is that unless an XML document is well-formed, the application
should simply reject it. The reason is simple: there are no standards
for interpreting broken XML. Allowing error recovery for malformed XML would
therefore lead to behavior both unpredictable and incompatible with
everyone else's unpredictable and incompatible behavior in the same
situation. For XML to live up to its promise as an interchangeable and
universally understood data format, all XML documents - in order to be
XML documents - must be well-formed from the start. Any document that looks
like XML - but isn't - should be placed in the garbage.
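
As a rough illustration of the reject-on-error rule, here is a minimal
sketch in Python rather than Cocoa (chosen only because it makes for a short
runnable example; it is not how NSXMLDocument is implemented). It assumes
the standard library's xml.etree.ElementTree parser, which raises ParseError
for input that is not well-formed or that uses an undeclared namespace
prefix. The helper name load_or_reject and the sample strings are
hypothetical, invented for this example.

```python
import xml.etree.ElementTree as ET


def load_or_reject(text):
    """Return (root, None) for well-formed XML, else (None, message)."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        # No recovery is attempted: nothing that was parsed before the
        # error is kept, and only the parser's diagnostic is handed back.
        return None, str(err)


# Hypothetical inputs, invented for this example.
good = '<page><color type="rgb"/></page>'
mismatched = '<page><color></page>'                # <color> is never closed
unbound = '<page><color xsi:type="rgb"/></page>'   # xsi prefix not declared

for sample in (good, mismatched, unbound):
    root, error = load_or_reject(sample)
    print("accepted" if error is None else "rejected: " + error)
```

The caller gets either a complete tree or an error message, never a
partially loaded document, which is the same all-or-nothing contract the
original poster ran into.
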
The state of HTML serves as a good example of what can go wrong when
applications try to accommodate malformed documents. One of the contributing
factors to the original Netscape browser's early success was that it was
very forgiving of broken HTML. And since many web sites at the time had
broken HTML, the Netscape browser - by masking the errors - appeared to be
the more capable browser. In other words, a webmaster could simply recommend
that visitors use the Netscape browser because it would show the web site
more-or-less as intended.
Exactly how Netscape Navigator rendered broken HTML was of course a behavior
unique to Navigator. And once Navigator had become the dominant browser,
other web browsers were placed at a significant disadvantage: they had to
figure out how to display broken HTML just like Navigator did - a supremely
difficult task of reverse-engineering a nearly infinite number of possible
HTML errors. Taking the alternate route, persuading webmasters that the HTML
on their site was broken, was also a tough sell: how could the HTML be
"broken" if Navigator displayed the page OK, and why would the webmaster
want to change it anyway, when practically everyone was using Netscape and
was not affected by errors in the HTML in any case?
The XML (and XHTML) philosophy avoids this entire debacle. All XML parsers
adhere only to standard, defined behavior. An application generating or
parsing XML documents pledges not to go off on its own and pretend to
understand a document that it does not. It is exactly this tough-minded,
intolerant approach that serves the user's best interests when the entire
picture is considered.
Greg
On 5/25/06 5:22 AM, "Chris Gregg" <email@hidden> wrote:
> Disclaimer: I'm still a newbie with Cocoa, and I'm slowly stumbling
> onto new ways to do things more simply.
>
> I wrote a little program that reads iWeb index.xml files (after
> unzipping them in their original index.xml.gz), and I originally wrote
> a minimalistic XML parser to get at the important bits of code I was
> looking for, from the NSString I loaded from the file.
>
> But then I stumbled onto NSXMLDocument, which, I thought, would make
> my life much easier by loading the XML for me. Excellent. The
> problem is that the XML in some iWeb index.xml files is malformed, and
> I'm getting the following error back from
> initWithContentsOfURL:options:error:
>
> "Line 2: Namespace prefix xsi for type on color is not defined"
>
> It would be nice if initWithContentsOfURL at least loaded in what it
> could and then I could ignore the malformed parts, but I guess that's
> not the way it works, as it just returns nil and the error.
>
> Am I back to square one, where I need to just beef up my own XML
> parser to take care of this, or can I gracefully recover from the
> error and load it anyway?
>
> Thanks!
>
> -Chris Gregg
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Cocoa-dev mailing list (email@hidden)
> Help/Unsubscribe/Update your Subscription:
>
> This email sent to email@hidden