Re: NSXML and >
- Subject: Re: NSXML and >
- From: Keith Blount <email@hidden>
- Date: Wed, 10 Feb 2010 08:20:25 -0800 (PST)
Hi,
Just to follow up on this, as I'm still having problems: I have done some more testing and double-checked the XML specs.
Yet again it seems that the NSXML classes are stricter about rejecting invalid XML when opening documents than they are about producing valid XML when generating data. If you include the string "]]>" inside the stringValue of an NSXMLElement, the '>' does not get escaped as it should according to the XML specs, and when you generate XML document data including such an element and then try to read it again, NSXMLDocument will fail and report the error: "Sequence ']]>' not allowed in content". Some sample code to demonstrate the issue:
// Create an element containing some characters that should be escaped to create valid XML.
NSXMLElement *element = [[[NSXMLElement alloc] initWithName:@"Test" stringValue:@" < & > ]]> "] autorelease];
// Note how the '<' and '&' get escaped, but not the '>' (even though it should be in the ']]>' sequence).
NSLog(@"%@", element); // OUTPUT: <Test> &lt; &amp; > ]]> </Test>
// Now create an XML doc from the element and generate the data.
NSXMLDocument *xmlDoc = [[[NSXMLDocument alloc] initWithRootElement:element] autorelease];
NSData *data = [xmlDoc XMLDataWithOptions:NSXMLNodePrettyPrint];
// Check the doc and data:
NSLog(@"XML Doc: %@\nData: %@", xmlDoc, data); // Yep, they are non-nil, all fine.
// Now load the data we created into an XML document.
NSError *error = nil;
xmlDoc = [[NSXMLDocument alloc] initWithData:data options:NSXMLNodePreserveWhitespace error:&error];
if (xmlDoc == nil) // If it failed, try with tidy.
    xmlDoc = [[NSXMLDocument alloc] initWithData:data options:NSXMLNodePreserveWhitespace error:&error];
// Did it fail?
if (xmlDoc == nil)
{
    // Run the error.
    if (error)
        [[NSAlert alertWithError:error] runModal];
    // Uh-oh... The error is: "Line 2: Sequence ']]>' not allowed in content". Because the '>' should have been escaped.
}
In other words, although the NSXML classes will escape '<' and '&' correctly, they will not handle escaping '>' at all - even when it occurs in the invalid (except when terminating CDATA) sequence ']]>'. This then causes the NSXML classes to fail when re-loading the document they just created from the same data, because NSXML is more fussy about reading than writing.
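One possible stopgap on the writing side is to patch the serialized text before saving it. This is only a sketch, not something NSXML provides: KBPatchSerializedXML is a hypothetical helper name, and it assumes the document contains no CDATA sections, comments or processing instructions, so that any "]]>" in the output can only have come from character data or attribute values.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of NSXML): escapes the '>' of every "]]>"
// found in already-serialized XML text.
// ASSUMPTION: the XML contains no CDATA sections, comments or processing
// instructions. In a CDATA section "]]>" is the legitimate terminator and
// must not be touched, so this helper would corrupt such a document.
static NSString *KBPatchSerializedXML(NSString *xml)
{
    return [xml stringByReplacingOccurrencesOfString:@"]]>" withString:@"]]&gt;"];
}
```

It would be applied to the string returned by -XMLStringWithOptions: before that string is converted to NSData, with the encoding used for the conversion matching whatever the document declares.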
On the one hand, it is (sort of) fair enough to expect the user of these classes to ensure the string values are valid XML (even if it does mean every user of these classes having to be extra careful and become very familiar with the XML specs); on the other hand, how do I go about ensuring valid XML when this is user-generated data over which I have no control, and when the NSXML classes will tidy up the ampersands in any character entities I try to escape myself?
At first, I thought I could just replace all occurrences of ">" with "&gt;" using NSString's -stringByReplacingOccurrencesOfString:withString:, e.g.:
NSString *validXMLStr = [userStr stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];
NSXMLElement *element = [[NSXMLElement alloc] initWithName:@"Text" stringValue:validXMLStr];
Then, to restore it:
NSString *value = [element stringValue];
userStr = [value stringByReplacingOccurrencesOfString:@"&gt;" withString:@">"];
But of course, that won't work, because the "&gt;" I place in my "fixed" string will become "&amp;gt;" in the XML file. So, consider the case where the user had written a string all about XML himself:
"It turns out that ']]>' needs changing to ']]&gt;' for valid XML..."
I then swap out the '>' in this situation for '&gt;':
"It turns out that ']]&gt;' needs changing to ']]&gt;' for valid XML..."
I then pass it to the stringValue of an NSXMLElement which encodes it as:
"It turns out that ']]&amp;gt;' needs changing to ']]&amp;gt;' for valid XML..."
Then I read it back out, get its string value, and swap all occurrences of '&gt;' for '>', and what we get on re-opening the file is:
"It turns out that ']]>' needs changing to ']]>' for valid XML..."
i.e. Not what the user wrote. An unlikely situation, I know, but not impossible and I have to account for it.
In other words, if NSXMLElement won't escape the '>' for me in situations where it should, how do I do it myself?
Am I missing something obvious? Has anybody had to handle something similar?
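One way the manual swap could be made reversible, as a minimal sketch rather than anything NSXML offers: escape '&' before '>' on the way in, and undo the two steps in the opposite order on the way out. A "&gt;" the user typed then becomes "&amp;gt;" and can no longer be confused with an escaped '>'. The helper names KBEscapeGreaterThan and KBUnescapeGreaterThan are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helpers, not part of NSXML.

// Makes a user string safe to pass as an NSXMLElement stringValue:
// afterwards it contains no literal '>' and therefore no "]]>".
static NSString *KBEscapeGreaterThan(NSString *userStr)
{
    // Escape '&' first, so a "&gt;" typed by the user becomes "&amp;gt;".
    NSString *s = [userStr stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    // Every '&' is now followed by "amp;", so each "&gt;" created here
    // unambiguously stands for an original '>'.
    return [s stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];
}

// Reverses KBEscapeGreaterThan; apply it to -stringValue after loading.
static NSString *KBUnescapeGreaterThan(NSString *stored)
{
    // Undo the steps in the opposite order: '>' first, then '&'.
    NSString *s = [stored stringByReplacingOccurrencesOfString:@"&gt;" withString:@">"];
    return [s stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"];
}
```

The cost of this approach is that NSXML escapes the ampersands once more when writing, so the file holds doubly escaped text and any other XML reader sees the escaped form unless it applies the same unescape step.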
Many thanks and all the best,
Keith
----- Original Message ----
From: Jens Alfke <email@hidden>
To: Keith Blount <email@hidden>
Cc: glenn andreas <email@hidden>; "email@hidden" <email@hidden>
Sent: Tue, February 9, 2010 9:37:46 PM
Subject: Re: NSXML and >
On Feb 9, 2010, at 1:03 PM, Keith Blount wrote:
> Great, many thanks for the reply, and for the location of the information in the XML docs, that's very helpful. Unfortunately, it seems that the NSXML classes don't fix the '>' in the ']]>' case either, though:
> NSXMLElement *element = [[[NSXMLElement alloc] initWithName:@"Test" stringValue:@"< & > ]]>"] autorelease];
> NSLog (@"%@", element);
The ">" in "]]>" only needs to be escaped when it's inside a CDATA, I believe. (Since that string marks the end of a CDATA.)
—Jens
_______________________________________________
Cocoa-dev mailing list (email@hidden)