Re: NSXML and >
- Subject: Re: NSXML and >
- From: Keith Blount <email@hidden>
- Date: Thu, 11 Feb 2010 16:30:21 -0800 (PST)
For the sake of the archives, and in case anyone else comes across this problem (anyone using the NSXML classes should at least be aware of it), here is the workaround I have settled on.
I have created my own NSXMLDocument category method that generates NSData using -XMLDataWithOptions:, loads that data object into an NSString, and then escapes any potentially dodgy occurrences of ‘>’. Here is the category method:
@implementation NSXMLDocument (BugWorkaround)
/*
*NOTE: This method works around a bug in the Cocoa NSXML classes, which refuse to escape '>' in any situation.
*According to section 2.4 of the XML specs ( http://www.w3.org/TR/REC-xml/#syntax ), '<' and '&' must always
*be escaped, but '>' only needs escaping when it appears in the character sequence ']]>', unless it marks the
*end of a CDATA block (or a conditional section - see section 3.4). The Cocoa NSXML classes correctly escape
*'<' and '&' to '&lt;' and '&amp;', but *never* escape '>', even when it appears in the string ']]>' inside an
*element's -stringValue. This generates invalid XML, and NSXMLDocument will refuse to read invalid XML unless
*NSXMLDocumentTidyXML is set, but that messes up all the white space.
*This method therefore generates NSData from the NSXMLDocument, then reads that data into an NSString. It then
*looks for the ']]>' sequence. If it finds it, it looks for the '<![' sequence, which opens a conditional section
*or CDATA block. If '<![' is *not* found in the XML string, then we can safely assume that any occurrences of
*']]>' must appear only in places where it should be escaped to ']]&gt;' in order to create valid XML. This method
*replaces such instances with an escaped version of the string.
*/
- (NSData *)XMLDataWithPrettyPrintAndExtraEscapes
{
// First, get the data.
NSData *data = [self XMLDataWithOptions:NSXMLNodePrettyPrint];
// Then, read the resulting XML string.
NSMutableString *str = [[NSMutableString alloc] initWithData:data encoding:NSUTF8StringEncoding];
// Does this XML contain the character sequence ']]>'?
// If not, do nothing - just return the data as-is, as the NSXML classes
// will have escaped everything that needed escaping.
NSRange range = [str rangeOfString:@"]]>"];
if (range.length == 0)
{
[str release];
return data;
}
// If it does contain these characters, though, we need to do some checks. This sequence
// of characters should only appear at the end of a CDATA or conditional sequence. If the ']]>'
// string appears anywhere else - i.e. in content - then the '>' *must* be escaped to '&gt;'.
// We thus check to see if this XML contains any CDATA or conditional sequences by looking for
// '<![' (CDATA and conditional sections always begin with '<![', e.g. '<![CDATA[...]]>').
NSRange conditionalRange = [str rangeOfString:@"<!["];
if (conditionalRange.length > 0)
{
// If the document contains any CDATA or conditional sequences, bail and do nothing - we'll
// just have to leave it to the user to ensure that there are no invalid ']]>' sequences within
// the content. We don't want to escape any ']]>' sequences that should not be escaped, and
// checking which ones should and shouldn't be escaped gets too complicated for our purposes.
[str release];
return data;
}
// If we got here, the XML document contains the ']]>' sequence only in places that are *not* ending CDATA
// or conditional sequences (because we know there are no such sequences in this document). This is bad XML,
// so we replace all of the occurrences of this string with the correct ']]&gt;' escaped version, and then
// generate and return our own data object from our string.
[str replaceOccurrencesOfString:@"]]>" withString:@"]]&gt;" options:0 range:NSMakeRange(range.location, [str length] - range.location)];
data = [str dataUsingEncoding:NSUTF8StringEncoding];
[str release];
return data;
}
@end
I’ve also filed this as a bug - #7637981.
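For anyone who does have CDATA sections in their documents, here is a rough sketch of how the check the category method bails out of might look. It is written in plain C (which also compiles as Objective-C) rather than against the NSXML classes, and the function name escape_stray_cdata_end is made up for illustration. It assumes the only place a literal ']]>' may legitimately stay is the end of a '<![CDATA[' section. It does not treat comments, processing instructions or DTD conditional sections specially, so a ']]>' inside those would be rewritten too.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of NSXML): returns a newly malloc'd copy of
 * the serialized XML in which every "]]>" found outside a CDATA section is
 * rewritten as "]]&gt;". The caller frees the result. */
char *escape_stray_cdata_end(const char *xml)
{
    static const char cdata_start[] = "<![CDATA[";
    static const char cdata_end[] = "]]>";
    const size_t start_len = sizeof cdata_start - 1;
    const size_t end_len = sizeof cdata_end - 1;

    /* Each 3-byte "]]>" grows to the 6 bytes of "]]&gt;", so twice the
     * input length is always enough room. */
    char *out = malloc(strlen(xml) * 2 + 1);
    if (out == NULL)
        return NULL;

    char *dst = out;
    const char *p = xml;
    while (*p != '\0') {
        if (strncmp(p, cdata_start, start_len) == 0) {
            /* Copy the whole CDATA section, terminator included, untouched.
             * An unterminated section is copied through to the end. */
            const char *end = strstr(p + start_len, cdata_end);
            size_t n = end ? (size_t)(end - p) + end_len : strlen(p);
            memcpy(dst, p, n);
            dst += n;
            p += n;
        } else if (strncmp(p, cdata_end, end_len) == 0) {
            /* A "]]>" in ordinary content: escape the '>'. */
            memcpy(dst, "]]&gt;", 6);
            dst += 6;
            p += end_len;
        } else {
            *dst++ = *p++;
        }
    }
    *dst = '\0';
    return out;
}
```

In the category method, the UTF-8 bytes of the XML string could be passed through something like this instead of returning early when '<![' is found. Treat it as a starting point, not a tested replacement.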
--- ORIGINAL MESSAGE ---
Still working on this and still getting nowhere, so another question:
Is there a way to prevent NSXMLElement converting '&' into '&amp;' so that I can resolve character entities myself in my own NSXMLElement category -init... method?
To recap the problem, the NSXML classes change '<' into '&lt;' and '&' into '&amp;' (when in string value content), just as they should according to the XML specs. But they don't convert '>' into '&gt;'. This is fine as the XML specs don't require this in most situations, but if '>' appears in the string ']]>' (when not ending CDATA) then it must be escaped - but Apple's NSXML classes don't do this, generating invalid XML that cannot be opened by NSXMLDocument in this situation.
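To make those rules concrete, here is a minimal sketch of the escaping that character data needs, written in plain C (which also compiles as Objective-C). The function name xml_escape_text is hypothetical and nothing here is NSXML API. It only illustrates the three cases: '<' and '&' are always escaped, while '>' is escaped only when it would complete ']]>'.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of NSXML): returns a newly malloc'd copy of
 * `text` escaped for use as XML character data. The caller frees the result. */
char *xml_escape_text(const char *text)
{
    size_t len = strlen(text);
    /* Worst case: every byte is '&', which grows to the 5 bytes of "&amp;". */
    char *out = malloc(len * 5 + 1);
    if (out == NULL)
        return NULL;

    char *dst = out;
    for (size_t i = 0; i < len; i++) {
        char c = text[i];
        if (c == '<') {
            memcpy(dst, "&lt;", 4);
            dst += 4;
        } else if (c == '&') {
            memcpy(dst, "&amp;", 5);
            dst += 5;
        } else if (c == '>' && i >= 2 && text[i - 1] == ']' && text[i - 2] == ']') {
            /* '>' only needs escaping when it would complete "]]>". */
            memcpy(dst, "&gt;", 4);
            dst += 4;
        } else {
            *dst++ = c;
        }
    }
    *dst = '\0';
    return out;
}
```

Passing the string " < & > ]]> " through this gives " &lt; &amp; > ]]&gt; ": the lone '>' is left alone and only the one that ends ']]>' is escaped.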
I tried creating my own -initWithName:validStringValue: method which did some jiggery-pokery and then called -initWithXMLString:error:, thinking that this wouldn't do any conversion, the idea being that I could force ']]>' to appear as ']]&gt;' myself by creating the XML string directly rather than going through -setStringValue:. But no. If you try this:
NSXMLElement *element = [[NSXMLElement alloc] initWithXMLString:@"<test>&gt;</test>" error:NULL];
NSLog (@"%@", element);
The output is:
<test>></test>
In other words, the NSXMLElement automatically *forces* any occurrences of '&gt;' to become '>', no matter how you try to work around it. And this means that if the user has entered the string ']]>' and you need to encode that in XML somewhere, then the NSXML classes force you to write invalid XML that cannot be read.
I've also tried creating the element like this:
element = [[NSXMLNode alloc] initWithKind:NSXMLElementKind options:NSXMLNodePreserveAll];
[element setName:@"Test"];
[element setObjectValue:@"&gt;"];
But this comes out as:
<Test>&amp;gt;</Test>
...
--- ORIGINAL MESSAGE ---
Just to follow up on this, yet again it seems that the NSXML classes are better at validating invalid XML when opening documents than when generating XML data. If you include the string "]]>" inside the stringValue of an NSXMLElement, the '>' does not get escaped as it should according to the XML specs, and when you generate XML document data including such an element and then try to read it again, NSXMLDocument will fail and report the error: "Sequence ']]>' not allowed in content". Some sample code to demonstrate the issue:
// Create an element containing some characters that should be escaped to create valid XML.
NSXMLElement *element = [[[NSXMLElement alloc] initWithName:@"Test" stringValue:@" < & > ]]> "] autorelease];
// Note how the '<' and '&' get escaped, but not the '>' (even though it should be in the ']]>' sequence).
NSLog(@"%@", element); // OUTPUT: <Test> &lt; &amp; > ]]> </Test>
// Now create an XML doc from the element and generate the data.
NSXMLDocument *xmlDoc = [[[NSXMLDocument alloc] initWithRootElement:element] autorelease];
NSData *data = [xmlDoc XMLDataWithOptions:NSXMLNodePrettyPrint];
// Check the doc and data:
NSLog(@"XML Doc: %@\nData: %@", xmlDoc, data);// Yep, they are non-nil, all fine.
// Now load the data we created into an XML document.
NSError *error = nil;
xmlDoc = [[NSXMLDocument alloc] initWithData:data options:NSXMLNodePreserveWhitespace error:&error];
if (xmlDoc == nil) // If it failed, try with tidy.
xmlDoc = [[NSXMLDocument alloc] initWithData:data options:NSXMLNodePreserveWhitespace error:&error];
// Did it fail?
if (xmlDoc == nil)
{
// Run the error.
if (error)
[[NSAlert alertWithError:error] runModal];
// Uh-oh... The error is: "Line 2: Sequence ']]>' not allowed in content". Because the '>' should have been escaped.
}
In other words, although the NSXML classes will escape '<' and '&' correctly, they will not handle escaping '>' at all - even when it occurs in the invalid (except when terminating CDATA) sequence ']]>'. This then causes the NSXML classes to fail when re-loading the document they just created from the same data, because NSXML is more fussy about reading than writing.
...