Re: NSXML and >
- Subject: Re: NSXML and >
- From: Keith Blount <email@hidden>
- Date: Wed, 10 Feb 2010 16:48:43 -0800 (PST)
Still working on this and still getting nowhere, so another question:
Is there a way to prevent NSXMLElement converting '&' into '&amp;' so that I can resolve character entities myself in my own NSXMLElement category -init... method?
To recap the problem, the NSXML classes change '<' into '&lt;' and '&' into '&amp;' (when in string value content), just as they should according to the XML specs. But they don't convert '>' into '&gt;'. This is fine as the XML specs don't require this in most situations, but if '>' appears in the string ']]>' (when not ending a CDATA section) then it must be escaped. Apple's NSXML classes don't do this, so in this situation they generate invalid XML that NSXMLDocument cannot open.
I tried creating my own -initWithName:validStringValue: method which did some jiggery-pokery and then called -initWithXMLString:error:, thinking that this wouldn't do any conversion, the idea being that I could force ']]>' to appear as ']]&gt;' myself by creating the XML string directly rather than going through -setStringValue:. But no. If you try this:
NSXMLElement *element = [[NSXMLElement alloc] initWithXMLString:@"<test>&gt;</test>" error:NULL];
NSLog (@"%@", element);
The output is:
<test>></test>
In other words, the NSXMLElement automatically *forces* any occurrences of '&gt;' to become '>', no matter how you try to work around it. And this means that if the user has entered the string ']]>' and you need to encode that in XML somewhere, then the NSXML classes force you to write invalid XML that cannot be read.
I've also tried creating the element like this:
element = [[NSXMLNode alloc] initWithKind:NSXMLElementKind options:NSXMLNodePreserveAll];
[element setName:@"Test"];
[element setObjectValue:@"&gt;"];
But this comes out as:
<Test>&amp;gt;</Test>
Right now I'm thinking the only way around this is to nuke any occurrences of ']]>' altogether, and just not allow this sequence to be written to file at all. It's unlikely the user will enter this string in the fields that get encoded to XML in my app, anyway, so it will probably never be an issue. But I can't count on that, and this isn't an ideal solution - I'd much rather just know that I can write valid XML by escaping necessary characters.
So, if anyone has any ideas of how to encode ']]>' as ']]&gt;' in the string value of an NSXMLElement (without it becoming ']]&amp;gt;'), I'd be very grateful.
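One possible workaround is to leave the element values alone and patch the serialized output instead. Since NSXML already escapes '<' and '&' in text, any ']]>' left in the generated string should have come from string content. The sketch below assumes the document contains no CDATA sections, comments or processing instructions (a blind replace would corrupt those) and that UTF-8 output is acceptable. KBValidXMLData is a hypothetical helper name, not part of NSXML, and this has not been checked against every NSXML version.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of NSXML. It serializes the document and then
// rewrites every "]]>" in the output as "]]&gt;".
// ASSUMPTIONS: the document has no CDATA sections, comments or processing
// instructions, and the file is written as UTF-8.
static NSData *KBValidXMLData(NSXMLDocument *doc, NSUInteger options)
{
    // Let NSXML do its usual escaping of '<' and '&' first.
    NSString *xml = [doc XMLStringWithOptions:options];

    // Escape the one sequence NSXML leaves untouched.
    NSString *fixed = [xml stringByReplacingOccurrencesOfString:@"]]>"
                                                     withString:@"]]&gt;"];

    return [fixed dataUsingEncoding:NSUTF8StringEncoding];
}
```

It would be called in place of -XMLDataWithOptions:, for example KBValidXMLData(xmlDoc, NSXMLNodePrettyPrint). On reading, the parser resolves '&gt;' back to '>', so no matching step is needed when loading.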
I think I need to file a bug report on this, too.
Many thanks and all the best,
Keith
--- ORIGINAL MESSAGE ---
Just to follow up on this, yet again it seems that the NSXML classes are stricter about rejecting invalid XML when opening documents than about avoiding it when generating XML data. If you include the string "]]>" inside the stringValue of an NSXMLElement, the '>' does not get escaped as it should according to the XML specs, and when you generate XML document data including such an element and then try to read it again, NSXMLDocument will fail and report the error: "Sequence ']]>' not allowed in content". Some sample code to demonstrate the issue:
// Create an element containing some characters that should be escaped to create valid XML.
NSXMLElement *element = [[[NSXMLElement alloc] initWithName:@"Test" stringValue:@" < & > ]]> "] autorelease];
// Note how the '<' and '&' get escaped, but not the '>' (even though it should be in the ']]>' sequence).
NSLog(@"%@", element); // OUTPUT: <Test> &lt; &amp; > ]]> </Test>
// Now create an XML doc from the element and generate the data.
NSXMLDocument *xmlDoc = [[[NSXMLDocument alloc] initWithRootElement:element] autorelease];
NSData *data = [xmlDoc XMLDataWithOptions:NSXMLNodePrettyPrint];
// Check the doc and data:
NSLog(@"XML Doc: %@\nData: %@", xmlDoc, data); // Yep, they are non-nil, all fine.
// Now load the data we created into an XML document.
NSError *error = nil;
xmlDoc = [[NSXMLDocument alloc] initWithData:data options:NSXMLNodePreserveWhitespace error:&error];
if (xmlDoc == nil) // If it failed, try with tidy.
xmlDoc = [[NSXMLDocument alloc] initWithData:data options:NSXMLNodePreserveWhitespace | NSXMLDocumentTidyXML error:&error];
// Did it fail?
if (xmlDoc == nil)
{
// Run the error.
if (error)
[[NSAlert alertWithError:error] runModal];
// Uh-oh... The error is: "Line 2: Sequence ']]>' not allowed in content". Because the '>' should have been escaped.
}
In other words, although the NSXML classes will escape '<' and '&' correctly, they will not handle escaping '>' at all - even when it occurs in the invalid (except when terminating CDATA) sequence ']]>'. This then causes the NSXML classes to fail when re-loading the document they just created from the same data, because NSXML is more fussy about reading than writing.
On the one hand, it is (sort of) fair enough to expect the user of these classes to ensure the string values are valid XML (even if it does mean every user of these classes having to be extra careful and become very familiar with the XML specs); on the other hand, how do I go about ensuring valid XML when this is user-generated data over which I have no control, and when the NSXML classes will tidy up the ampersands in any character entities I try to escape myself?
At first, I thought I could just replace all occurrences of ">" with "&gt;" using NSString's -stringByReplacingOccurrencesOfString:withString:, e.g.:
NSString *validXMLStr = [userStr stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];
NSXMLElement *element = [[NSXMLElement alloc] initWithName:@"Text" stringValue:validXMLStr];
Then, to restore it:
NSString *value = [element stringValue];
userStr = [value stringByReplacingOccurrencesOfString:@"&gt;" withString:@">"];
But of course, that won't work, because the "&gt;" I place in my "fixed" string will become "&amp;gt;" in the XML file. So, consider the user had written a string all about XML himself:
"It turns out that ']]>' needs changing to ']]&gt;' for valid XML..."
I then swap out the '>' in this situation to '&gt;':
"It turns out that ']]&gt;' needs changing to ']]&gt;' for valid XML..."
I then pass it to the stringValue of an NSXMLElement which encodes it as:
"It turns out that ']]&amp;gt;' needs changing to ']]&amp;gt;' for valid XML..."
Then I read it back out, get its string value, and swap all occurrences of '&gt;' for '>', and what we get on re-opening the file is:
"It turns out that ']]>' needs changing to ']]>' for valid XML..."
i.e. Not what the user wrote. An unlikely situation, I know, but not impossible and I have to account for it.
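The ambiguity here comes from using '&gt;' both as the escape and as text the user may legitimately type. A minimal sketch of a reversible alternative follows, assuming the app is free to pick its own private escape: it percent-escapes '%' first, so the marker for ']]>' can never be confused with anything the user wrote. The function names and the choice of '%' are hypothetical, not anything NSXML provides, and the stored file is then only meaningful to code that applies the matching unescape step.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helpers, not part of NSXML.

// Call before -setStringValue:. The result never contains "]]>".
static NSString *KBEscapeForXMLValue(NSString *userStr)
{
    // Escape the escape character first, so any literal '%' becomes "%25".
    NSString *s = [userStr stringByReplacingOccurrencesOfString:@"%"
                                                     withString:@"%25"];
    // Every remaining "%3E" will therefore be one that we introduce here.
    return [s stringByReplacingOccurrencesOfString:@"]]>"
                                        withString:@"]]%3E"];
}

// Call on the value returned by -stringValue after loading.
static NSString *KBUnescapeFromXMLValue(NSString *stored)
{
    // Undo the two steps in reverse order.
    NSString *s = [stored stringByReplacingOccurrencesOfString:@"]]%3E"
                                                    withString:@"]]>"];
    return [s stringByReplacingOccurrencesOfString:@"%25"
                                        withString:@"%"];
}
```

With this scheme the example string is stored as "It turns out that ']]%3E' needs changing to ']]&gt;' for valid XML...". NSXML escapes and restores the '&' by itself, and the unescape step turns ']]%3E' back into ']]>', so the user's text survives the round trip.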
In other words, if NSXMLElement won't escape the '>' for me in situations where it should, how do I do it myself?
Am I missing something obvious?
Many thanks and all the best,
Keith
----- Original Message ----
From: Jens Alfke <email@hidden>
To: Keith Blount <email@hidden>
Cc: glenn andreas <email@hidden>; "email@hidden" <email@hidden>
Sent: Tue, February 9, 2010 9:37:46 PM
Subject: Re: NSXML and >
On Feb 9, 2010, at 1:03 PM, Keith Blount wrote:
> Great, many thanks for the reply, and for the location of the information in the XML docs, that's very helpful. Unfortunately, it seems that the NSXML classes don't fix the '>' in the ']]>' case either, though:
> NSXMLElement *element = [[[NSXMLElement alloc] initWithName:@"Test" stringValue:@"< & > ]]>"] autorelease];
> NSLog (@"%@", element);
The ">" in "]]>" only needs to be escaped when it's inside a CDATA, I believe. (Since that string marks the end of a CDATA.)
—Jens
_______________________________________________
Cocoa-dev mailing list (email@hidden)