NSXML and invalid UTF8 characters
- Subject: NSXML and invalid UTF8 characters
- From: Keith Blount <email@hidden>
- Date: Thu, 28 Jan 2010 15:16:20 -0800 (PST)
Hello,
I am using the NSXML classes to generate and parse my own XML files. Sometimes these files store strings of text that have been brought in from other applications (for instance, a plain text representation of some text the user has pasted in from Word).
In some instances NSXMLDocument's -initWithContentsOfURLPreservingWhitespace:error: fails, returning nil with errors such as "Char 0x0 out of allowed range" or "PCDATA invalid char value 12". As I understand it, this is because XML does not allow certain Unicode characters, even when they are perfectly valid UTF-8:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char
Especially:
Character Range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
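To make the production above concrete, here is a minimal sketch of it as a C predicate over Unicode code points. The function name xml_is_valid_char is hypothetical and not part of any Apple framework; it only restates the ranges quoted from the specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the Unicode code point cp
 * matches the XML 1.0 Char production quoted above. */
bool xml_is_valid_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD   /* tab, LF, CR */
        || (cp >= 0x20 && cp <= 0xD7FF)          /* below the surrogate block */
        || (cp >= 0xE000 && cp <= 0xFFFD)        /* above surrogates, minus FFFE/FFFF */
        || (cp >= 0x10000 && cp <= 0x10FFFF);    /* supplementary planes */
}
```

Under this check both 0x0 and 12 (0x0C, form feed) are rejected, which matches the two error messages.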
Certainly, the "PCDATA invalid char" error was caused by an NSFormFeedCharacter (value 12, i.e. 0x0C). I don't know what the "Char 0x0" character is, but it is presumably one from a Word document that isn't allowed.
So, my question is, what is the best way for me to filter out these invalid characters from my NSString before I pass it into NSXMLElement's -initWithName:stringValue: or similar methods, to avoid creating XML documents that won't open?
This page seems useful:
http://cse-mjmcl.cse.bris.ac.uk/blog/2007/02/14/1171465494443.html
It would seem to indicate that I need to write some C code that builds a copy of the string without the invalid characters and then turns the result into an NSString, but I was wondering whether any methods built into the AppKit already strip these invalid XML characters. I have looked but couldn't see any. If not, I would be very grateful for any pointers on using the above info to create a method that does this. I'm self-taught, so my knowledge is all high-level Cocoa and Objective-C, and left to myself I would end up doing it all with methods such as -appendString: and +stringWithFormat:, which I suspect is the wrong approach here because it would be too slow, and that the job really calls for C.
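For illustration, one way such a filter could look is sketched below in plain C. It assumes the text is available as a buffer of UTF-16 code units (the same width as Cocoa's unichar) and strips the disallowed units in place in a single pass. The name xml_strip_invalid_utf16 is hypothetical, and dropping unpaired surrogates is a choice made for this sketch, not something the XML specification dictates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: removes, in place, every UTF-16 code unit that the
 * XML 1.0 Char production forbids, plus any unpaired surrogate.
 * Returns the new length in code units. */
size_t xml_strip_invalid_utf16(uint16_t *buf, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t c = buf[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            /* High surrogate: keep it only when a low surrogate follows.
             * Every valid pair encodes U+10000..U+10FFFF, which is allowed. */
            if (i + 1 < len && buf[i + 1] >= 0xDC00 && buf[i + 1] <= 0xDFFF) {
                buf[out++] = c;
                buf[out++] = buf[i + 1];
                i++; /* skip the low surrogate just copied */
            }
            continue;
        }
        /* Basic Multilingual Plane: tab, LF, CR and the two allowed ranges.
         * Stray low surrogates (0xDC00..0xDFFF) fail this test and are dropped. */
        if (c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)) {
            buf[out++] = c;
        }
    }
    return out;
}
```

From Objective-C, the buffer could presumably be filled with NSString's -getCharacters:range:, passed through this function, and turned back into a string with +stringWithCharacters:length: using the returned length. That wiring is an assumption of this sketch and is untested here.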
Many thanks in advance for any help anyone can give.
All the best,
Keith
_______________________________________________
Cocoa-dev mailing list (email@hidden)