Re: CFXMLCreateStringByUnescapingEntities() bombs on "�"
- Subject: Re: CFXMLCreateStringByUnescapingEntities() bombs on "�"
- From: Wim Lewis <email@hidden>
- Date: Tue, 25 Mar 2014 12:21:11 -0700
On 25 Mar 2014, at 11:12 AM, Jens Alfke wrote:
> I agree — it seems like the 32-bit equivalent of the more common mistake of accepting an input blob containing text without first checking that it’s valid UTF-8. I did that once, and after debugging the resulting file corruption bug I made this sign to stick on my monitor: http://mooseyard.com/Pictures/UntrustedUTF8.png
>
> Now, what method/function should we use to validate that an NSString actually contains valid Unicode code points?
We have this problem in a slightly different context (copy-and-paste and AppleScript can both sneak invalid strings into an app). We ended up simply looping through the string's UTF-16 content by hand and checking for unpaired surrogates (which is what Jerry Krinock's U+DCC9 U+DF2D sequence is: two low surrogates with no high surrogate in front of them), as well as a handful of code points that are permanently reserved as noncharacters in Unicode (U+FFFE, U+FFFF, U+1FFFF, etc.) or are not allowed in XML (U+0000, etc.).
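For illustration, here is a minimal sketch of that kind of hand-rolled loop in plain C. It is not OmniFoundation's code; the function names (first_invalid_utf16_index, is_noncharacter, is_xml_forbidden_control) are hypothetical, and it assumes you already have the string's UTF-16 code units in a buffer. It reports the index of the first unpaired surrogate, Unicode noncharacter, or XML-forbidden C0 control character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: Unicode noncharacters are U+FDD0..U+FDEF plus the
   last two code points of every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...). */
static bool is_noncharacter(uint32_t cp) {
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}

/* Hypothetical helper: XML 1.0 allows only tab, LF and CR below U+0020. */
static bool is_xml_forbidden_control(uint32_t cp) {
    return cp < 0x20 && cp != 0x09 && cp != 0x0A && cp != 0x0D;
}

/* Returns the UTF-16 index of the first invalid code unit, or len if the
   whole buffer is acceptable. */
static size_t first_invalid_utf16_index(const uint16_t *s, size_t len) {
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        uint32_t cp = s[i];
        size_t width = 1;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* A high surrogate must be immediately followed by a low one. */
            if (i + 1 >= len || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                return i;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(s[i + 1] - 0xDC00);
            width = 2;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            /* A low surrogate with no high surrogate before it. */
            return i;
        }
        if (is_noncharacter(cp) || is_xml_forbidden_control(cp))
            return i;
        i += width;
    }
    return len;
}
```

With a CFStringRef you could, for example, copy the characters out with CFStringGetCharacters() into a UniChar buffer (UniChar is a 16-bit type) and pass that buffer in. Returning an index rather than a bool makes it easy to locate or strip the offending sequence, which is roughly the idea behind a "range of next invalid code point" style of API.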
You're welcome to pluck OFStringContainsInvalidSequences() / OFStringRangeOfNextInvalidCodepoint() from OmniFoundation (CFString-OFExtensions), if you like. (I'm not sure if OFStringRangeOfNextInvalidCodepoint() has made it to our published repository yet.)
However, it's clearly a bug that CFXMLCreateStringByUnescapingEntities() can return an invalid string.
_______________________________________________
Cocoa-dev mailing list (email@hidden)