Re: CFXMLCreateStringByUnescapingEntities() bombs on "�"
- Subject: Re: CFXMLCreateStringByUnescapingEntities() bombs on "�"
- From: Quincey Morris <email@hidden>
- Date: Tue, 25 Mar 2014 13:03:05 -0700
On Mar 25, 2014, at 11:28 , Kyle Sluder <email@hidden> wrote:
> On Tue, Mar 25, 2014, at 10:49 AM, Quincey Morris wrote:
>> However, I also see this as a bug in your code, since you’re accepting
>> “random” user input as formatted text (i.e. escaped HTML) without
>> validation.
>
> Unfortunately, NSTextView lets users paste invalid UTF-16 codepoints
> directly into the NSTextStorage that backs the text view. We see this
> happen with OmniOutliner documents on occasion. Then the next time we
> try to load the document, libxml barfs on the invalid character
> entities.
“accepting … without validation” meant, in this context, setting the NSString as the value of a Core Data property.
The underlying problem is that NSString objects are (in general, AFAIK) merely sequences of UTF-16 code units, not sequences of *valid* UTF-16 code units, so there are valid NSStrings that aren’t valid Unicode. For example, you can AFAIK append the high surrogate 0xD800 to an NSString without it throwing an exception to say that it isn’t followed by a low surrogate code unit.
That difference (sequences vs. valid sequences) means that an NSString of unknown provenance is always a suspect Unicode string. In a properly suspicious app, then, no NSString should be admitted into a persistent store without having been validated. In those terms, the problem in OmniOutliner wasn’t that it was handed a buggily invalid NSTextStorage, but that it too accepted input without validation.
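To make "validated" concrete, here is a minimal sketch in plain C of a check for unpaired surrogates. It assumes the code units have already been copied out of the string into a buffer (in Cocoa, -[NSString getCharacters:range:] can do that); the function name utf16_is_well_formed is made up for this example and is not an existing API. It checks surrogate pairing only, nothing else about the text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if every surrogate code unit in
 * units[0..length) is part of a correctly ordered high/low pair. */
bool utf16_is_well_formed(const uint16_t *units, size_t length)
{
    for (size_t i = 0; i < length; i++) {
        uint16_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: the next unit must be a low surrogate. */
            if (i + 1 >= length ||
                units[i + 1] < 0xDC00 || units[i + 1] > 0xDFFF) {
                return false;
            }
            i++; /* Skip the low surrogate just checked. */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            return false;
        }
    }
    return true;
}
```

A string that fails a check like this could be rejected or repaired before it is assigned to a Core Data property, rather than being discovered when the document is next loaded.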
_______________________________________________
Cocoa-dev mailing list (email@hidden)