CFXMLCreateStringByUnescapingEntities() bombs on "&#13207494"
- Subject: CFXMLCreateStringByUnescapingEntities() bombs on "&#13207494"
- From: Jerry Krinock <email@hidden>
- Date: Tue, 25 Mar 2014 10:04:47 -0700
When CFXMLCreateStringByUnescapingEntities() is passed the string "&#13207494", it returns a string of two unpaired low-surrogate code units, which keep any NSLog() containing it from printing and also upset Core Data.
// Define the problematic string
NSString* bomb1 = @"&#13207494" ;
NSLog(@"bomb1 length=%ld", (long)[bomb1 length]) ;
NSLog(@"bomb1 = '%@'", bomb1) ;
// Run it thru CFXMLCreateStringByUnescapingEntities()
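// The function follows the Create Rule, so hand ownership to Cocoa with
// CFBridgingRelease() instead of a plain cast (which would leak).
NSString* bomb2 = CFBridgingRelease(CFXMLCreateStringByUnescapingEntities(NULL, (__bridge CFStringRef)bomb1, NULL)) ;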
// Examine the result
NSLog(@"bomb2 length=%ld", (long)[bomb2 length]) ;
unichar char0 = [bomb2 characterAtIndex:0] ;
NSLog(@"char0 = '%c' = %x = %d", char0, char0, char0) ;
unichar char1 = [bomb2 characterAtIndex:1] ;
NSLog(@"char1 = '%c' = %x = %d", char1, char1, char1) ;
NSLog(@"bomb2 = '%@' THIS DOES NOT LOG AT ALL!!!", bomb2) ;
printf("printf bomb2: %s\n", [bomb2 UTF8String]) ;
Here is the result:
TestApp[13859:303] bomb1 length=10
TestApp[13859:303] bomb1 = '&#13207494'
TestApp[13859:303] bomb2 length=2
TestApp[13859:303] char0 = 'É' = dcc9 = 56521
TestApp[13859:303] char1 = '-' = df2d = 57133
printf bomb2: (null)
I don’t see why CFXMLCreateStringByUnescapingEntities() touches bomb1 at all: it does not end in a semicolon, so it contains no complete entity or character reference.
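The two characters in bomb2, U+DCC9 and U+DF2D, are code points from the “Low Surrogates” block (U+DC00 to U+DFFF). A low surrogate is only valid immediately after a high surrogate, so bomb2 is not well-formed UTF-16, which is presumably also why -UTF8String returned NULL above. Changing the number “13207494” to a slightly different value sometimes cures the problem.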
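For an output-side check, a string's UTF-16 code units can be copied into a buffer with -[NSString getCharacters:range:] and scanned for surrogates that are not part of a high/low pair. The C sketch below is illustrative only: first_unpaired_surrogate is a hypothetical helper, not an Apple API, and it assumes the buffer holds exactly the string's UTF-16 code units (unichar is a 16-bit type, so uint16_t stands in for it here).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the index of the first UTF-16 code unit that
   is an unpaired surrogate, or -1 if the buffer is well-formed UTF-16.
   High surrogates are 0xD800-0xDBFF, low surrogates are 0xDC00-0xDFFF. */
long first_unpaired_surrogate(const uint16_t *units, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint16_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* A high surrogate must be followed directly by a low surrogate. */
            if (i + 1 < count && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
                i++; /* skip over the valid pair */
                continue;
            }
            return (long)i;
        }
        if (u >= 0xDC00 && u <= 0xDFFF) {
            /* A low surrogate that was not consumed as the second half of a pair. */
            return (long)i;
        }
    }
    return -1;
}
```

Fed bomb2's two code units, 0xDCC9 and 0xDF2D, it returns 0, because the very first unit is a low surrogate with no high surrogate in front of it.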
The Core Data upset occurs in -[NSManagedObjectContext save:] when an object has bomb2 as the value of a String attribute. (Of course, that’s how I “discovered” this problem: a user managed to get a string containing bomb1 into their input XML data.) The returned error is:
Error Code: 1671
Error Domain: NSCocoaErrorDomain,
Localized Description: The operation couldn’t be completed. (Cocoa error 1671.)
NSValidationErrorKey: location (This is the name of the String attribute.)
NSValidationErrorValue:
The value is blank above because it would not paste. The error viewer in my app, an NSTextView, displays NSValidationErrorValue as two identical glyphs that look like a square containing an uppercase A with its left half blacked out, but when I copy and paste that into any other app, including Mail.app, I get 0 characters.
The “validation” error is *not* because the value of the ‘location’ attribute is nil. The ‘location’ attribute is optional in the data model, and often is nil in working data sets.
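If a string like bomb2 has already been produced, one possible evasive action before assigning it to a Core Data attribute is to replace each unpaired surrogate with U+FFFD REPLACEMENT CHARACTER. This is a sketch under assumptions: replace_unpaired_surrogates is a hypothetical name, and it has not been verified here that Core Data's validation then accepts the repaired value. The repaired buffer could be turned back into an NSString with +[NSString stringWithCharacters:length:].

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: replaces every unpaired surrogate in a UTF-16 buffer
   with U+FFFD REPLACEMENT CHARACTER, in place.
   Returns the number of code units that were replaced. */
size_t replace_unpaired_surrogates(uint16_t *units, size_t count)
{
    size_t replaced = 0;
    for (size_t i = 0; i < count; i++) {
        uint16_t u = units[i];
        int is_high = (u >= 0xD800 && u <= 0xDBFF);
        int is_low  = (u >= 0xDC00 && u <= 0xDFFF);
        if (is_high && i + 1 < count &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            i++; /* well-formed pair: leave both halves alone */
        } else if (is_high || is_low) {
            units[i] = 0xFFFD; /* lone surrogate: substitute U+FFFD */
            replaced++;
        }
    }
    return replaced;
}
```

Replacing rather than deleting keeps the string's length unchanged and leaves a visible marker where the bad data was.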
All of this looks to me like a bug in CFXMLCreateStringByUnescapingEntities(), and the proper workaround would be to preflight its input value (bomb1) and take evasive action if necessary. But since the problem occurs with numbers other than 13207494, I need to know the bounds of the set of “bad” substrings. Alternatively, a less attractive workaround would be to detect “invalid” strings in the output. I’m hoping that someone smarter than me knows an answer that does not involve brute force :|
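A preflight of the input might look like the following. It rests on an unverified assumption: that every troublesome input is a numeric character reference (with or without the closing ';') whose value is not a valid Unicode scalar value, i.e. above 0x10FFFF (1114111) or inside the surrogate range 0xD800 to 0xDFFF. 13207494 is far above 0x10FFFF, which fits, but since some nearby values reportedly do not trigger the problem, this check is deliberately conservative and will also flag inputs that CFXMLCreateStringByUnescapingEntities() might have handled. has_suspicious_numeric_ref is a hypothetical name; it scans UTF-8 bytes such as those from [bomb1 UTF8String], which is safe because '&', '#' and the digits are all ASCII.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical preflight: returns true if the NUL-terminated UTF-8 string
   contains "&#digits" or "&#xhexdigits" (semicolon optional) whose value is
   above 0x10FFFF or is a surrogate (0xD800-0xDFFF). */
bool has_suspicious_numeric_ref(const char *s)
{
    for (const char *p = s; *p != '\0'; p++) {
        if (p[0] != '&' || p[1] != '#')
            continue;
        const char *q = p + 2;
        bool hex = (*q == 'x' || *q == 'X');
        if (hex)
            q++;
        uint32_t value = 0;
        bool too_big = false;
        int digits = 0;
        for (;; q++, digits++) {
            unsigned char c = (unsigned char)*q;
            int d;
            if (isdigit(c))
                d = c - '0';
            else if (hex && isxdigit(c))
                d = tolower(c) - 'a' + 10;
            else
                break;
            if (!too_big) {
                value = value * (hex ? 16u : 10u) + (uint32_t)d;
                if (value > 0x10FFFF)
                    too_big = true; /* stop accumulating so value cannot overflow */
            }
        }
        if (digits == 0)
            continue; /* "&#" or "&#x" with no digits: not a numeric reference */
        if (too_big || (value >= 0xD800 && value <= 0xDFFF))
            return true;
    }
    return false;
}
```

If it returns true, the caller could skip unescaping for that string, or escape the offending '&' as "&amp;" first; which of these suits the app is a design decision left open here.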
Thanks,
Jerry Krinock
_______________________________________________
Cocoa-dev mailing list (email@hidden)