Re: RegexkitLite - Possible bug?
- Subject: Re: RegexkitLite - Possible bug?
- From: John Engelhart <email@hidden>
- Date: Sun, 17 Jan 2010 20:03:53 -0500
On Sun, Jan 17, 2010 at 4:15 PM, K.Darcy Otto <email@hidden> wrote:
> I've been working with RegexkitLite, and I'm wondering whether someone else
> who has RegexkitLite can reproduce this problem, or spot what I'm doing
> wrong:
>
> NSString *originalString =
> @"IMUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIU";
>
> // Using the built-in "range:" option
> NSString *firstTry = [originalString stringByReplacingOccurrencesOfRegex:@"M(.*)"
> withString:@"M$1$1" range:NSMakeRange(1,57)];
> NSLog(@"firstTry result: %@",firstTry);
>
> // Using "substringWithRange:" first
> NSString *cutOriginalString = [originalString
> substringWithRange:NSMakeRange(1, 57)];
> NSString *secondTry = [cutOriginalString
> stringByReplacingOccurrencesOfRegex:@"M(.*)" withString:@"M$1$1"];
> NSLog(@"secondTry result: %@",secondTry);
>
> Output:
>
> firstTry result: (null)
> secondTry result:
> MUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIU
>
> I contend that the results of firstTry and secondTry should be the same.
> What am I missing? Thanks.
>
If something isn't working quite right, it's often a good idea to get the
NSError object if the API supports it. In this case:
NSError *error = NULL;
NSString *firstTry = [originalString
stringByReplacingOccurrencesOfRegex:@"M(.*)"
withString:@"M$1$1" options:RKLNoOptions range:NSMakeRange(1,57)
error:&error];
NSLog(@"firstTry result: %@",firstTry);
NSLog(@"error: %@", error);
NSLog(@"error: %@", [error userInfo]);
2010-01-17 19:04:40.513 list_bug[73048:a0f] firstTry result: (null)
2010-01-17 19:04:40.513 list_bug[73048:a0f] error: Error
Domain=RKLICURegexErrorDomain Code=-124 UserInfo=0x409850 "The ICU library
returned an unexpected error code."
2010-01-17 19:04:40.514 list_bug[73048:a0f] error: {
NSLocalizedDescription = "The ICU library returned an unexpected error
code.";
NSLocalizedFailureReason = "The error U_STRING_NOT_TERMINATED_WARNING
occurred.";
RKLICURegexErrorCode = "-124";
RKLICURegexErrorName = "U_STRING_NOT_TERMINATED_WARNING";
RKLICURegexRegex = "M(.*)";
RKLICURegexRegexOptions = 0;
}
The ICU functions that perform search and replace have been a big source of
bugs in RegexKitLite. They have a particularly error-prone and brittle
calling syntax. Since you're performing a search and replace, the replaced
string can be quite a bit larger than the original string. Your example
replacement string essentially doubles the size of the final, replaced
string. RegexKitLite makes an "educated guess" at what the size of the
final, replaced string is going to be. The ICU library fills up whatever
buffer you give it, but when it runs out of space, it returns the error
code "U_BUFFER_OVERFLOW_ERROR". Now, it's "supposed" to allow you to keep
calling the "append and replace" string functions so it can tally up the
exact size of the buffer that you would need to complete the replacement.
The API says that you only ever need to do "two passes" of a search and
replace at most: if the first pass had too small a buffer, you'll get the
size of buffer you need, and therefore the second pass is "guaranteed" to
succeed because the required size has already been calculated.
Naturally, there are bugs in the replacement code in at least some versions
of ICU where the first overflow error causes the append and replace
functions to stop processing because "There's an error!". So, RegexKitLite
has workarounds to compensate for this broken behavior. To do this,
RegexKitLite needs to detect that a buffer overflow error has occurred,
reset the error status so that ICU thinks it can keep going, and rinse and
repeat until ICU says it's finished.
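As a rough illustration of the "two passes" idea described above, here is a
minimal sketch in plain C. It does not call ICU and is not RegexKitLite's
actual code: append_piece(), join_two_pass(), and the append_status enum are
hypothetical names made up for this example. The stand-in append function
behaves the way the API is "supposed" to, that is, it keeps tallying the
required length after the first overflow so the second pass can be sized
exactly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes, only loosely modeled on the ICU names. */
typedef enum { APPEND_OK = 0, APPEND_BUFFER_OVERFLOW = 1 } append_status;

/*
 * Hypothetical stand-in for an "append" step. Copies as much of src as fits
 * into dst at offset `used`, flags an overflow if it does not all fit, and
 * always returns the total length that WOULD be needed so far.
 */
size_t append_piece(char *dst, size_t capacity, size_t used,
                    const char *src, append_status *status)
{
    size_t len = strlen(src);
    if (used + len <= capacity) {
        memcpy(dst + used, src, len);
    } else {
        if (used < capacity) {
            memcpy(dst + used, src, capacity - used);
        }
        *status = APPEND_BUFFER_OVERFLOW;
    }
    return used + len;
}

/*
 * Joins `count` pieces, starting from an "educated guess" capacity.
 * Returns a malloc'd, NUL-terminated string (caller frees) or NULL if
 * allocation fails. `passes` reports how many passes were needed.
 */
char *join_two_pass(const char *const *pieces, size_t count,
                    size_t guess, int *passes)
{
    size_t capacity = guess;
    *passes = 0;
    for (;;) {
        char *buf = malloc(capacity + 1);   /* +1 keeps room for the NUL */
        if (buf == NULL) {
            return NULL;
        }
        append_status status = APPEND_OK;
        size_t needed = 0;
        ++*passes;
        for (size_t i = 0; i < count; i++) {
            /* Keep appending after an overflow so `needed` ends up exact. */
            needed = append_piece(buf, capacity, needed, pieces[i], &status);
        }
        if (status == APPEND_OK) {
            buf[needed] = '\0';
            return buf;
        }
        free(buf);
        assert(needed > capacity);
        capacity = needed;                  /* the next pass is sized exactly */
    }
}
```

With a guess that is too small, the loop runs twice: once to learn the
required size and once to fill a buffer of that size. The workaround
described above exists because some ICU versions stop tallying after the
first overflow, so the first pass cannot be trusted to report the full size.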
However, this introduces another problem: using this technique, you can
really only return one error condition. And if you've got a buffer overflow
condition, that's your error. If a second error pops up, then what? From
past experience, these routines are pretty brittle, and trying to compensate
for them usually just leads to more problems. Therefore, I've decided to
take an extremely conservative approach and abort if things start to go
sideways. While I'm sure someone thought it was a great idea to "warn"
you that "your string is not terminated", in reality it does nothing but
complicate things, especially because it becomes ambiguous whether
the U_STRING_NOT_TERMINATED_WARNING warning/error is masking an
underlying U_BUFFER_OVERFLOW_ERROR because the buffer overflow error code
is completely buggy.
In this particular case, it looks like you happened to create a replacement
string that is exactly the same size as the size RegexKitLite chose for its
temporary buffer, which leaves no room for a terminating NUL and would
explain the U_STRING_NOT_TERMINATED_WARNING.
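To make the "exactly the same size" case concrete, here is a small sketch in
plain C of the three outcomes a fixed-size output buffer can have. The names
fit_result, classify_fit(), and retry_capacity() are hypothetical and are not
part of ICU or RegexKitLite. The value -124 is the one shown in the log above;
the overflow value is just a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    FIT_OK             = 0,     /* output and its terminating NUL both fit */
    FIT_NOT_TERMINATED = -124,  /* output fits, NUL does not (value from the log) */
    FIT_OVERFLOW       = 1      /* output itself does not fit (placeholder value) */
} fit_result;

/* Classifies an output of `needed` units against a buffer of `capacity` units. */
fit_result classify_fit(size_t needed, size_t capacity)
{
    if (needed < capacity) {
        return FIT_OK;
    }
    if (needed == capacity) {
        return FIT_NOT_TERMINATED;  /* the exact-fit case discussed above */
    }
    return FIT_OVERFLOW;
}

/* A capacity that holds the output plus a terminating NUL on a retry. */
size_t retry_capacity(size_t needed)
{
    assert(needed < SIZE_MAX);      /* guard against wrap-around */
    return needed + 1;
}
```

A caller that treats anything other than FIT_OK as a reason to abort, which
is the conservative approach described above, will reject the exact-fit case
even though every character of the output was written.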
A possible workaround is to use the pre-4.0 version that's in SVN. It has
support for the new Blocks syntax, and you can use it to do a search and
replace like so:
NSString *replacedString = [originalString
stringByReplacingOccurrencesOfRegex:@"M(.*)" options:RKLNoOptions
inRange:NSMakeRange(1,57) error:&error
enumerationOptions:RKLRegexEnumerationNoOptions usingBlock:^NSString
*(NSInteger captureCount, NSString * const capturedStrings[captureCount],
const NSRange capturedRanges[captureCount], volatile BOOL * const stop){
return([NSString stringWithFormat:@"M%@%@", capturedStrings[1],
capturedStrings[1]]);
}];
NSLog(@"replacedString: %@", replacedString);
2010-01-17 19:59:31.631 list_bug[73256:a0f] firstTry result: (null)
2010-01-17 20:02:17.374 list_bug[73294:a0f] replacedString:
MUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIU
2010-01-17 19:59:31.635 list_bug[73256:a0f] secondTry result:
MUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIU
_______________________________________________
Cocoa-dev mailing list (email@hidden)