Re: Fastest/smallest search+replace
Re: Fastest/smallest search+replace
- Subject: Re: Fastest/smallest search+replace
- From: Ali Ozer <email@hidden>
- Date: Mon, 14 May 2001 10:04:52 -0700
>
I am looking for the most efficient search and replace algorithm for
>
strings. It does not have to be strictly Cocoa, since I can convert my
>
NSStrings to cstrings easily. The problem is, that I need to replace
>
the \n s in a string, which is at least 150 pages long when printed
>
out. All the attempts I tried are somewhat slow and I want it to be (or
>
appear) instant.
>
>
I figured there would be no easier way than looping through every
>
character and looking for it's value (in plain c). In Cocoa I tried to
>
split an NSString and rejoin it with the replacement character in
>
between, but that's still very slow with the massive string I passed
>
(and as long as there is no better way than the loop it should do
>
pretty much the same I do in c).
Unless you are doing a strict 1-to-1 replacement of chars, I think any
attempts to mutate a string in place might be bad as it might end up
copying data around too much, so your approach (read the source string,
create a new modified string) is probably the best.
TextEdit's TextFinder.m class does its find and replace in a pretty fast
way; replacing 6000 instances in a 200K text doc is less than a second,
for instance. But that works on attributed strings, and also updates
the display. In addition, it generates a lot of autoreleased strings,
but is careful to try to keep a lid on the memory usage. Below is a
simpler version which works on NSStrings; according to my measurements
it can replace 10000 instances in a 150 page doc in 0.15 sec. A 1000
page doc (60,000 lines) takes 1 sec. I think a faster version can be
achieved by avoiding the autoreleasing --- copy the chars from source
to destination yourself, either using getCharacters: with a stack sized
buffer (say 1000 chars), or using the CFInlineBuffer functions from
CFString.h to read chars one at a time (but efficiently).
Ali
#import <Foundation/Foundation.h>
NSString *replace(NSString *textContents, NSString *targetString,
NSString *replaceString) {
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
NSMutableString *temp = [[NSMutableString alloc] init];
NSRange replaceRange = NSMakeRange(0, [textContents length]);
NSRange rangeInOriginalString = replaceRange; /* Range in the
original string where we do the searches */
int replaced = 0;
// The following loop can execute an unlimited number of times, and
it could have autorelease activity.
// To keep things under control, we use a pool, but to be a bit
efficient, instead of emptying everytime through
// the loop, we do it every so often. We can only do this as long as
autoreleased items are not supposed to
// survive between the invocations of the pool!
while (1) {
NSRange rangeToCopy;
NSRange foundRange = [textContents rangeOfString:targetString
options:0 range:rangeInOriginalString];
// Because we computed the tightest range above, foundRange
should always be valid.
if (foundRange.length == 0) break;
rangeToCopy = NSMakeRange(rangeInOriginalString.location,
foundRange.location - rangeInOriginalString.location);
[temp appendString:[textContents
substringWithRange:rangeToCopy]];
[temp appendString:replaceString];
rangeInOriginalString.length -= NSMaxRange(foundRange) -
rangeInOriginalString.location;
rangeInOriginalString.location = NSMaxRange(foundRange);
replaced++;
if (replaced % 100 == 0) { // Refresh the pool... See warning
above!
[pool release];
pool = [[NSAutoreleasePool alloc] init];
}
}
if (rangeInOriginalString.length > 0) [temp
appendString:[textContents substringWithRange:rangeInOriginalString]];
[pool release];
return [temp autorelease];
}