Re: Repetitive Appending of Strings
Re: Repetitive Appending of Strings
- Subject: Re: Repetitive Appending of Strings
- From: John Stiles <email@hidden>
- Date: Tue, 12 Feb 2008 14:01:09 -0800
If you've got a little geekery left in you, I'd be highly curious to
know the times for doing a million push_back's into a vector<char>. I
wonder how it compares to NSMutableData.
Adam P Jenkins wrote:
I agree with you for appending millions of chars. My geekiness got
the better of my and I wrote a little test program to time how long
various methods of building a million digit numeric string 1 char at a
time took. On my computer the results were as follows:
0.42 seconds: Using NSMutableString's appendFormat:, starting with a
string with 0 capacity
0.41 seconds: Using NSMutableString's appendFormat:, starting with
capacity 1 million
0.032 seconds: Using NSMutableData appendBytes:length: took ~ 0.032
seconds
0.0039 seconds: Assigning directly into a char array
Notice that for NSMutableString appendFormat: it doesn't make much
difference whether the string is initially created with enough
capacity to hold the whole string or not, which shows that it does in
fact use a smart allocation strategy like I described below to
amortize the cost of growing the string.
The fact that there's an order of magnitude difference between using
NSMutableData appendBytes and assigning directly into an array also
show that method call cost can be noticeable when we're talking
millions of invocations a second.
On the flip side though I would note that all these times are so small
that it's really not worth worrying about this sort of thing in cases
where the number of appends isn't so high.
I've appended my test program in case you're interested. If you
create a Foundation Tool project in XCode, and paste this in as the
main method, then you should be able to build and run it. On my
machine, a MacBook Pro with 2.33 Ghz CPU, the output is:
String append with initial capacity 0 took 0.419497 seconds
String append with initial capacity 1000000 took 0.406551 seconds
NSMutableData appendBytes took 0.033865 seconds
Assigning into char array took 0.003881 seconds
On Feb 12, 2008, at 2:48 PM, Andrew Merenbach wrote:
Hi, Adam,
I understand what you (and others) are saying about premature
optimization. My feelings were only hunches, really, which is why
Shark might be a good idea. A test of the buffer idea, however --
with an NSZoneCalloc'ed buffer of unichars -- and calculating a
quotient out to a million digits, yielded what was something like a
three-second time for the actual calculation -- compared with ten
seconds when using the -appendFormat: method.
I can't imagine a case where a person would actually want more than a
thousand -- or even more than a hundred, really -- digits for their
equations, but I would like to make the program capable of this, if
only for completeness (now, there's also a practical limit, I
suppose, on how many bytes of data an NSTextView can hold, but I
don't know what that is). Using a buffer seems to speed things up
immensely -- in fact, the slowest part seems now to be the actual
updating of the text view.
Cheers,
Andrew
On Feb 12, 2008, at 10:53 AM, Adam P Jenkins wrote:
I agree that the method invocation itself is probably not a big
problem. However appending to a string repeatedly could be very
inefficient depending on how it's implemented. In the worst case,
append always makes a new internal copy of its character buffer with
just the right amount of extra space at the end to hold the appended
content, so building up a string by appending one character at a
time becomes a O(n^2) operation. I assume this is what the
original poster was worried about.
You can easily avoid this worst case behavior by using the
+stringWithCapacity: method of NSMutableString, passing in the max
length of your digit string, so that your string already has an
internal buffer preallocated to the max number of characters the
string will grow to. Then you can append one char at a time up to
the internal capacity without causing any reallocate/copies to
occur. This seems more straightforward to me than using an explicit
char array which you have to manage and null-terminate yourself.
Also, I don't know about NSMutableString specifically, but other
string classes whose internals I've examined actually use a
heuristic allocation strategy to avoid this worst case append
behavior, at the expense of possibly using more memory than
necessary. When an append is performed that causes the internal
char buffer to need to be extended, rather than just extending it
only as much as necessary to hold the new characters, they double
the size of the internal buffer. This optimizes for the common
usage of strings, where strings are built up from many small
appends. The amortized cost of building up a string from n 1
character appends then becomes O(n) rather than O(n^2).
The upshot of all this is, don't prematurely optimize. It may be
just fine to use -appendFormat: repeatedly if NSMutableString uses a
heuristic like I described above. And in any case you can use
+stringWithCapacity: to preallocate the internal buffer if you know
how big your string can grow to. So that leaves you with only
method call cost to worry about.
Adam
#import <Foundation/Foundation.h>
int main (int argc, const char * argv[]) {
NSAutoreleasePool * pool = [NSAutoreleasePool new];
NSDate *tic, *toc;
NSMutableString *theString;
int niters = 1000000;
int i;
// start the string off at 0 capacity and let appendFormat expand it
tic = [NSDate date];
theString = [NSMutableString stringWithCapacity:0];
for (i = 0; i < niters; i++)
[theString appendFormat:@"%d", (i % 10)];
toc = [NSDate date];
printf("String append with initial capacity 0 took %f seconds\n",
[toc timeIntervalSinceDate:tic]);
[pool release];
pool = [NSAutoreleasePool new];
// start the string off with enough capacity to hold the whole
string, so no
// reallocations are necessary
tic = [NSDate date];
theString = [NSMutableString stringWithCapacity:niters];
for (i = 0; i < niters; i++)
[theString appendFormat:@"%d", (i % 10)];
toc = [NSDate date];
printf("String append with initial capacity %d took %f seconds\n",
niters, [toc timeIntervalSinceDate:tic]);
[pool release];
pool = [NSAutoreleasePool new];
// try using a buffer, using appendBytes:length
tic = [NSDate date];
NSMutableData *buffer = [NSMutableData dataWithCapacity:niters];
for (i = 0; i < niters; i++) {
char ch = (i % 10) + '0';
[buffer appendBytes:&ch length:1];
}
toc = [NSDate date];
printf("NSMutableData appendBytes took %f seconds\n",
[toc timeIntervalSinceDate:tic]);
[pool release];
pool = [NSAutoreleasePool new];
// finally, try assigning directly to a memory buffer
tic = [NSDate date];
buffer = [NSMutableData dataWithCapacity:niters];
char *chars = (char*)[buffer mutableBytes];
for (i = 0; i < niters; i++) {
char ch = (i % 10) + '0';
chars[i] = ch;
}
toc = [NSDate date];
printf("Assigning into char array took %f seconds\n",
[toc timeIntervalSinceDate:tic]);
[pool release];
return 0;
}
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden