Re: Performance problem with GC enabled
- Subject: Re: Performance problem with GC enabled
- From: John Engelhart <email@hidden>
- Date: Fri, 13 Mar 2009 20:56:44 -0400
On Fri, Mar 13, 2009 at 5:28 AM, Paul Sanders <email@hidden> wrote:
> Bill said something in passing on this issue which I think is important. To
> paraphrase: If you care about performance, don't use the Cocoa RegEx stuff
> to parse large amounts of data.
I disagree :), and I have numbers to back it up:
(RegexKitLite was used to do the regex processing in the examples below)
[subjectString componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]
[subjectString componentsSeparatedByRegex:[NSString stringWithUTF8String:"(?:\\r\n|[\n\\v\\f\\r\302\205\\p{Zl}\\p{Zp}\\t ])"]]
componentsSeparatedByCharactersInSet: Time used: 4114134.0, per: 1.493509760680, count: 2754675
componentsSeparatedByRegex:           Time used: 1577230.0, per: 0.572564821621, count: 2754675
In this case, regexes beat the system method by: 4114134.0 / 1577230.0 = 2.60.
---
[subjectString componentsSeparatedByString:@"\n"]
[subjectString componentsSeparatedByRegex:[NSString stringWithUTF8String:"(?:\\r\n|[\n\\v\\f\\r\302\205\\p{Zl}\\p{Zp}])"]]
componentsSeparatedByString: Time used: 548741.0, per: 2.181959521253, count: 251490
componentsSeparatedByRegex:  Time used: 320646.0, per: 1.274985088870, count: 251490
In this case, regexes beat the system method by: 548741.0 / 320646.0 = 1.71.
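For anyone who wants to rerun the arithmetic on their own timings, here is a minimal sketch in C. The function names are made up for illustration, and the units of "Time used" are an assumption, since the output above does not state them. "per" is taken to be total time divided by count, which matches the figures shown.

```c
/* Hypothetical helpers, not part of RegexKitLite or Cocoa. */

/* Average cost per returned component: total time / number of components.
   The time unit is whatever the timer reported (not stated in the output). */
double per_item(double total_time, unsigned long count) {
    return total_time / (double)count;
}

/* How many times faster the candidate is than the baseline.
   A value above 1.0 means the candidate took less time. */
double speedup(double baseline_time, double candidate_time) {
    return baseline_time / candidate_time;
}
```

Calling speedup() with the system method's time first and the regex time second gives the ratio quoted in each comparison.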
> I think this observation is true whether
> you use GC or not. GC just makes it worse. I'd like to see a pure-C
> benchmark of the original test, perhaps just from the command line using
> egrep. I suspect the results would be startling.
How about perl instead? (I don't think egrep is a fair test: it
doesn't have to 'do anything' with the results, like create a new
string from them.) This is a rough perl equivalent of my original
problem:
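$text = ""; $cnt = 0;
while(<>) { $text .= $_; }   # read all input into one string
for($loops = 0; $loops < 1; $loops++) {
  my @results;
  # NB: /\S+/ has no capture group, so $1 is undef here; $& holds the match
  while($text =~ /\S+/g) { push(@results, $1); $cnt++; }
}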
shell% time /usr/bin/perl pl_rkl.pl BIG.txt
2.159u 0.030s 0:02.22 98.1% 0+0k 0+0io 0pf+0w
shell% time rkl_tests
1.874u 0.073s 0:01.97 98.4% 0+0k 0+0io 0pf+0w
Now, the perl example could be improved (notably the part that sucks
in the text), but I think it's fair to say that this isn't quite the
result you'd intuitively expect. Naturally, these results aren't
representative of every use case, but I think it goes to show that
processing strings in Cocoa with regexes can be competitive with other
solutions out there.
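As for the pure-C benchmark Paul asked about, a fair one would need to do comparable work, i.e. create a new string for every match rather than just scan. Below is a rough sketch of what such a baseline might look like. It is not something that was timed for this post; split_whitespace and free_tokens are hypothetical names, and it only treats ASCII whitespace (via isspace) as a separator, so it is closer to the perl /\S+/g loop than to the Unicode-aware regexes above.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees an array returned by split_whitespace(). */
void free_tokens(char **tokens, size_t count) {
    for (size_t i = 0; i < count; i++) free(tokens[i]);
    free(tokens);
}

/* Hypothetical helper: splits `text` on runs of ASCII whitespace and copies
   each token into a freshly allocated, NUL-terminated string, so the work
   includes building a result array, not just scanning.
   Returns NULL on allocation failure; otherwise stores the token count in
   *count_out and returns the array (release it with free_tokens()). */
char **split_whitespace(const char *text, size_t *count_out) {
    size_t cap = 16, count = 0;
    char **tokens = malloc(cap * sizeof *tokens);
    if (tokens == NULL) return NULL;

    const char *p = text;
    while (*p != '\0') {
        /* Skip the separator run. */
        while (*p != '\0' && isspace((unsigned char)*p)) p++;
        if (*p == '\0') break;

        /* Find the end of the token. */
        const char *start = p;
        while (*p != '\0' && !isspace((unsigned char)*p)) p++;
        size_t len = (size_t)(p - start);

        /* Grow the result array when it is full. */
        if (count == cap) {
            cap *= 2;
            char **grown = realloc(tokens, cap * sizeof *tokens);
            if (grown == NULL) { free_tokens(tokens, count); return NULL; }
            tokens = grown;
        }

        /* Copy the token into its own allocation. */
        char *copy = malloc(len + 1);
        if (copy == NULL) { free_tokens(tokens, count); return NULL; }
        memcpy(copy, start, len);
        copy[len] = '\0';
        tokens[count++] = copy;
    }

    *count_out = count;
    return tokens;
}
```

To compare, read BIG.txt into a buffer, call split_whitespace() on it under time, and release the result with free_tokens(). One malloc per token is deliberate here: it roughly mirrors the per-result string creation that the Cocoa and perl versions have to pay for.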
> Having said all of which, I think the original test is not unfair and I
> agree with a lot of the points people have made in support of that view.
> It's always painful to have to step outside the Cocoa frameworks, and (off
> topic) it seems that GC can make it more so. I for one will not be using
> it.