Puzzling performance difference after refactor
- Subject: Puzzling performance difference after refactor
- From: Bill Monk <email@hidden>
- Date: Fri, 30 Jun 2006 15:13:24 -0500
(If this would be better directed to PerfOptimization-dev please let
me know.)
I have here a client app which began life as Pascal code, was
translated to C, carbonized in CodeWarrior, moved to Xcode, and now
lives as a Mach-O bundleized app.
Originally, much of the code was in a single 2MB file. In CW this was
workable (if not entirely to my personal taste). However, a 2MB file
brings Xcode to its knees, with debugging being particularly painful.
So the file was refactored into a number of smaller files.
So now Xcode no longer requires 45-90 seconds to open a file. That's
good.
Here's the puzzling thing: the app built from the single file turns
out to be about twice as fast as the app built from multiple files.
The Xcode project file for the refactored version is a direct copy of
the older, single file version. The only difference is that the new
project contains all the refactored files and headers, and the old
project contains just the one large file.
I've put the build settings of the two projects side by side and
painstakingly compared them line by line. I see no differences
(except that the multi-file project has Require Function Prototypes
turned on; see below).
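For reference, my understanding is that the require-prototypes setting is Xcode's GCC_WARN_ABOUT_MISSING_PROTOTYPES, which passes -Wmissing-prototypes to GCC. That is a diagnostic, not an optimization switch, so by itself it should not change the generated code. The small file below is a hypothetical illustration (all function names are invented) of what the warning does and does not flag.

```c
/* Hypothetical demo; compile with: gcc -c -Wmissing-prototypes proto_demo.c */

/* External linkage, no earlier prototype: -Wmissing-prototypes warns here. */
int helper_without_prototype(int x) { return x + 1; }

/* A prototype seen first (normally from a header) silences the warning. */
int helper_with_prototype(int x);
int helper_with_prototype(int x) { return x * 2; }

/* static functions are exempt: they cannot be called from another file. */
static int local_helper(int x) { return x - 1; }

int use_helpers(int x);
int use_helpers(int x)
{
    return helper_without_prototype(x) + helper_with_prototype(x) + local_helper(x);
}
```

The warning only reports declarations. Any speed difference has to come from somewhere else, such as the change in linkage (static vs. external) that went along with adding the prototypes.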
I've done a diff on the .pbxproj files from each project. There seem
to be no substantive differences. (Obviously there are differences in
such things as the number of files in PBXGroup and
PBXSourcesBuildPhase sections, etc.)
Yet the one which builds from a single file is 2X faster.
How do I know this?
The app does some number crunching; each calculation takes a well-
defined number of iterations through a hot loop, where a fair bit of
work is done. If you tap the Option key, it logs the number of
iterations completed and the ETA for finishing. The single-file app
consistently logs 2X the speed of the multi-file version.
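The app's own logging code isn't shown here. As a rough idea of what such a rate/ETA log typically computes, here is a minimal sketch. It assumes a roughly constant cost per hot-loop iteration, and `estimate_progress`, `log_progress`, and `progress_t` are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical progress estimate; assumes each iteration costs about the same. */
typedef struct {
    double iters_per_sec; /* observed throughput so far */
    double eta_sec;       /* estimated seconds remaining; < 0 means unknown */
} progress_t;

progress_t estimate_progress(unsigned long done, unsigned long total, double elapsed_sec)
{
    progress_t p = { 0.0, -1.0 };
    if (done == 0 || elapsed_sec <= 0.0)
        return p;                              /* nothing measured yet */
    p.iters_per_sec = (double)done / elapsed_sec;
    /* Guard against done > total so the unsigned subtraction can't wrap. */
    p.eta_sec = (done >= total) ? 0.0
                                : (double)(total - done) / p.iters_per_sec;
    return p;
}

/* Hypothetical logger, e.g. called when the Option key is seen. */
void log_progress(unsigned long done, unsigned long total, double elapsed_sec)
{
    progress_t p = estimate_progress(done, total, elapsed_sec);
    if (p.eta_sec < 0.0) {
        printf("%lu/%lu iterations, ETA unknown\n", done, total);
        return;
    }
    printf("%lu/%lu iterations, %.1f iter/s, ETA %.0f s\n",
           done, total, p.iters_per_sec, p.eta_sec);
}
```

Because an estimate like this only divides completed work by elapsed time, timing a full run with a stopwatch is an independent check on it.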
Maybe the logging code is broken? Well, setting aside the fact that
it's identical in each version, if you let a calculation run to
completion and time it with a stopwatch, the actual results are about
what the logging code predicted. Example: on a 1.25GHz/1.25GB G4, the
single-file app takes about 15 minutes to process a certain data set;
the multi-file version, about 28 minutes.
The results are consistent across processor types. I've tried a 500MHz
G3, various single- and dual-core G4s, and a dual G5. The actual times
of course differ, but the single-file version is always about twice
as fast as the multi-file version.
Sharking the hot loop for the two versions is interesting. In both,
of course, there's a function, let's call it A, which takes most of
the time, since that's where the work is done. Function A calls B, C, and D.
Shark shows that in the slow, multi-file app, function B is taking
44% of the time spent in A.
In the single-file, fast version, Shark shows that B is statistically
insignificant. There are lines near B which show values as small as
0.1% in the Self column, but in the fast version, B has no entry in
the Self column.
Yet in the slow version, B shows 44%.
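A quick consistency check on these numbers assumes that A (the hot loop) accounts for nearly all of the run time. If the 44% spent in B essentially vanishes in the fast build, Amdahl's law predicts the speedup below, compared with the measured ratio of the G4 timings:

```latex
S = \frac{1}{1 - 0.44} = \frac{1}{0.56} \approx 1.79,
\qquad
\frac{28\ \text{min}}{15\ \text{min}} \approx 1.87
```

So under that assumption, the extra time Shark attributes to B by itself accounts for most of the observed 2X gap.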
The only code differences between the two are these:
The single-file version declares most functions static and uses few
prototypes; instead functions are located in the file leaf-first so
that functions are almost always defined before they are used.
The multi-file version breaks this into about 25 .c files and their
corresponding headers. All functions have prototypes, and none are
declared static.
To guaran-damn-tee there are no other code differences in the hot
loop, I copied and pasted the body of every function involved from
the original single-file version into the multi-file version. There
was no need to do this, they were already identical, but what the
heck. Result: no change.
Now the app could no doubt benefit from some of Shark's suggestions.
In fact I duplicated the projects and implemented a couple of the
suggestions, just to see what would happen. In the fast, single-file
version, they made a difference. In the slow version, their
effect is dwarfed by the effect of the massively slower function B.
When I first saw this problem, I figured some build setting had
simply gotten flipped and that putting it back would fix things. Now
I have no idea what the trouble is. If merely moving function
definitions from file to file and removing their static declarations
can have this kind of effect on performance, it'll be news to me. But
at this point, anything that solves it will be news to me, because
I'm thoroughly stumped.
Ideas?
_______________________________________________
Xcode-users mailing list (email@hidden)