I have stumbled upon a serious performance anomaly with the
MacBook Pro and wondered if anyone had any brilliant suggestions
on how to track down this mystery.
I'm always up for wild suggestions. How about... your problem is
the performance penalty for handling denormals, which on x86 you
incur by default in both scalar and SSE code.
Perhaps you're fetching the floating point data from some place
other than you think you are (uninitialized, or from some other
data structure, or big endian FP data on a little endian machine),
and the FP data you are processing ends up being denormals and is
really slow.
As the size of tmp[] goes between 1800 and 2600, the stack frame
changes, causing the window of memory you fetch this bad data from
to slide from fetching nearly all denormals, to fetching some, to
fetching none.
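If you want to look at the data directly, something along these lines might help. It is only a sketch: count_denormals() and byte_swapped() are names I made up for illustration, and it assumes 32-bit IEEE-754 floats. The first helper counts how many values in a buffer classify as subnormal; the second shows why unswapped big-endian data is a plausible culprit, since reversing the bytes of an ordinary value such as 1.0f (0x3F800000) leaves the exponent bits all zero.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: count the subnormal (denormal) values in a buffer. */
static size_t count_denormals(const float *data, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (fpclassify(data[i]) == FP_SUBNORMAL)
            ++count;
    }
    return count;
}

/* Hypothetical helper: return x with its four bytes reversed, which is
   what you get when big-endian float data is read on a little-endian
   machine without swapping. Assumes float is 32-bit IEEE-754. */
static float byte_swapped(float x)
{
    uint32_t bits;
    float out;

    memcpy(&bits, &x, sizeof bits);
    bits = (bits >> 24)
         | ((bits >> 8) & 0x0000FF00u)
         | ((bits << 8) & 0x00FF0000u)
         | (bits << 24);
    memcpy(&out, &bits, sizeof out);
    return out;
}
```

Calling count_denormals() on the buffer you actually feed to the inner loop, for a few sizes of tmp[] between 1800 and 2600, should show whether the count tracks the slowdown.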
You could test this theory by making the call (can't remember off
the top of my head) to disable denormal handling, run the code, and
see if it makes a difference.
#include <fenv.h>
#pragma STDC FENV_ACCESS ON
#if defined( FE_DFL_DISABLE_SSE_DENORMS_ENV )
// Disables denormal handling and resets FP masks & flags to default
// values
fesetenv( FE_DFL_DISABLE_SSE_DENORMS_ENV );
#endif
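If FE_DFL_DISABLE_SSE_DENORMS_ENV is not defined by your headers, a rough equivalent is to set the flush-to-zero and denormals-are-zero bits in the SSE control register yourself. This is a sketch, not a drop-in replacement: disable_sse_denormals() and sse_denormals_disabled() are hypothetical names, the code only affects SSE arithmetic (not x87), it leaves the other masks and flags alone, and it assumes a CPU that supports the DAZ bit.

```c
#if defined(__SSE2__)
#include <xmmintrin.h>

/* MXCSR bit 15 is flush-to-zero (FTZ); bit 6 is denormals-are-zero (DAZ). */
#define MXCSR_FTZ 0x8000u
#define MXCSR_DAZ 0x0040u
#endif

/* Hypothetical helper: turn on FTZ and DAZ for the calling thread.
   Returns 1 if the bits were set, 0 if SSE2 is unavailable at build time. */
static int disable_sse_denormals(void)
{
#if defined(__SSE2__)
    _mm_setcsr(_mm_getcsr() | MXCSR_FTZ | MXCSR_DAZ);
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: report whether both FTZ and DAZ are currently set. */
static int sse_denormals_disabled(void)
{
#if defined(__SSE2__)
    unsigned int want = MXCSR_FTZ | MXCSR_DAZ;
    return (_mm_getcsr() & want) == want;
#else
    return 0;
#endif
}
```

Call disable_sse_denormals() once at the top of the thread that runs the slow loop, then time it again. If the anomaly disappears, denormals are very likely the cause.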