On Jan 20, 2005, at 3:52 PM, Shaun Wexler wrote:
On Jan 20, 2005, at 12:53 PM, Ian Ollmann wrote:
Alrighty, once the compiler is defeated, we see that the speed
improvement is there:
ollmia:/tmp iano$ gcc -O3 main4.c -Wmost
ollmia:/tmp iano$ ./a.out
best libm time for 1000 calls: 0.000113 seconds (801939)
best cheesy time for 1000 calls: 0.000017 seconds (800454)
My vectorized version of atan2 that I wrote for MacFOH performs the
full rectangular-to-polar conversion, including UNWRAPPED normalized
phase and magnitude in decibels, and profiles 30x faster than
(float)atan2(y,x) with floats. The atan2f "cheesy" portion of the
vectorized code is 24x faster than libm.
It probably should be mentioned at this point (before the inquest
starts) that there are several reasons why libm is the way it is:
1) libm is required to deliver correct results (not sorta correct)
including edge cases, with the correct rounding mode, exceptions,
etc.
This is expensive because it involves configuring the
FPSCR (the PowerPC floating-point status and control register) and
often quite a bit of branching to deal with edge cases not covered
by the general-purpose algorithm.
2) libm is required to take a single set of arguments and return a
single result.
Read: not enough data -> not enough work to do ->
pipeline bubbles
3) libm is sitting behind a dyld stub
...and in certain cases also incurs PIC (position-independent code)
overhead
So, in summary, it is the way it is because it is required to be so
by standards. It is a straw man, set up to be knocked over.
Ian
_______________________________________________
PerfOptimization-dev mailing list