The vectorized version of atan2 that I wrote for MacFOH performs the
full rectangular-to-polar conversion, including UNWRAPPED normalized
phase and magnitude in decibels, and profiles 30x faster than calling
(float)atan2(y,x) on float inputs. The "cheesy" atan2f portion of the
vectorized code alone is 24x faster than libm.
It should probably be mentioned at this point (before the inquest
starts) that there are several reasons why libm is the way it is:
1) libm is required to deliver correct results (not sorta correct)
including edge cases, with the correct rounding mode, exceptions, etc.
This is expensive because it involves configuring the
FPSCR and often quite a bit of branching to deal with edge cases not
covered by the general purpose algorithm.
2) libm is required to take a single set of arguments and return a
single result.
Read: not enough data -> not enough work to do -> pipeline
bubbles
3) libm is sitting behind a dyld stub
...and in certain cases also triggers PIC
So, in summary, libm is the way it is because the standards require it
to be. As a speed baseline it is a straw man, set up to be knocked over.
People shouldn't see this as a slam against libm; they should use it as
proof that performant applications require inlining/pipelining hot
functions, to remove as much overhead as possible.
One thing I've grown to love about AltiVec is that you can perform a
lot of operations in advance and/or for free. If you Shark a function
and it shows even a moderate number of stalls (more than a few cycles
total), then it is a ripe candidate for performing some free work for
you. When I rewrap my unwrapped phase, I have to conditionally
generate some additional data points, which requires two 4x4 matrix
rotations. The first rotation is performed for free by the VPERM unit
before the condition is tested. If the condition holds, its stores
happen while the 2nd rotation is performed; otherwise the first
rotation's results are simply discarded at no loss. Whenever you see
stalls, you can usually sneak more code into a function, and often the
gain is more than the sum of the parts. ;)
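
The control flow of that trick looks roughly like the C sketch below.
Everything in it is an assumption made for illustration: the function
names are hypothetical, a plain 4x4 transpose stands in for the
permute-unit rotation, and a sum over the matrix stands in for the
slow-to-resolve condition. In scalar C the compiler and the CPU decide
what actually overlaps, so this shows only the structure: start the
work early, commit it late, throw it away if it turns out not to be
needed.

```c
#include <string.h>

/* Stand-in for the permute-unit work: transpose a row-major 4x4. */
static void transpose4x4(const float in[16], float out[16])
{
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            out[c * 4 + r] = in[r * 4 + c];
}

/* Hypothetical: writes the extra points to out and returns 1 when the
   condition holds; otherwise leaves out untouched and returns 0. */
static int emit_extra_points(const float m[16], float threshold,
                             float out[16])
{
    float scratch[16];

    /* Speculative work, issued before the condition is known. */
    transpose4x4(m, scratch);

    /* Stand-in for the long-latency computation behind the test. */
    float sum = 0.0f;
    for (int i = 0; i < 16; i++)
        sum += m[i];

    if (sum <= threshold)
        return 0;                 /* discard scratch, nothing stored */

    memcpy(out, scratch, sizeof scratch);   /* commit the early work */
    return 1;
}
```

The speculative result lives only in scratch until the test passes,
so the discard path costs nothing beyond cycles that would have been
stalls anyway.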
--
Shaun Wexler
MacFOH http://www.macfoh.com