Mailing Lists: Apple Mailing Lists

Image of Mac OS face in stamp
 
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Fast atan2




On Jan 27, 2005, at 3:28 AM, Robert Purves wrote:


Ben Weiss <email@hidden> wrote:
I've pasted some improved code below, hopefully wil get you the 2x speedup you're looking for.

<snip>

I successfully compiled the new function atan2_ (thanks, Ben), replacing the 'address' operator by (intptr_t)& (thanks, Ian).

With gcc optimization -O0 through -O3, atan2_ works correctly with very small error. However when I tried it with gcc's -fast option, the maximum error was huge: 0.57 radians!

A little detective work showed that -fstrict-aliasing (a component of -fast) is the culprit. Evidently the type-punning involving ypbuf needs to be more robustly specified, for example by making ypbuf a union

union int32int32_u_double {
	struct { int32 iexp, iman; } i;
	real64  d;
};

with appropriate obvious changes to the few code lines that reference ypbuf. In this way the code is easily immunized against the depredations of -fast.


Comparison of the corrected atan2_ with the official atan2: atan2_ |error| = 4.440892e-16 atan2 elapse : 0.87 s atan2_ elapse : 0.64 s

So, splendid accuracy but not yet 2 x faster than atan2 :-(

I don't need anything like full double, or even single, precision. An absolute error of 1e-4 would be acceptable.

The search for a suitable fast atan2 continues, all candidates so far having been either too inaccurate, or not fast enough.


Robert P.


Robert,

First of all, you can trivially remove one (and perhaps two) iterations of the initial inverse square root calculation, depending on your precision needs. This alone might get you to the factor of 2.

The conversion to 'fsel' mostly improves the case of random input, of course... are you testing this with randomized or ordered input? (and what would your final application be?)

Finally, have you tried adapting my changes back to single-precision with 'fsels'? On processors where single-precision fp is faster than double, you should get a measurable win out of this. (G5 appears not to be, alas...) The magic bit-twiddling might also run a hair faster in single-precision.

Hope this helps,

Ben

_______________________________________________
Do not post admin requests to the list. They will be ignored.
PerfOptimization-dev mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/perfoptimization-dev/email@hidden

This email sent to email@hidden
References: 
 >Re: Fast atan2 (From: Robert Purves <email@hidden>)
 >Re: Fast atan2 (From: Shaun Wexler <email@hidden>)
 >Re: Fast atan2 (From: Robert Purves <email@hidden>)



Visit the Apple Store online or at retail locations.
1-800-MY-APPLE

Contact Apple | Terms of Use | Privacy Policy

Copyright © 2007 Apple Inc. All rights reserved.