Mailing Lists: Apple Mailing Lists

Image of Mac OS face in stamp
 
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Fast atan2




On Jan 25, 2005, at 4:50 PM, Robert Purves wrote:


Ben Weiss <email@hidden> wrote:

http://www.shellandslate.com/computermath101.html
Ben

Thanks. I tried it, and (after some translation for gcc) found that the atan2 replacement has good accuracy but is little if any faster than the library atan2.

Hmm... well, thank you for giving it a try. The benchmarks for this
code were done four years ago, with CodeWarrior under OS9, and it was
tuned for the early G4, which may account for your speed observations.
(Library atan2 has no doubt improved as well.) Are you on G4 or G5, and
single or double precision?

I compared OS X 10.3.6 atan2( y, x ) with the single version of your function atan2r_.
In converting your CW code to gcc, one line caused many cold beers and hot coffees to be consumed:
real32* asbuf = (real32*)(address(atanbuf_) + ind);
before an (int) cast emerged as a correct conversion:
real32* asbuf = (real32*)((int)&atanbuf_ + ind);


My test suite of {y,x} values included all 8 octants.
gcc flags were -force_cpusubtype_ALL -lmx -O3 -mcpu=G5 -mtune=G5.

atan2r_ is very accurate and measurably faster than atan2:
  atan2r_   |error| = 0.000000
  atan2     elapse : 0.96 s
  atan2r_   elapse : 0.73 s
but I need a speedup of at least 2.

Robert P.

Robert,

I've pasted some improved code below, hopefully wil get you the 2x speedup you're looking for.

This is double-precision, but can trivially be converted back to single-precision (except in CodeWarrior, which inexplicably lacks an __fsels() intrinsic, and generates hordes of 'frsp's when __fsel is used).

The inverse square-root is now inlined, which is important because the degeneracy test only has to be done once, instead of twice as before. Also, the radius is no longer returned, which should save a few cycles.

Note that using __fsel here has eliminated nearly all of the branching. The 'goto' in the original code was designed to avoid branching for the standard code stream, but CodeWarrior is now screwing that up as well, so I removed it. I also had to set "#pragma scheduling once" to avoid massive register spills. (Is anyone still using CodeWarrior? I've been having a pretty abysmal experience with XCode so far; hard to know what to do.)

Apologies for the "address" terminology; I should have added that typedef to my page. It's intended as a 64-bit-safe way to represent a location in memory as an integer type.

Let me know how this works for you.

Ben

--

real64 CMath::atan2_(real64 y_, real64 x_) {
	Assert(ataninited);
		
	real64 mag2 = x_ * x_ + y_ * y_;
	
	real64 rinv;
		{
		real64 _min = 1.0e-307;
		real32 _1p5 = 1.5;
		real64 fp0, fp1, fp2, fp3;
		
		fp0 = mag2;
		fp2 = __frsqrte(fp0);
		fp1  = _1p5 * fp0 - fp0;
		fp3  =        fp2 * fp2;	if(fp0 < _min)					return 0;
		fp3  = _1p5 - fp1 * fp3;
		fp2  =        fp2 * fp3;
		fp3  =        fp2 * fp2;
		fp3  = _1p5 - fp1 * fp3;
		fp2  =        fp2 * fp3;
		fp3  =        fp2 * fp2;
		fp3  = _1p5 - fp1 * fp3;
		fp2  =        fp2 * fp3;
		fp3  =        fp2 * fp2;
		fp3  = _1p5 - fp1 * fp3;
		rinv =        fp2 * fp3;
		}
	
	real64 x, y;
	real64 ypbuf[1];
	real64 _2p43 = 65536.0 * 65536.0 * 2048.0;
	real64 yp = _2p43;
	
	real64 flag4 = y_;
	real64 flag2 = x_;
	real64 flag1 = __fabs(y_) - __fabs(x_);
	
	y_ = __fsel(flag4, y_, -y_);
	x_ = __fsel(flag2, x_, -x_);
	
	yp += rinv * __fsel(flag1, x_, y_);
	x   = rinv * __fsel(flag1, y_, x_);
	y   = rinv * __fsel(flag1, x_, y_);
	ypbuf[0] = yp;
	
	int32 ind = (((int32*)(ypbuf))[iman] & 0x03FF) * 2 * sizeof(real64);
		
	real64* dasbuf = (real64*)(address(datanbuf_) + ind);
	
	real64 sv  = yp - _2p43;	// index fraction
	real64 cv  = dasbuf[0];
	real64 asv = dasbuf[1];
	
	sv = y * cv - x * sv;	// delta sin value
	
	// ____ compute arcsin directly
	real64 asvd = 6 + sv * sv;	sv *= real64(1.0 / 6.0);
	real64 th = asv + asvd * sv;
	
	th = __fsel(flag1, _2pi / 4 - th, th);
	th = __fsel(flag2, th, _2pi / 2 - th);
	th = __fsel(flag4, th, -th);
	
	return th;
	}

_______________________________________________
Do not post admin requests to the list. They will be ignored.
PerfOptimization-dev mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/perfoptimization-dev/email@hidden

This email sent to email@hidden
References: 
 >Re: Fast atan2 (From: Robert Purves <email@hidden>)
 >Re: Fast atan2 (From: Shaun Wexler <email@hidden>)
 >Re: Fast atan2 (From: Robert Purves <email@hidden>)



Visit the Apple Store online or at retail locations.
1-800-MY-APPLE

Contact Apple | Terms of Use | Privacy Policy

Copyright © 2007 Apple Inc. All rights reserved.