Thanks. I tried it, and (after some translation for gcc) found that
the atan2 replacement has good accuracy but is little if any faster
than the library atan2.

Hmm... well, thank you for giving it a try. The benchmarks for this
code were done four years ago, with CodeWarrior under OS9, and it was
tuned for the early G4, which may account for your speed observations.
(Library atan2 has no doubt improved as well.) Are you on a G4 or G5,
and are you using single or double precision?

I compared OS X 10.3.6 atan2( y, x ) with the single version of your
function atan2r_.
In converting your CW code to gcc, one line caused many cold beers and
hot coffees to be consumed:
real32* asbuf = (real32*)(address(atanbuf_) + ind);
before an (int) cast emerged as a correct conversion:
real32* asbuf = (real32*)((int)&atanbuf_ + ind);
My test suite of {y,x} values included all 8 octants.
gcc flags were -force_cpusubtype_ALL -lmx -O3 -mcpu=G5 -mtune=G5.
atan2r_ is very accurate and measurably faster than atan2:
atan2r_ |error| = 0.000000
atan2 elapse : 0.96 s
atan2r_ elapse : 0.73 s
but I need a speedup of at least 2.
Robert P.

Robert,
I've pasted some improved code below, which will hopefully get you the
2x speedup you're looking for.
This is double-precision, but can trivially be converted back to
single-precision (except in CodeWarrior, which inexplicably lacks an
__fsels() intrinsic, and generates hordes of 'frsp's when __fsel is
used).
The inverse square-root is now inlined, which is important because the
degeneracy test only has to be done once, instead of twice as before.
Also, the radius is no longer returned, which should save a few cycles.
Note that using __fsel here has eliminated nearly all of the branching.
The 'goto' in the original code was designed to avoid branching for
the standard code stream, but CodeWarrior is now screwing that up as
well, so I removed it. I also had to set "#pragma scheduling once" to
avoid massive register spills. (Is anyone still using CodeWarrior?
I've been having a pretty abysmal experience with Xcode so far; hard to
know what to do.)
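
For anyone following along without a PowerPC compiler, the branch
elimination via __fsel mentioned above can be illustrated roughly as
follows. This is a portable sketch, not the posted code: fsel_sketch,
abs_via_fsel and unfold_quadrant are hypothetical names, and the plain C
conditional only mimics the select semantics (a first argument >= 0 picks
the second argument, otherwise the third). Whether it compiles to a
branch-free fsel is up to the compiler and target.

```c
/* Portable stand-in for the fsel select: b when a >= 0, otherwise c.
   A NaN in a fails the comparison and therefore also yields c. */
static double fsel_sketch(double a, double b, double c)
{
    return (a >= 0.0) ? b : c;
}

/* |x| expressed as a select instead of an if statement. */
static double abs_via_fsel(double x)
{
    return fsel_sketch(x, x, -x);
}

/* Given t = atan(|y| / |x|) in [0, pi/2], restore the quadrant of (x, y)
   using two selects and no explicit branches in the source. */
static double unfold_quadrant(double t, double y, double x)
{
    const double pi = 3.14159265358979323846;
    t = fsel_sketch(x, t, pi - t);  /* x < 0: reflect about the y axis */
    return fsel_sketch(y, t, -t);   /* y < 0: reflect about the x axis */
}
```

The same pattern extends to the octant folding: each "if" on a sign
becomes a select on that value.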
Apologies for the "address" terminology; I should have added that
typedef to my page. It's intended as a 64-bit-safe way to represent a
location in memory as an integer type.
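
A minimal sketch of what such a typedef might look like in C99,
assuming uintptr_t is available; the actual definition on my page may
differ. The table contents and the helper name table_at_byte_offset are
made up for illustration. The point is that ind is a byte offset, and
that an (int) cast would truncate the pointer on a 64-bit target whereas
a pointer-sized integer does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef float real32;

/* Assumption: "address" is an unsigned integer wide enough to hold a
   pointer. uintptr_t from <stdint.h> is the C99 type for that. */
typedef uintptr_t address;

/* Hypothetical stand-in for the real lookup table. */
static real32 atanbuf_[8] = {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f};

/* Return a pointer into the table at a BYTE offset (not an element index). */
static real32 *table_at_byte_offset(size_t ind)
{
    assert(ind % sizeof(real32) == 0 && ind < sizeof atanbuf_);
    return (real32 *)((address)atanbuf_ + ind);
}
```

Casting through (char *) instead of an integer type would work as well
and avoids the integer round trip altogether.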