Thanks. I tried it, and (after some translation for gcc) found that
the atan2 replacement has good accuracy but is little if any faster
than the library atan2.

Hmm... well, thank you for giving it a try. The benchmarks for this
code were done four years ago, with CodeWarrior under Mac OS 9, and it
was tuned for the early G4, which may account for your speed
observations. (Library atan2 has no doubt improved as well.) Are you on
a G4 or a G5, and are you using single or double precision?
Getting the radius (or 1/radius) for essentially free can be a big win
in some contexts, or you can save a few cycles by stripping that code
out and avoiding the returned reference.
Also, this code should be straightforward to adapt to array form,
which would boost its throughput tremendously. Conversion to AltiVec
might help there too, and the angle-reduction branches can be recoded
into a linear instruction stream without too much trouble... the
branches are performance-killers on the G5 with random input.

You know, I believe it's high time I updated this code; I'm pretty sure
the underlying algorithm is unbeatable. Stay tuned for Computer Math 102...
:-)