Finally, have you tried adapting my changes back to single-precision
with 'fsels'? On processors where single-precision fp is faster than
double, you should get a measurable win out of this. (G5 appears not
to be, alas...) The magic bit-twiddling might also run a hair faster
in single-precision.
IIRC, double-precision math being as fast as single-precision math is
one of the improvements that came with the G4 class, so in the end this
would be a G3-specific optimization.
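
For anyone following along who hasn't used it: fsel picks its second
operand when the first is >= 0.0 and its third otherwise (NaN included).
Below is a rough portable sketch of that selection in single precision.
The names fsel_f and clamp_f are made up for illustration and are not
from the code under discussion; whether a given compiler turns the
ternary into an actual fsel is an assumption that needs checking in the
generated assembly.

```c
#include <assert.h>

/* Hypothetical stand-in for fsel semantics in single precision:
   returns a when test >= 0.0f, otherwise b (also b when test is NaN). */
static inline float fsel_f(float test, float a, float b)
{
    return (test >= 0.0f) ? a : b;
}

/* Example use: clamp x to [lo, hi] with two selects instead of
   explicit if/else chains. Requires lo <= hi. */
static inline float clamp_f(float x, float lo, float hi)
{
    assert(lo <= hi);
    x = fsel_f(x - lo, x, lo);    /* x below lo -> lo */
    return fsel_f(hi - x, x, hi); /* x above hi -> hi */
}
```

Timing this against the double version on a G3 and a G4 would be the
way to see whether the single-precision variant is worth keeping.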
--
Reality is what, when you stop believing in it, doesn't go away.
Failure is not an option. It is a privilege reserved for those who try.