--- Yoram Meroz <email@hidden> wrote:
> > As far as I know, fres has never been pipelined. frsqrte and vrefp have
> > been pipelined and have latency similar to multiply.
>
> fres is pipelined on the IBM G3 (2 cycle throughput) and the G5 (1 cycle).
IBM's user manual for the PPC750
(http://www-306.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF7785256996006C28E2)
says 'fres' takes 10 cycles and is not pipelined. Shark shows the same
for the G4 7400 (Shark 4 does not have timing info for a G3).
I don't have a G3 to confirm this. If someone does, please try the
following test program:
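#include <ppc_intrinsics.h>

// for the lazy engineer who doesn't want to use a calculator:
// with one iteration per clock tick, run time in seconds equals
// cycles per loop iteration
#define CPU_FREQ 1670000000

int main() {
    // five independent values, so the five fres chains do not
    // depend on each other
    float x1,x2,x3,x4,x5;
    x1=x2=x3=x4=x5=1.0f;
    int i;
    for (i=0; i<CPU_FREQ; i++) {
        x1 = __fres(x1);
        x2 = __fres(x2);
        x3 = __fres(x3);
        x4 = __fres(x4);
        x5 = __fres(x5);
    }
    // use the results so the compiler cannot discard the loop
    return (int)(x1+x2+x3+x4+x5);
}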
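Compiled with gcc4 -O1 on my G4 (7450) running at 1.67GHz, this program takes
65 seconds. If you remove the last 4 fres instructions, it takes 14 seconds.
Since the loop count equals the clock rate, seconds here are cycles per
iteration: 14 cycles for a single dependent fres, and 65 cycles, or 13 per
fres, for five independent ones. If fres were pipelined, the five independent
instructions would overlap and the loop would take little more than 14
seconds. Clearly, fres is not pipelined on a 7450. Shark shows 14:14 for
latency:throughput timing of fres...it really should be 14:13.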
--
Sanjay
_______________________________________________
PerfOptimization-dev mailing list (email@hidden)