You did look for any cache hints that might have been put in for the G4, right?
dcbz is evil on the G5.
You also need to change the prefetch distance and cache-line size for
read-aheads (dcbt) on the G5.
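To illustrate the read-ahead point, here is a minimal C sketch. It assumes GCC or Clang, where `__builtin_prefetch` is typically emitted as dcbt on PowerPC. The names `sum_with_prefetch`, `G4_LINE_BYTES` and `G5_LINE_BYTES` and the distance of 4 lines are hypothetical, not from vDSP. The idea is to make the line size and prefetch distance parameters instead of hard-coding G4 values. The G4's L1 data cache uses 32-byte lines; the G5's uses 128-byte lines.

```c
#include <stddef.h>

/* Hypothetical tuning constants: L1 data cache line sizes in bytes. */
enum { G4_LINE_BYTES = 32, G5_LINE_BYTES = 128 };

/* Sum n floats, issuing one prefetch per cache line, ahead_lines lines
   ahead of the current position. line_bytes must be >= sizeof(float).
   Line size and distance are parameters so the same loop can be tuned
   per CPU rather than baking in G4 values. */
static float sum_with_prefetch(const float *x, size_t n,
                               size_t line_bytes, size_t ahead_lines)
{
    const size_t per_line = line_bytes / sizeof(float); /* floats per line */
    const size_t ahead = per_line * ahead_lines;        /* distance in floats */
    float acc = 0.0f;

    for (size_t i = 0; i < n; i += per_line) {
        /* Only prefetch addresses inside the array. */
        if (i + ahead < n)
            __builtin_prefetch(&x[i + ahead], 0 /* read */, 3 /* keep cached */);

        size_t end = (i + per_line < n) ? i + per_line : n;
        for (size_t j = i; j < end; ++j)
            acc += x[j];
    }
    return acc;
}
```

With G4 tuning (32-byte lines, 4 lines ahead), the loop prefetches every 32 bytes and reaches only 128 bytes ahead, which is a single G5 line. On a G5, three of every four of those prefetches target a line that has already been requested. At 2 GHz, the memory latency in cycles is also likely higher, so the distance probably needs to grow as well. Calling it as `sum_with_prefetch(buf, n, G5_LINE_BYTES, 4)` instead of `G4_LINE_BYTES` changes only the tuning, not the result.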
Chris
At 12:23 PM -0700 10/13/04, Bevis, Chris wrote:
I am benchmarking accelerated FFTs (512x512 real, using fft2d_zrip
in vDSP) and am getting some weird results:
1 GHz single G4 (codegen and instruction scheduling set for G4): 13 msec
1 GHz single G4 (codegen and instruction scheduling set for G5): 12 msec
2 GHz dual G5 (codegen and instruction scheduling set for G5): 17 msec
When I launch two instances simultaneously on the dual G5, they both
finish in 24 msec.
Does anyone have any idea why a dual G5 with twice the clock speed is
slower than a G4 for a single instance, and only about the same for two
instances? This one has me puzzled.