I am benchmarking accelerated FFTs (512x512 real, using fft2d_zrip
in vDSP) and am getting some weird results:
1 GHz single G4 (codegen and instruction scheduling set for G4): 13 msec
1 GHz single G4 (codegen and instruction scheduling set for G5): 12 msec
2 GHz dual G5 (codegen and instruction scheduling set for G5): 17 msec
When I launch two instances simultaneously on the dual G5, they both
finish in 24 msec.
Does anyone have any idea why a dual G5 at twice the clock speed runs
slower than a G4 for a single instance, and about the same for two
instances?
This one has me puzzled.

Hi Chris,
A 512x512 array of single-precision floats is about 1 MB. The G5 has a
512 kB L2 cache and no L3. A G4 might have up to 2 MB of L3 cache. It
seems possible that you are falling out of cache on the G5 and not on
the G4. Does the trend hold for all sizes, or just that size?
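
To make the arithmetic concrete, here is a rough sketch in Python that compares the array footprint with the cache sizes mentioned above. It assumes 4-byte single-precision samples and counts only the data array, ignoring FFT setup tables, temporary buffers and anything else competing for cache, so take it as an estimate only. The function and constant names are made up for illustration.

```python
KIB = 1024

# Cache sizes quoted in the reply above. The G4 figure is an upper
# bound ("up to 2 MB"); actual machines may have less.
G5_L2_BYTES = 512 * KIB
G4_L3_BYTES = 2 * 1024 * KIB


def working_set_bytes(rows, cols, bytes_per_element=4):
    """Bytes occupied by a rows x cols array of real samples.

    bytes_per_element=4 assumes single-precision floats.
    """
    return rows * cols * bytes_per_element


def fits_in_cache(rows, cols, cache_bytes, bytes_per_element=4):
    """True if the data array alone is no larger than the cache."""
    return working_set_bytes(rows, cols, bytes_per_element) <= cache_bytes


if __name__ == "__main__":
    # Sweep a few square sizes to see where each cache is exceeded.
    for n in (128, 256, 512, 1024):
        size = working_set_bytes(n, n)
        print(n, size // KIB, "KiB",
              "fits G5 L2:", fits_in_cache(n, n, G5_L2_BYTES),
              "fits G4 L3:", fits_in_cache(n, n, G4_L3_BYTES))
```

If cache is the cause, timing the same transform at 256x256 and 1024x1024 should show the G5 ahead at the small size, where the data fits in its L2, and the gap changing again once the array exceeds the G4's L3 as well.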