So it looks like there is a threshold of 256x256 beyond which both
the G4 and the G5 overflow the cache, and that, except for the
512x512 case, the G5 is faster. The exception is presumably because
the number of cache misses is greater on the G5 than on the G4.
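
As a rough check on the 256x256 threshold, here is a minimal sketch
of the working-set arithmetic. It assumes single-precision complex
samples (two 4-byte floats each) transformed in place; neither the
element type nor the cache sizes are stated above, so the function
names and the cache_bytes parameter are hypothetical and only there
for illustration.

```c
#include <stddef.h>

/* Bytes occupied by one rows x cols image of single-precision complex
   samples (two floats per sample). Assumes an in-place transform, so
   no separate output buffer is counted. */
size_t fft_working_set_bytes(size_t rows, size_t cols)
{
    return rows * cols * 2 * sizeof(float);
}

/* Returns 1 if the image fits in a cache of cache_bytes, else 0.
   cache_bytes is a caller-supplied assumption, not a measured value. */
int fits_in_cache(size_t rows, size_t cols, size_t cache_bytes)
{
    return fft_working_set_bytes(rows, cols) <= cache_bytes;
}
```

Under those assumptions a 256x256 image occupies 512 KB and a 512x512
image occupies 2 MB, so a 512 KB cache would hold the first but not
the second.
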
Given that our source images are integers, are there AltiVec libraries
for 8- or 16-bit integer FFTs? Any other suggestions for improving
performance on larger 2D data sets?
Looking at the FFTW benchmark page
(http://www.fftw.org/speed/g5-2GHz/) it seems likely that FFTW would
be a good thing to look at next.
I'm not quite sure where vDSP will be next year on the issue of large
FFT speed. We have plans to look at it, but it is not clear when, or
if, any performance improvements will be delivered.