Holger Bettag wrote:
> I could be mistaken, but it seems to me that in all direct performance
> comparisons between GPU and CPU, vectorized CPU programs are conspicuously
> absent.
An interesting comparison of GPU and vectorized CPU programs appears in
"Fast Computation of Database Operations using Graphics Processors",
Proc. of ACM SIGMOD 2004.
There are cases where the GPU is substantially faster, even when the data
copy time is taken into account. However, the vectorized CPU code also
sometimes wins by a large factor (it is SSE2 code generated by the Intel
compiler, version 7.1, running on a dual 2.8 GHz Xeon). I'm sure the CPU
code could be sped up by hand-tuning, just as the authors hand-tuned the
GPU code produced by the NVIDIA Cg compiler.
Still, I see no reason why the cases where the GPU wins could not run
equally fast on a CPU with a wider vector unit.
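To make "vectorized CPU code" concrete, here is a minimal, hypothetical sketch of a database-style range predicate (count the rows with lo <= x <= hi), written once as a plain scalar loop and once with SSE2 intrinsics that compare four 32-bit values per instruction. It is not the code from the SIGMOD paper; the function names are made up for illustration, and it assumes GCC or Clang (which define `__SSE2__` when SSE2 is enabled).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Scalar reference: count elements with lo <= x[i] <= hi.
 * The loop body is branch-free, which gives a compiler a chance
 * to auto-vectorize it. */
size_t count_range_scalar(const int32_t *x, size_t n, int32_t lo, int32_t hi)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)((x[i] >= lo) & (x[i] <= hi));
    return count;
}

/* Same predicate, evaluated on four 32-bit lanes per SSE2 instruction. */
size_t count_range_simd(const int32_t *x, size_t n, int32_t lo, int32_t hi)
{
    size_t i = 0, count = 0;
#ifdef __SSE2__
    const __m128i vlo  = _mm_set1_epi32(lo);
    const __m128i vhi  = _mm_set1_epi32(hi);
    const __m128i ones = _mm_set1_epi32(-1);
    __m128i acc = _mm_setzero_si128(); /* one match counter per lane */

    /* Each 32-bit lane counter is incremented at most n/4 times. */
    assert(n / 4 <= (size_t)INT32_MAX);

    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(x + i));
        /* SSE2 has only signed < and > compares for 32-bit ints,
         * so compute "outside the range" and invert it. */
        __m128i out = _mm_or_si128(_mm_cmplt_epi32(v, vlo),
                                   _mm_cmpgt_epi32(v, vhi));
        __m128i in = _mm_andnot_si128(out, ones); /* match lanes = -1 */
        acc = _mm_sub_epi32(acc, in);             /* subtracting -1 adds 1 */
    }

    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    count = (size_t)lanes[0] + (size_t)lanes[1]
          + (size_t)lanes[2] + (size_t)lanes[3];
#endif
    /* Scalar tail, or the whole array when SSE2 is not available. */
    for (; i < n; i++)
        count += (size_t)((x[i] >= lo) & (x[i] <= hi));
    return count;
}
```

On x86-64, GCC and Clang enable SSE2 by default; on other targets the sketch silently falls back to the scalar loop. A wider vector unit would mainly change the number of lanes handled per compare, which is the kind of speedup the argument above relies on.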