PS: These are all optimising builds, -O2 in the case of
gcc, and using gcc 4.2. In fact if you're not careful, gcc optimises the
whole outer loop away, so I went with this:

#define __DARWIN_10_6_AND_LATER_ALIAS(f) f

#include "stdio.h"
#include "unistd.h"
#include "memory.h"
#include "float.h"

void x (double *buf) { buf [0] = 1; }

int main ()
{
    for (int j = 0; j < 10000000; ++j)
    {
#if 0
        char buf [8192];
        x (buf);
        memset (buf, 0, 8192);
#else
        double buf [1024];
        x (buf);
        for (int i = 0; i < 1024; ++i)
            buf [i] = 0.0;
#endif
    }
}

Timings for memset are ~2 sec, and for buf [i] = 0.0 ~6
sec on my machine. The timings for Visual C++ are a lot closer and the
other way round - sorry, no details to hand, I did it a while ago in a wider
context - and Visual C++ optimises the outer loop away in the above code, even
with the call to x () in there so I can't easily measure it again.
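
For anyone who wants to repeat the measurement on a compiler that removes the outer loop, here is a minimal sketch of one way to keep it alive. It is not the code I timed above. The names clear_with_memset, clear_with_loop, consume, sink, all_zero and time_clear are all made up for illustration, and it assumes that handing the buffer to a function through a volatile function pointer is enough to stop the optimiser discarding the stores, and that an all-zero bit pattern reads back as 0.0 (true for IEEE 754 doubles).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BUF_DOUBLES 1024  /* 8192 bytes where sizeof (double) == 8 */

typedef void (*clear_fn) (double *buf, size_t n);

/* Variant (a): clear the buffer's bytes with memset. */
void clear_with_memset (double *buf, size_t n)
{
    memset (buf, 0, n * sizeof *buf);
}

/* Variant (b): store 0.0 into each element in turn. */
void clear_with_loop (double *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        buf [i] = 0.0;
}

/* Hypothetical helper: does nothing, but is only ever called through a
   volatile pointer, so the compiler has to assume it might read the buffer. */
static void consume (const double *buf) { (void) buf; }
static void (*volatile sink) (const double *) = consume;

/* Returns 1 if every element compares equal to 0.0, otherwise 0. */
int all_zero (const double *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (buf [i] != 0.0)
            return 0;
    return 1;
}

/* Clears an 8 KB stack buffer 'iterations' times and returns the CPU time
   used, in seconds, as reported by clock (). */
double time_clear (clear_fn clear, long iterations)
{
    double buf [BUF_DOUBLES];
    clock_t start, stop;

    assert (iterations > 0);
    start = clock ();
    for (long j = 0; j < iterations; ++j)
    {
        clear (buf, BUF_DOUBLES);
        sink (buf);    /* opaque call: the stores above cannot be dropped */
    }
    stop = clock ();
    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_clear (clear_with_memset, 10000000) and time_clear (clear_with_loop, 10000000) from main and printing the two results gives figures comparable in spirit to the ones above. The indirect calls add a little overhead per iteration, and both variants here work on a double buffer, so the numbers will not match the original test exactly.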
Regards,
Paul Sanders.
----- Original Message -----
Sent: Wednesday, June 16, 2010 4:29 PM
Subject: Re: long double data type
Good question. I made this observation on Visual
C++. So I had a go on gcc / Intel / 32 bit and it's the other
way round - decidedly so - and when I look at the code the compiler generates I
see that buf [i] = 0.0 is implemented as two 32 bit movl instructions.
Visual C does this in one shot using a scalar SSE instruction. Also, the
gcc memset is very fast so it might be using SSE 128 bit vector instructions in
the main loop.
All of which just goes to show there's no substitute for a
bit of experimentation, which is really what I was trying to say.
Regards,
Paul Sanders
----- Original Message -----
Sent: Wednesday, June 16, 2010 3:48 PM
Subject: Re: long double data type
On Wed, Jun 16, 2010 at 6:37 AM, Paul Sanders <email@hidden>
wrote:
> Bonus question: which is faster? (note that the two buffers are actually the
> same size):
>
> (a) char buf [8192]; memset (buf, 0, 8192);
> (b) double buf [1024]; for (int i = 0; i < 1024; ++i) buf [i] = 0.0;
>
> In fact it is (b), because the code generated uses 64 bit loads and stores.
> Just goes to show there's no substitute for a bit of experimentation.

How many compilers and C libraries have you
tested this with?