I did a few simple tests from the command line. I
also took a glance at the object code generated (on Intel) and measured a
few things. This is what I found, using gcc 4.2 and no special
command-line switches:
- sizeof (long double) = 16 = 128 bits, as already reported by Luigi
- arithmetic on doubles uses (scalar) SSE instructions for a simple add-and-store, but uses the 'traditional' 8087 instructions for a more complex expression, presumably to preserve 80 bits of precision in the intermediate results (which I like)
- arithmetic on long doubles also uses traditional 8087 instructions and stores 80-bit / 10-byte values to memory (left-aligned in 16 bytes) rather than 64 bits
- no software emulation is used for long doubles (which is no loss IMO - it would be too slow for DSP anyway)
According to my ancient 486 programmer's reference manual:

Single (32 bit): 24-bit mantissa, 8-bit exponent
Double (64 bit): 53-bit mantissa, 11-bit exponent
Extended (80 bit): 64-bit mantissa, 15-bit exponent
So the extra precision offered by long doubles on Intel is
significant.
According to my (very simple)
timings, arithmetic on long doubles is no slower than arithmetic on doubles (which is what you would expect, given that the
code generated is very similar) but it does double the size of all your arrays, which in turn hits the CPU
cache. My own experience of this (using floats vs doubles in an 8192
byte FFT) is that it makes a noticeable but not startling difference. But
as they say, memory is the new I/O. Perhaps store any temporaries as
long doubles but keep your arrays as doubles. In any case, it pays to
experiment. Performance gains (or losses) can come from unexpected
places. Failure to 8-byte align your doubles is a famous one on x86, but I
think gcc handles this in most cases.
I noticed that oddity in the AIFF spec too. It is a
little strange. Tricky to handle if you're not on Intel.
Regards,
Paul Sanders.