Hi All
Apple gcc 4.0 is leaps and bounds better than 3.3 for CSE (common subexpression elimination). For example in macstl 0.3.x I can now reliably do:
valarray <float> a; valarray <float> b = a + a + a;
and get one lvx (or lfs) in the inner loop, instead of the three lvx I get when the compiler can't track that all three operands in the expression come from the same a.
Unfortunately, the CSE is fairly finicky. I've been chasing better CSE for macstl and discovered that any of the following can make it worse, i.e. past a certain length the expression gets every load issued and all of its code generated, whereas shorter expressions get only the minimal loads and code that CSE implies:
* -maltivec, -mcpu=G5 even on scalar code (possibly because -faltivec without -maltivec affects how temporaries are generated??)
* the # of temporaries generated within the expression, even if the load/store to memory is eventually optimized out
* the phase of the moon
Since it looks like the compiler gives up on CSE beyond a certain expression length, rather than in response to a definite combination of options or usage, it feels like there's some sort of "maximum length of CSE'able expression" flag in gcc.
To all you gcc gurus, is there such a flag?
Cheers, Glen Low
--- pixelglow software | simply brilliant stuff www.pixelglow.com aim: pixglen