vector unsigned short peepholebug(vector unsigned short a,
                                  vector unsigned short b) {
    vector unsigned short mask = (vector unsigned short)vec_cmplt(a, b);
    if (vec_all_ge(a, b)) return a;
    return mask;
}
Xcode 2.2.1 / gcc 4.0 generates (with the optimizer set to -Os):
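Note the second "vcmpgtuh" instruction, which is completely
superfluous: vec_cmplt(a, b) and vec_all_ge(a, b) both come down to
the same vcmpgtuh comparison of b against a, so a single record-form
compare could supply both the mask and the CR6 result. The peephole
optimizer should recognize this situation and remove the instruction.
(I've filed a bug with Apple; #4519214.)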
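To spell out why the second compare is redundant, here is a scalar
model in plain C. It is only a sketch, not AltiVec code: model_cmplt,
model_all_ge, mask_all_zero and check_equivalence are made-up names,
and arrays of eight uint16_t stand in for vector unsigned short. It
shows that the vec_all_ge answer is the same as "the vec_cmplt mask is
all zero", so one comparison carries enough information for both:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 8 /* a vector unsigned short holds eight 16-bit lanes */

/* Scalar model of vec_cmplt: lane is 0xFFFF where a < b, else 0. */
static void model_cmplt(const uint16_t a[LANES], const uint16_t b[LANES],
                        uint16_t mask[LANES]) {
    for (int i = 0; i < LANES; i++)
        mask[i] = (a[i] < b[i]) ? 0xFFFF : 0;
}

/* Scalar model of vec_all_ge: true when a >= b holds in every lane. */
static bool model_all_ge(const uint16_t a[LANES], const uint16_t b[LANES]) {
    for (int i = 0; i < LANES; i++)
        if (a[i] < b[i]) return false;
    return true;
}

/* The same predicate, derived from the mask alone. */
static bool mask_all_zero(const uint16_t mask[LANES]) {
    for (int i = 0; i < LANES; i++)
        if (mask[i] != 0) return false;
    return true;
}

/* Hypothetical helper: asserts that both routes give the same answer. */
static void check_equivalence(const uint16_t a[LANES],
                              const uint16_t b[LANES]) {
    uint16_t mask[LANES];
    model_cmplt(a, b, mask);
    assert(model_all_ge(a, b) == mask_all_zero(mask));
}
```

Calling check_equivalence on any pair of inputs should never trip the
assert, which is the property the optimizer would have to exploit.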
Anyone know if more recent versions of gcc are able to do this? I
have some bottleneck code that could seriously benefit from this, and
I'd rather avoid assembly if I can...