Mailing Lists: Apple Mailing Lists


Xcode 2.2.1 / gcc 4.0 Peephole Bug



Given the Altivec function:

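#include <altivec.h>  /* required with -maltivec; harmless with Apple's -faltivec */

vector unsigned short peepholebug(vector unsigned short a, vector unsigned short b) {
    /* All ones in each lane where a < b, zero elsewhere. */
    vector unsigned short mask = (vector unsigned short)vec_cmplt(a, b);

    /* True when no lane has a < b, i.e. the mask above is all zeros. */
    if (vec_all_ge(a, b)) return a;

    return mask;
}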


Xcode 2.2.1 / gcc 4.0 generates the following (with the optimizer set to -Os):

mfspr r0,256
stw r0,-8(r1)
oris r0,r0,0x8000
mtspr 256,r0
vcmpgtuh. v0,v3,v2
vcmpgtuh v0,v3,v2
beq cr6,L99
vor v2,v0,v0
lwz r12,-8(r1)
mtspr 256,r12
blr

Note the second "vcmpgtuh" instruction, which is completely superfluous: the record form "vcmpgtuh." has already written the same result into v0 and set CR6 for the branch. The peephole optimizer should recognize this situation and remove the instruction. (I've filed a bug with Apple, #4519214.) Does anyone know whether more recent versions of gcc can do this? I have some bottleneck code that would benefit greatly, and I'd rather avoid hand-written assembly if I can...
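To make the redundancy concrete, here is a portable scalar sketch in plain C. It does not use AltiVec and does not reproduce GCC's internals; the names vu16, cmp_result, vcmpgtuh_dot and peepholebug_model are hypothetical. It models the record form "vcmpgtuh.": a single pass produces both the per-lane mask that ends up in v0 and the CR6 summary bits (LT set when the compare is true in every lane, EQ set when it is false in every lane). Since vec_all_ge(a, b) is just "b > a is false in every lane", the branch and the returned mask can both come from that one instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for vector unsigned short: 8 lanes of 16 bits. */
#define LANES 8
typedef struct { uint16_t e[LANES]; } vu16;

/* Result of the record form "vcmpgtuh. vD,vA,vB": the per-lane mask written
   to vD plus the two CR6 summary bits the instruction sets. */
typedef struct {
    vu16 mask;      /* 0xFFFF where vA > vB (unsigned), 0 elsewhere */
    int all_true;   /* CR6 LT bit: comparison true in every lane */
    int all_false;  /* CR6 EQ bit: comparison false in every lane (tested by beq cr6) */
} cmp_result;

/* One pass over the lanes yields both the mask and the CR6 summary. */
static cmp_result vcmpgtuh_dot(vu16 va, vu16 vb) {
    cmp_result r;
    int n_true = 0;
    for (int i = 0; i < LANES; i++) {
        int gt = va.e[i] > vb.e[i];
        r.mask.e[i] = gt ? 0xFFFF : 0;
        n_true += gt;
    }
    r.all_true = (n_true == LANES);
    r.all_false = (n_true == 0);
    assert(!(r.all_true && r.all_false)); /* exclusive for a nonzero lane count */
    return r;
}

/* peepholebug() modeled with a single compare: vec_cmplt(a, b) is vcmpgtuh
   with the operands swapped (b > a), and vec_all_ge(a, b) holds exactly when
   that compare is false in every lane. */
static vu16 peepholebug_model(vu16 a, vu16 b) {
    cmp_result r = vcmpgtuh_dot(b, a); /* vcmpgtuh. v0,v3,v2 */
    if (r.all_false)                   /* beq cr6,L99 */
        return a;
    return r.mask;                     /* vor v2,v0,v0 */
}
```

For example, with a = {5, ..., 12} and b = {1, ..., 8} every lane satisfies a >= b, so the model takes the early return; raising the last lane of b to 100 makes it return a mask with only that lane set to 0xFFFF.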

Thanks,
Ben
_______________________________________________
PerfOptimization-dev mailing list      (email@hidden)




Copyright © 2007 Apple Inc. All rights reserved.