On Apr 19, 2006, at 1:11 AM, Ben Weiss wrote:
Given the AltiVec function:
vector unsigned short peepholebug(vector unsigned short a, vector unsigned short b) {
    vector unsigned short mask = (vector unsigned short)vec_cmplt(a, b);
    if (vec_all_ge(a, b)) return a;
    return mask;
}
Xcode 2.2.1 / gcc 4.0 generates (with the optimizer set to -Os):
    mfspr r0,256
    stw r0,-8(r1)
    oris r0,r0,0x8000
    mtspr 256,r0
    vcmpgtuh. v0,v3,v2
    vcmpgtuh v0,v3,v2
    beq cr6,L99
    vor v2,v0,v0
    lwz r12,-8(r1)
    mtspr 256,r12
    blr
Note the second "vcmpgtuh" instruction, which is completely superfluous: the preceding "vcmpgtuh." has already written the same result to v0. The peephole optimizer should recognize this situation and remove the instruction. (I've filed a bug with Apple; #4519214.)

Does anyone know whether more recent versions of gcc are able to do this? I have some bottleneck code that could seriously benefit from this, and I'd rather avoid assembly if I can...
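To illustrate why the second compare is redundant, here is a minimal scalar sketch in portable C. It is not AltiVec code and not what gcc emits. The helper names `cmpgt_u16_record` and `peepholebug_model` are hypothetical, invented for this illustration. The sketch assumes the usual behavior of the record form ("vcmpgtuh."): it writes the per-lane mask to the destination register and also sets a CR6 bit when no lane compared true, which is what "beq cr6" tests in the listing above.

```c
#include <stdint.h>

#define LANES 8 /* a 128-bit vector holds eight unsigned shorts */

/* Hypothetical scalar model of the record-form compare "vcmpgtuh.":
 * stores the per-lane mask (x > y) and returns 1 when no lane compared
 * true, modeling the CR6 "all false" bit that "beq cr6" branches on. */
static int cmpgt_u16_record(const uint16_t x[LANES], const uint16_t y[LANES],
                            uint16_t mask[LANES])
{
    int all_false = 1;
    for (int i = 0; i < LANES; i++) {
        mask[i] = (x[i] > y[i]) ? 0xFFFF : 0x0000;
        if (mask[i] != 0)
            all_false = 0;
    }
    return all_false;
}

/* Scalar model of peepholebug(): a single compare supplies both the mask
 * (a < b, computed as b > a) and the predicate (all a >= b is the same
 * condition as "no lane has b > a"). */
static void peepholebug_model(const uint16_t a[LANES], const uint16_t b[LANES],
                              uint16_t out[LANES])
{
    uint16_t mask[LANES];
    int all_ge = cmpgt_u16_record(b, a, mask);
    for (int i = 0; i < LANES; i++)
        out[i] = all_ge ? a[i] : mask[i];
}
```

In this model the mask and the branch condition come from the same pass over the lanes, which mirrors the claim that the non-record "vcmpgtuh" recomputes a value already sitting in v0.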
Ben, be glad the compiler is sometimes smarter than we are! ;-)
PS - please don't crosspost to multiple mailing lists.
--
Shaun Wexler
MacFOH
http://www.macfoh.com
"I never let schooling interfere with my education." - Mark Twain