Mailing Lists: Apple Mailing Lists


Re: XCode 2.2.1 / gcc 4.0 Peephole Bug



On Apr 19, 2006, at 1:42 PM, Ben Weiss wrote:

> On Apr 19, 2006, at 1:27 PM, Shaun Wexler wrote:
>
>> What you should be concerned about is saving and restoring the VRSAVE register across such a simple function. It needs to be inlined. Use this prototype declaration:
>>
>> inline vector unsigned short peepholebug(vector unsigned short a, vector unsigned short b) __attribute__ ((__always_inline__,__nodebug__));
>
> No, the predicate instruction DOES correctly set the result register as well as the condition register (according to Motorola's AltiVec manual), and I've now verified using assembly that the code works the same with the superfluous instruction removed, only faster. (At least on my G5, but I'd be amazed if G4 is different.) Also, the lack of VRSAVE is intentional; the real function uses all the vector registers, and I get a speed win by setting VRSAVE once in the thread entry point by hand. The tiny function I posted was just a test case that isolates the issue.

These instructions push VRSAVE and then mark v0 used:

mfspr r0,256
stw r0,-8(r1)
oris r0,r0,0x8000
mtspr 256,r0
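
To illustrate what the oris above is doing, here is a small sketch in portable C (not code from this thread; the name vrsave_mark is made up for the example). It assumes the usual VRSAVE layout, where the most significant bit of the 32-bit register stands for v0 and the least significant bit for v31, so ORing 0x8000 into the upper halfword flags v0 as in use:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the VRSAVE mask with vector register
   v<vreg> marked as live. Bit 0 in PowerPC numbering is the MSB,
   so v0 maps to 0x80000000 and v31 maps to 0x00000001. */
uint32_t vrsave_mark(uint32_t vrsave, unsigned vreg)
{
    assert(vreg < 32);                  /* only v0..v31 exist */
    return vrsave | (0x80000000u >> vreg);
}
```

With vreg equal to 0 this is the same OR that "oris r0,r0,0x8000" performs on the value read from SPR 256.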

This is your function:

vcmpgtuh. v0,v3,v2
vcmpgtuh v0,v3,v2
beq cr6,L99
vor v2,v0,v0
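
For anyone following along without the AltiVec manual at hand, here is a scalar C model of my reading of that listing. It is only a sketch: the function names are invented, it assumes a is in v2 and b is in v3, and it assumes L99 is simply the return path. The dot form writes the per-lane mask to the result register and also sets the CR6 bit that beq tests when no lane compared greater:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { LANES = 8 };   /* eight unsigned halfwords per 128-bit vector */

/* Model of "vcmpgtuh. vD,vA,vB": out[i] is 0xFFFF where a[i] > b[i],
   else 0. Returns true when no lane was greater, which is the CR6
   condition that "beq cr6" branches on. */
bool model_vcmpgtuh_dot(const uint16_t a[LANES], const uint16_t b[LANES],
                        uint16_t out[LANES])
{
    assert(a && b && out);
    bool none_greater = true;
    for (int i = 0; i < LANES; i++) {
        out[i] = (a[i] > b[i]) ? 0xFFFFu : 0x0000u;
        if (out[i] != 0)
            none_greater = false;
    }
    return none_greater;
}

/* Model of the four-instruction listing: v2 is overwritten with the
   mask only if at least one lane of v3 exceeds the same lane of v2. */
void model_listing(uint16_t v2[LANES], const uint16_t v3[LANES])
{
    uint16_t v0[LANES];
    bool cr6_eq = model_vcmpgtuh_dot(v3, v2, v0);  /* vcmpgtuh. v0,v3,v2 */
    if (!cr6_eq) {                                 /* beq cr6,L99 skips this */
        for (int i = 0; i < LANES; i++)
            v2[i] = v0[i];                         /* vor v2,v0,v0 */
    }
}
```

In this model the second, non-dot vcmpgtuh has no counterpart, because it would write the same mask into v0 a second time.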

These instructions pop and restore VRSAVE:

lwz r12,-8(r1)
mtspr 256,r12

I would only expect a speedup on a 7400/7410 G4, which executes it in 1 cycle in its VSIU. Regardless of whether the 2nd vcmpgtuh is redundant, it takes 2 cycles to complete on either a 7450/7455 G4 or a 970 G5, so the compiler optimizes it that way by default. The vor is probably executed speculatively anyhow, and on the G5 it will be first in its dispatch group because it follows a branch. YMMV.


If you declared this function always-inline, it would eliminate up to 8 instructions: 6 for VRSAVE ops, the redundant vcmpgtuh (if another group of instructions could fill that slot), and the blr.
--
Shaun Wexler
MacFOH
http://www.macfoh.com


Arguing with an engineer is like wrestling with a pig in mud.
After a while, you realize the pig is enjoying it.


PerfOptimization-dev mailing list


