Subject: Re: XCode 2.2.1 / gcc 4.0 Peephole Bug
From: Ben Weiss <email@hidden>
Date: Fri, 21 Apr 2006 16:25:08 -0700
Delivered-to: email@hidden
Shaun,
You're right about VRSAVE. In CodeWarrior I set VRSAVE in the thread
entry point and use "#pragma altivec_vrsave off" to clean up the inner
loops, but evidently Xcode/gcc uses a different mechanism. (Do you
know the equivalent for switching altivec_vrsave on/off in Xcode?)
To clarify regarding vcmpgtuh, here is the source for my actual inner
loop, tweaked with assembly to work around the problem (view
monospaced; hope the formatting survives):
    vs0 = vec_add(hists[0], vec_splat(vc, 0));

    vs1 = vec_add(hists[1], vec_splat(vc, 1));
    vsum = vec_adds(vsum, vs0);
    asm { vcmpgtuh. vmask, vpiv, vsum; }
    ppl = vec_sub(ppl, vmask);

    vs0 = vec_add(hists[2], vec_splat(vc, 2));
    vsum = vec_adds(vsum, vs1);
    asm { beq cr6, endls0; }
    asm { vcmpgtuh. vmask, vpiv, vsum; }
    ppl = vec_sub(ppl, vmask);

    vs1 = vec_add(hists[3], vec_splat(vc, 3));
    vsum = vec_adds(vsum, vs0);
    asm { beq cr6, endls1; }
    asm { vcmpgtuh. vmask, vpiv, vsum; }
    ppl = vec_sub(ppl, vmask);

    vs0 = vec_add(hists[4], vec_splat(vc, 4));
    vsum = vec_adds(vsum, vs1);
    asm { beq cr6, endls2; }
    asm { vcmpgtuh. vmask, vpiv, vsum; }
    ppl = vec_sub(ppl, vmask);

    vs1 = vec_add(hists[5], vec_splat(vc, 5));
    vsum = vec_adds(vsum, vs0);
    asm { beq cr6, endls3; }
    asm { vcmpgtuh. vmask, vpiv, vsum; }
    ppl = vec_sub(ppl, vmask);

    vs0 = vec_add(hists[6], vec_splat(vc, 6));
    vsum = vec_adds(vsum, vs1);
    asm { beq cr6, endls4; }
    asm { vcmpgtuh. vmask, vpiv, vsum; }
    ppl = vec_sub(ppl, vmask);

    vs1 = vec_add(hists[7], vec_splat(vc, 7));
    vsum = vec_adds(vsum, vs0);
    asm { beq cr6, endls5; }
    asm { vcmpgtuh. vmask, vpiv, vsum; }
    ppl = vec_sub(ppl, vmask);

    vsum = vec_adds(vsum, vs1);
    asm { beq cr6, endls6; }
    asm { vcmpgtuh. vmask, vpiv, vsum; }
    ppl = vec_sub(ppl, vmask);

    asm { beq cr6, endls7; }
and here is CodeWarrior's output for the middle section, with perfect
scheduling and no redundant vcmpgtuh instructions (view monospaced; the
mailing list bounces the message if I change the font):
...
lvx vr4,r10,r5
vsplth vr3,vr8,$0004
vadduhs vr0,vr0,vr5
vsubuhm vr2,vr2,vr7
vadduhm vr5,vr4,vr3
beq cr6,*+128
vcmpgtuh. vr7,vr31,vr0
lvx vr4,r10,r4
vsplth vr3,vr8,$0005
vadduhs vr0,vr0,vr5
vsubuhm vr2,vr2,vr7
vadduhm vr5,vr4,vr3
beq cr6,*+100
vcmpgtuh. vr7,vr31,vr0
lvx vr4,r10,r3
vsplth vr3,vr8,$0006
vadduhs vr0,vr0,vr5
vsubuhm vr2,vr2,vr7
vadduhm vr5,vr4,vr3
beq cr6,*+72
vcmpgtuh. vr7,vr31,vr0
lvx vr4,r10,r0
vsplth vr3,vr8,$0007
vsubuhm vr2,vr2,vr7
vadduhs vr0,vr0,vr5
vadduhm vr3,vr4,vr3
beq cr6,*+44
...
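For anyone reading without the AltiVec manuals to hand, the source loop
earlier in this message can be modelled in portable scalar C. This is only a
sketch under stated assumptions, not the real kernel: the names LANES,
cmpgt_u16_record, adds_u16 and scan_bins are hypothetical, the bins are
assumed to already include the vec_splat(vc, k) offset, and "beq cr6" is
modelled as "branch when no lane compared true", which is the CR6 bit that
the record form vcmpgtuh. sets when every lane is false.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* one 128-bit AltiVec register holds eight 16-bit lanes */

/* Scalar model of "vcmpgtuh." (hypothetical helper name).
 * Writes 0xFFFF into mask[i] where a[i] > b[i] (unsigned), else 0x0000.
 * Returns 1 when no lane compared true, the condition "beq cr6" tests. */
static int cmpgt_u16_record(const uint16_t a[LANES], const uint16_t b[LANES],
                            uint16_t mask[LANES])
{
    int none_true = 1;
    for (int i = 0; i < LANES; i++) {
        mask[i] = (a[i] > b[i]) ? 0xFFFFu : 0x0000u;
        if (mask[i])
            none_true = 0;
    }
    return none_true;
}

/* Saturating unsigned 16-bit add, as vec_adds does per lane. */
static uint16_t adds_u16(uint16_t x, uint16_t y)
{
    uint32_t s = (uint32_t)x + (uint32_t)y;
    return (s > 0xFFFFu) ? 0xFFFFu : (uint16_t)s;
}

/* Accumulate bins until no lane's pivot still exceeds its running sum.
 * count[i] gains 1 for every bin where pivot[i] > sum[i], because
 * subtracting the 0xFFFF mask is the same as adding 1 modulo 2^16.
 * Returns the number of bins consumed before the early exit. */
static int scan_bins(uint16_t bins[][LANES], int nbins,
                     const uint16_t pivot[LANES], uint16_t count[LANES])
{
    uint16_t sum[LANES] = {0};
    uint16_t mask[LANES];

    assert(nbins > 0);
    for (int k = 0; k < nbins; k++) {
        for (int i = 0; i < LANES; i++)
            sum[i] = adds_u16(sum[i], bins[k][i]);       /* vsum = vec_adds(vsum, vs) */

        int none_true = cmpgt_u16_record(pivot, sum, mask); /* vcmpgtuh. */

        for (int i = 0; i < LANES; i++)
            count[i] = (uint16_t)(count[i] - mask[i]);   /* ppl = vec_sub(ppl, vmask) */

        if (none_true)                                   /* beq cr6, endlsN */
            return k + 1;
    }
    return nbins;
}
```

The point of the record form is that the compare which produces vmask also
sets CR6, so the early-exit branch costs no second compare. That is what the
CodeWarrior listing above shows: one vcmpgtuh. per step.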
Hope this makes sense. I can live with inline assembly for now, but I
still hope gcc gets around to fixing this at some point.
Ben
PerfOptimization-dev mailing list (email@hidden)