On Mon, 7 Nov 2005, Rustam Muginov wrote:
[...]
> > vSum = vec_madd( vSrc1_0, vSrc2_0, vSum );
> > vSum = vec_madd( vSrc1_1, vSrc2_1, vSum );
> > vSum = vec_madd( vSrc1_2, vSrc2_2, vSum );
> > vSum = vec_madd( vSrc1_3, vSrc2_3, vSum );
[...]
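There is an unnecessary serial dependency here on the variable vSum: each
vec_madd has to wait until the previous one has written vSum, so the four
operations cannot overlap in the pipeline. Consider introducing more
partial sum variables, like this: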
vSum0 = vec_madd( vSrc1_0, vSrc2_0, vSum0 );
vSum1 = vec_madd( vSrc1_1, vSrc2_1, vSum1 );
vSum2 = vec_madd( vSrc1_2, vSrc2_2, vSum2 );
vSum3 = vec_madd( vSrc1_3, vSrc2_3, vSum3 );
This way, the four operations are independent of one another. Initialize
vSum0 to vSum3 to zero before the loop and add the partial sums together
only after it, so you suffer the stalls once rather than on every
iteration.
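A minimal sketch of the same idea in plain scalar C (not AltiVec, so it builds on any compiler), assuming a simple dot product. The function names dot_serial and dot_partial are hypothetical and only illustrate the technique: the first version chains every multiply-add through one accumulator, and the second splits the work across four independent partial sums that are combined once after the loop.

```c
#include <stddef.h>

/* Hypothetical example: one accumulator, so every multiply-add depends
   on the result of the previous one (serial chain through sum). */
float dot_serial(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical example: four independent partial sums, unrolled by 4.
   The four updates in the loop body do not depend on each other, so a
   pipelined FPU can overlap them. */
float dot_partial(const float *a, const float *b, size_t n)
{
    /* Partial sums start at zero before the loop. */
    float sum0 = 0.0f, sum1 = 0.0f, sum2 = 0.0f, sum3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        sum0 += a[i]     * b[i];
        sum1 += a[i + 1] * b[i + 1];
        sum2 += a[i + 2] * b[i + 2];
        sum3 += a[i + 3] * b[i + 3];
    }

    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        sum0 += a[i] * b[i];

    /* Combine the partial sums once, outside the loop. */
    return (sum0 + sum1) + (sum2 + sum3);
}
```

Because the partial-sum version adds the terms in a different order, its result can differ from the single-accumulator version in the last bits for general floating-point data. This is also why compilers such as GCC generally will not make this transformation on their own unless reassociation is explicitly permitted (for example with -ffast-math).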
Holger
_______________________________________________
PerfOptimization-dev mailing list (email@hidden)