Mailing Lists: Apple Mailing Lists


Re: Developing most optimal DotProduct function



On Mon, 7 Nov 2005, Rustam Muginov wrote:

[...]
> >     vSum = vec_madd( vSrc1_0, vSrc2_0, vSum );
> >     vSum = vec_madd( vSrc1_1, vSrc2_1, vSum );
> >     vSum = vec_madd( vSrc1_2, vSrc2_2, vSum );
> >     vSum = vec_madd( vSrc1_3, vSrc2_3, vSum );
[...]

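There is an unnecessary serial dependency here on the variable vSum: each
vec_madd uses the result of the previous one as its accumulator, so it cannot
start until that one has finished. Consider introducing more partial sum
variables, like this: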

     vSum0 = vec_madd( vSrc1_0, vSrc2_0, vSum0 );
     vSum1 = vec_madd( vSrc1_1, vSrc2_1, vSum1 );
     vSum2 = vec_madd( vSrc1_2, vSrc2_2, vSum2 );
     vSum3 = vec_madd( vSrc1_3, vSrc2_3, vSum3 );

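This way, the four multiply-adds within an iteration are independent of each
other and can overlap in the pipeline. Initialize each partial sum to zero
before the loop and add the partial sums together only after the loop, so you
suffer from the stalls only once, rather than every iteration. Note that this
changes the order of the floating-point additions, so the result may differ
slightly in the last bits.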
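
For anyone who wants to try the idea without AltiVec at hand, here is a
minimal scalar sketch of the same multiple-accumulator pattern in plain C. It
is not the code from the original post: the function name dot_product_4acc and
the assumption that n is a multiple of 4 are mine, chosen only to keep the
example short.

```c
#include <stddef.h>

/* Hypothetical scalar illustration of the partial-sum technique.
   Assumption: n is a multiple of 4 (no tail handling). */
float dot_product_4acc(const float *a, const float *b, size_t n)
{
    /* Four independent accumulators, each starting at zero. */
    float sum0 = 0.0f, sum1 = 0.0f, sum2 = 0.0f, sum3 = 0.0f;

    for (size_t i = 0; i < n; i += 4) {
        /* None of these four updates depends on another one in the
           same iteration, so they can overlap in the pipeline. */
        sum0 += a[i + 0] * b[i + 0];
        sum1 += a[i + 1] * b[i + 1];
        sum2 += a[i + 2] * b[i + 2];
        sum3 += a[i + 3] * b[i + 3];
    }

    /* Combine the partial sums once, after the loop. */
    return (sum0 + sum1) + (sum2 + sum3);
}
```

In the AltiVec version the same structure applies, with vSum0..vSum3 taking
the place of sum0..sum3 and the final combination done with vector adds
followed by a horizontal sum across the vector elements.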

  Holger
PerfOptimization-dev mailing list
