It's element-by-element multiplication, and from the website your
library looks extremely elegant. I'm not very familiar with C++, but
it looks pretty simple just to get it doing the underlying
calculations. I'll give it a go.
My own solution to it in the end was (I guess this is pretty obvious,
but I'm new to these families of functions):
float *X, *Xdash, *Y, *b; float a; int n;
// X' = X/b (vvdivf takes the element count by pointer)
vvdivf(Xdash, X, b, &n);
// Y = aX' + Y
cblas_saxpy(n, a, Xdash, 1, Y, 1);
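For checking the vForce/BLAS results, a plain scalar loop doing the same arithmetic can serve as a reference. This is only a sketch: the function name scaled_divide_accumulate is made up for illustration, and it assumes b has already been expanded to one divisor per float (so X, b and Y all have the same length). vForce may round slightly differently, so compare with a tolerance rather than exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version of Y = a * (X / b) + Y, element by element.
// Assumes x, b and y all have the same length, i.e. b holds one divisor
// per float rather than one per 3D vector (an assumption, not from the thread).
void scaled_divide_accumulate(float a,
                              const std::vector<float>& x,
                              const std::vector<float>& b,
                              std::vector<float>& y)
{
    assert(x.size() == b.size() && x.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        // Divide first, then scale and accumulate: the same order of
        // operations as the vvdivf call followed by cblas_saxpy.
        y[i] += a * (x[i] / b[i]);
    }
}
```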
It will be interesting to see what's quicker!
Leo
On 3 Sep 2006, at 05:17, Glen Low wrote:
Leo
On 31/08/2006, at 5:42 AM, Leo Critchley wrote:
Hello,
I am optimising a physics engine using vForce and BLAS, with good
results all-round (about a 2x speedup so far), but I've got an
algorithm to deal with that has at least a couple of approaches,
and I wondered if anyone had any advice.
Here's what I'm trying to do (3D vectors in capitals):
Y = (a/b)X + Y, where a is constant and X, Y and b each have n sets of values.
First thing I did is use cblas_saxpy to do this on each 3D vec,
calculating a/b each time (in a for loop). This gives some
speedup, but still way below the improvement when I optimised:
Y = aX + Y
for which I put all the 3D vectors into one huge array and got
BLAS to do them with just one call to cblas_saxpy.
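If b holds one value per 3D vector, the same single-call trick needs b repeated for each component so that it lines up with the packed array. A minimal sketch of that expansion follows; the function name expand_per_vector and the x0, y0, z0, x1, y1, z1, ... layout are assumptions for illustration, not something stated in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Repeat each per-vector scalar three times so that it lines up with a
// packed array laid out as x0, y0, z0, x1, y1, z1, ... (assumed layout).
std::vector<float> expand_per_vector(const std::vector<float>& per_vector)
{
    std::vector<float> per_component;
    per_component.reserve(per_vector.size() * 3);
    for (float value : per_vector) {
        for (int component = 0; component < 3; ++component) {
            per_component.push_back(value);
        }
    }
    assert(per_component.size() == per_vector.size() * 3);
    return per_component;
}
```

The expanded array can then be used as the divisor for the whole packed array in one call, at the cost of three times the memory for b.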
Are you using element-by-element multiply or matrix multiplication?
If the former, you can use my library macstl, which is based on the
C++ valarray classes:
valarray <float> b, X, Y;
float a;
Y = a / (b * X) + Y;
That last expression is evaluated fully with AltiVec on PowerPC or
SSE on Intel.
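As a rough illustration using only the standard library's std::valarray (not macstl itself, whose vectorised evaluation is not shown here), the update could be written as below. It takes the reading Y = a(X/b) + Y, which divides X by b rather than dividing a by the product b * X. The function name valarray_update is hypothetical, and X, b and Y are assumed to have equal length.

```cpp
#include <cassert>
#include <valarray>

// Y = a * (X / b) + Y using std::valarray's element-wise operators.
// Assumes X, b and Y all have the same length.
void valarray_update(float a,
                     const std::valarray<float>& X,
                     const std::valarray<float>& b,
                     std::valarray<float>& Y)
{
    assert(X.size() == b.size() && X.size() == Y.size());
    Y += a * (X / b);  // element-by-element divide, scale, accumulate
}
```

Updating Y in place with += avoids declaring a second Y and reading it in its own initialiser.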
PerfOptimization-dev mailing list (email@hidden)