Mailing Lists: Apple Mailing Lists


Re: AltiVec (small example)



Your example isn't entirely fair, since your C code does at least a third more work than it needs to (three vector adds where two would do). That said, to really measure the real-world improvement from AltiVec, you should compare against vadd() and vsmul() from Accelerate.framework. The C code is below:

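#include <Accelerate/Accelerate.h>

// c = 2 * (a + b) using vDSP; *n comes by reference from Fortran.
// Older vecLib headers spelled these calls vadd() and vsmul().
void veclib_add_mult(float* a, float* b, float* c, int* n) {
	float	multiplier = 2.0f;

	vDSP_vadd(a, 1, b, 1, c, 1, *n);          // c = a + b
	vDSP_vsmul(c, 1, &multiplier, c, 1, *n);  // c = c * multiplier
}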

This should run considerably faster. There's also a function called vam() which will do exactly what you want, but it needs a third vector containing the multiplier. If you create a third vector (say dd) and fill it with the value 2.0, you can use vam() and your code should run even faster.
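To see what the two approaches compute on a machine without Accelerate, here is a plain-C sketch. The names `ref_vadd_vsmul` and `ref_vam` are hypothetical stand-ins, assuming unit strides and the vDSP semantics described above (add, then scale by a scalar; versus add, then multiply by a third vector). The two-call version walks over c twice, while the vam-style version makes one pass, which is presumably part of why vam() is quicker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar stand-in for vadd() followed by vsmul(),
   unit strides only: c = a + b, then c = c * k. Two passes over c. */
void ref_vadd_vsmul(const float *a, const float *b, float *c, size_t n, float k)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)   /* pass 1: like vadd(a,1,b,1,c,1,n) */
        c[i] = a[i] + b[i];
    for (size_t i = 0; i < n; i++)   /* pass 2: like vsmul(c,1,&k,c,1,n) */
        c[i] = c[i] * k;
}

/* Hypothetical scalar stand-in for vam(): d = (a + b) * m element-wise,
   where m is a full vector (e.g. dd filled with 2.0f). One pass. */
void ref_vam(const float *a, const float *b, const float *m, float *d, size_t n)
{
    assert(a != NULL && b != NULL && m != NULL && d != NULL);
    for (size_t i = 0; i < n; i++)
        d[i] = (a[i] + b[i]) * m[i];
}
```

With dd filled with 2.0, both functions give the same 2 * (a + b) result; only the memory traffic differs.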

Now, the Accelerate version should give you a good idea of what to aim for. Without going into too much detail, there are quite a few changes you can make that should speed up your own code. Here's a revised version of your C code:

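// NOTE: I typed this directly into my mail client; it should work, but I haven't tested it.
// Assumes a, b and c are 16-byte aligned and *n is a multiple of 4.
// (With FSF GCC, build with -maltivec and #include <altivec.h>; Apple's GCC used -faltivec.)
void my_vec_add(float *a, float *b, float *c, int *n) {
	int i, nbytes = *n * (int)sizeof(float);

	// vec_ld()/vec_st() take a byte offset, so step 16 bytes (4 floats)
	// and stop at the array length in bytes, not in elements.
	for(i=0; i < nbytes; i += 16) {
		vector float	vec_a, vec_b, vec_sum, vec_c;

		vec_a = vec_ld(i, a);
		vec_b = vec_ld(i, b);

		vec_sum = vec_add(vec_a, vec_b);
		vec_c = vec_add(vec_sum, vec_sum);   // 2*(a+b) with a single extra add

		vec_st(vec_c, i, c);
	}
}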

The relevant changes are:
1. Using vec_ld() explicitly rather than dereferencing the pointers. At least with earlier versions of GCC, I've found that this helps the compiler unroll the loop and optimize a bit more.
2. Using only two vec_add() calls. In performance-critical code, you never, ever want to use a suboptimal algorithm.
3. Miscellaneous cleanups, like taking out the "inline" and the #defines, neither of which is necessary or particularly good form.
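The first two changes in the list (explicit loads, two adds instead of three) can be tried on a machine without AltiVec using the GCC/Clang generic vector extension. This is a swapped-in technique, not AltiVec: the `v4f` type and the name `vec_add_twice` are illustrative, and `memcpy` stands in for `vec_ld()`/`vec_st()` so that unaligned arrays are safe. Like the original test, it assumes n is a multiple of 4.

```c
#include <assert.h>
#include <string.h>

/* Four floats in one 16-byte vector, similar in spirit to AltiVec's
   "vector float". GCC/Clang generic vector extension, not AltiVec. */
typedef float v4f __attribute__((vector_size(16)));

/* c[i] = 2 * (a[i] + b[i]) using two vector adds per group of four. */
void vec_add_twice(const float *a, const float *b, float *c, int n)
{
    assert(n % 4 == 0);                   /* same restriction as the test */
    for (int i = 0; i < n; i += 4) {
        v4f va, vb, sum;
        memcpy(&va, a + i, sizeof va);    /* explicit load, like vec_ld() */
        memcpy(&vb, b + i, sizeof vb);
        sum = va + vb;                    /* add #1 */
        sum = sum + sum;                  /* add #2: doubling by addition */
        memcpy(c + i, &sum, sizeof sum);  /* explicit store, like vec_st() */
    }
}
```

In this sketch the loop index counts floats; vec_ld() and vec_st() take a byte offset instead, so an AltiVec loop stepping by 16 has to stop at n * sizeof(float).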


This should run a little faster than what you had before. If you really want this to run as fast as possible, you might have to unroll the loop by hand (depending on whether the compiler does a good enough job of it for you) and perhaps add some cache hints. Honestly, though, cache hints and things like vec_ldl() are a last resort and will give you only a small percentage improvement.
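Hand unrolling might look like the following portable sketch: two vectors (eight floats) per iteration, giving the processor two independent add chains to overlap. It again uses GCC/Clang generic vectors rather than AltiVec, and `__builtin_prefetch` (a GCC/Clang builtin) stands in for AltiVec-style cache hints. The prefetch distance of 64 floats and the name `vec_add_twice_unrolled` are arbitrary assumptions to tune, not recommended values, and this version requires n to be a multiple of 8.

```c
#include <assert.h>
#include <string.h>

typedef float v4f __attribute__((vector_size(16)));

/* Hypothetical prefetch distance, in floats; tune per machine. */
enum { PREFETCH_AHEAD = 64 };

/* c[i] = 2 * (a[i] + b[i]), unrolled to two 4-float vectors per iteration. */
void vec_add_twice_unrolled(const float *a, const float *b, float *c, int n)
{
    assert(n % 8 == 0);   /* sketch assumes whole unrolled groups */
    for (int i = 0; i < n; i += 8) {
        if (i + PREFETCH_AHEAD < n) {
            /* read-only, streaming hints; they never change results */
            __builtin_prefetch(a + i + PREFETCH_AHEAD, 0, 0);
            __builtin_prefetch(b + i + PREFETCH_AHEAD, 0, 0);
        }
        v4f a0, a1, b0, b1;
        memcpy(&a0, a + i, sizeof a0);
        memcpy(&a1, a + i + 4, sizeof a1);
        memcpy(&b0, b + i, sizeof b0);
        memcpy(&b1, b + i + 4, sizeof b1);
        v4f s0 = a0 + b0, s1 = a1 + b1;   /* two independent adds */
        s0 = s0 + s0;                     /* double each by addition */
        s1 = s1 + s1;
        memcpy(c + i, &s0, sizeof s0);
        memcpy(c + i + 4, &s1, sizeof s1);
    }
}
```

Whether this beats the compiler's own unrolling has to be measured; as noted above, the prefetch part is likely to matter least.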

Brendan Younger

On Sep 16, 2004, at 2:35 AM, Kyros Yakinthos wrote:

OK,

After the very nice talk about AltiVec,
I suggest the following:

Let's proceed to a small test.
Here is a small FORTRAN program.
It calculates a "stupid" sum, prints the time needed for the calculation,
and then calculates the same sum again by calling a C subroutine
that uses AltiVec.


----------------------------------------------------------------------------------
      real,dimension(:),allocatable:: aa,bb,cc
c     real(8)::t1,t2,t3,t4
      double precision time, overhead,time1,time2,time3,time4

      ni=6000000

      allocate(aa(ni))
      allocate(bb(ni))
      allocate(cc(ni))

      aa=2.; bb=3.

      call cpu_time(time1)
      cc=bb+aa+bb+aa
      call cpu_time(time2)
      write(*,*)time2-time1

      call cpu_time(time3)
      call my_vec_add( aa, bb, cc, ni)
      call cpu_time(time4)
      write(*,*)time4-time3

      end
----------------------------------------------------------------------------------


ni is the length of the arrays, which should always be a multiple of 4.


And the C subroutine is:


----------------------------------------------------------------------------------
//#include <vecLib/vfp.h>


inline void my_vec_add(float *a, float *b, float *c, int *n)
{
	#define	va ((vector float*) a )
	#define	vb ((vector float*) b )
	#define	vc ((vector float*) c )

	int i;
	int step = vec_step( vector float );
	int loops = *n / step;
	for( i=0 ; i < loops ; i++ ) {
		*vc =vec_add(vec_add(*va, *vb),vec_add(*va, *vb));

		// incrementing a cast relies on GCC's (since removed) cast-as-lvalue extension
		va++; vb++; vc++;
	}
}
----------------------------------------------------------------------------------


Let's compile these sources and measure the times on various machines (G4s, G5s).
A good parameter to vary is ni. By changing this number, we can see what happens
with AltiVec.
(It would be nice to show the speedups with various compilers, too.
I think it is a very nice and simple test.)
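For anyone who wants to repeat the measurement without Fortran, a rough C harness could look like the sketch below. It times a plain scalar version of the same sum with the standard `clock()` function (CPU time, like Fortran's cpu_time); `time_scalar_add`, the repeat count, and the volatile read are assumptions of this sketch. An AltiVec routine would be timed the same way for comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Scalar version of the test sum: c = b + a + b + a, i.e. 2 * (a + b). */
static void scalar_add(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = b[i] + a[i] + b[i] + a[i];
}

/* Hypothetical helper: average CPU seconds per call over `reps` calls on
   arrays of length n filled with 2.0 and 3.0, as in the FORTRAN test.
   Returns a negative value if allocation fails. */
double time_scalar_add(int n, int reps)
{
    assert(n > 0 && reps > 0);
    float *a = malloc((size_t)n * sizeof *a);
    float *b = malloc((size_t)n * sizeof *b);
    float *c = malloc((size_t)n * sizeof *c);
    if (!a || !b || !c) { free(a); free(b); free(c); return -1.0; }
    for (int i = 0; i < n; i++) { a[i] = 2.0f; b[i] = 3.0f; }

    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        scalar_add(a, b, c, n);
    clock_t t1 = clock();

    /* Reading a result through a volatile discourages the compiler
       from dropping the timed loop entirely. */
    volatile float sink = c[n - 1];
    (void)sink;

    free(a); free(b); free(c);
    return (double)(t1 - t0) / CLOCKS_PER_SEC / reps;
}
```

clock() is coarse for short runs, which is why the sketch repeats the call and averages; small ni values in particular need several repetitions before the numbers mean much.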


My experience with xlf (using the -O5 option) is that AltiVec can show a 2X speedup, but this case is rare.
Of course, a 4X speedup is a dream.
OK, I know the problem may be prefetching or something else...


My results (xlf with -O5, gcc with -O3, 867 MHz G4 PowerBook; times in seconds):

ni=6000000
I measure FORTRAN add 0.17999, AltiVec 0.15999 :-((
ni=2000000
FORTRAN add 0.14, AltiVec 0.6999E-01 (2X speedup)
ni=1000000
FORTRAN add 0.30E-01, AltiVec 0.2999E-01 (!!!!!!)
ni=8000000
FORTRAN add 0.2199, AltiVec 0.16 (!!!!!)

In another C subroutine, where I used
an extra if statement for the cases where the number of floats was not a multiple
of 4, AltiVec was slower. This seems to me a logical result.
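One common way to handle lengths that are not a multiple of 4, without testing anything inside the main loop, is to run the vector loop over the largest multiple of 4 and then finish the remaining zero to three elements with a short scalar loop. The sketch below shows that structure in portable C, with a scalar four-at-a-time body standing in for the AltiVec step; `add_twice_any_n` is a hypothetical name.

```c
#include <assert.h>

/* c[i] = 2 * (a[i] + b[i]) for any n >= 0. */
void add_twice_any_n(const float *a, const float *b, float *c, int n)
{
    assert(n >= 0);
    int main_n = n & ~3;   /* largest multiple of 4 not above n */
    int i;

    /* Main loop: whole groups of four, no per-element branch.
       An AltiVec version would do one vec_ld/vec_add/vec_st step here. */
    for (i = 0; i < main_n; i += 4) {
        float s0 = a[i]     + b[i];
        float s1 = a[i + 1] + b[i + 1];
        float s2 = a[i + 2] + b[i + 2];
        float s3 = a[i + 3] + b[i + 3];
        c[i]     = s0 + s0;
        c[i + 1] = s1 + s1;
        c[i + 2] = s2 + s2;
        c[i + 3] = s3 + s3;
    }

    /* Tail: the 0 to 3 leftover elements, handled once after the loop. */
    for (; i < n; i++) {
        float s = a[i] + b[i];
        c[i] = s + s;
    }
}
```

The branch cost is then paid once per call rather than once per group, so for large n the tail should be negligible.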


The results on a G5 are no better: with xlf, the FORTRAN add sometimes gives nearly the same times as AltiVec.
Unfortunately, yesterday I had to set up my very small cluster of Xserves again, and I do not have
fresh numbers to report.


So, could anyone suggest how I should proceed?
By taking the cache directives into account?
If so, does this also apply to the G5?
By looking at Shark's messages, which are very "mystic" for a simple engineer who is
programming in FORTRAN?
Or by looking for a computer engineer to help me?


Thank you again,

Kyros




_______________________________________________ Do not post admin requests to the list. They will be ignored. Scitech mailing list (email@hidden) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/scitech/email@hidden

This email sent to email@hidden


References: 
 >AltiVec (small example) (From: Kyros Yakinthos <email@hidden>)



Copyright © 2007 Apple Inc. All rights reserved.