Mailing Lists: Apple Mailing Lists


RE: Altivec: Extracting Floats From Vector Float



> Does anyone know approximately how many clock cycles it takes to copy
> the memory from the vector register to the scalar register?... At least
> compared to something like vec_madd()

It depends on the processor.  On a recent G4, the store instruction 
takes 3 cycles to complete, another 6 cycles pass before the data is 
ready for forwarding, and then the load takes at least 3 cycles to 
complete. On a G5, there is likely also a load/store alias reject going 
on, which can cost you ~50 cycles. Just say no.

By comparison, vec_madd can execute with a throughput of 1 vec_madd per 
cycle.
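
To make the round trip concrete, here is a minimal sketch of what 
extracting one float usually amounts to: the vector is stored to memory 
and the scalar is loaded back from the same bytes. The names vfloat, 
vfloat_mem, make_vfloat and extract_lane are made up for illustration, 
and the non-AltiVec branch is only a stand-in so the sketch builds on 
other machines. What the compiler actually emits may differ.

```c
#include <assert.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
typedef vector float vfloat;                 /* real AltiVec register type */
#else
typedef struct { float lane[4]; } vfloat;    /* stand-in so the sketch builds elsewhere */
#endif

/* Overlay a vector and four floats on the same memory. */
typedef union { vfloat v; float f[4]; } vfloat_mem;

/* Hypothetical helper: build a vector from four floats (via memory). */
static vfloat make_vfloat(float a, float b, float c, float d)
{
    vfloat_mem m;
    m.f[0] = a; m.f[1] = b; m.f[2] = c; m.f[3] = d;
    return m.v;
}

/* Hypothetical helper: return lane i (0..3) of v. */
static float extract_lane(vfloat v, int i)
{
    vfloat_mem m;
    assert(i >= 0 && i < 4);
    m.v = v;        /* vector store to the stack */
    return m.f[i];  /* scalar load from the same bytes */
}
```

The store followed immediately by a load from the same address is the 
pattern whose cost is described above, so this is something to keep out 
of inner loops rather than a recommended technique.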

It is likely you can use the vector unit to do the thing that you think 
you need the scalar units for. For example, to increment one float:

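	vector float f = vec_lde( 0, &the_float );  /* load into the lane that matches the address */
	f = vec_add( f, (vector float) (1.0f) );    /* add 1.0f in every lane */
	vec_ste( f, 0, &the_float );                /* store that same lane back */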

Unless you pull up a denormal with vec_lde on G5 (and VSCR[NJ] = 0), 
this should be roughly as fast as the same code for the scalar unit:

		float 	f;

		lfs		f, 0, &the_float
		fadds	f, f, 1.0f
		stfs	f, 0, &the_float

You can do almost anything with the vector unit that you can do with 
the scalar units. Doing scalar operations one at a time in the vector 
unit doesn't add any parallelism, and you have to deal with alignment in 
software, but at least you don't pay the cost of moving data over to the 
scalar units.
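
As one illustration of staying in the vector unit and handling alignment 
in software, here is a rough sketch that sums the four floats of a 
vector and writes the single result to memory. The function name sum4 is 
made up, the source pointer is assumed to be 16-byte aligned, and the 
scalar branch exists only so the sketch builds without AltiVec. The two 
rotate-and-add steps leave the total in every lane, so vec_ste stores 
the right value whichever lane the destination address selects.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* Hypothetical helper: write src[0]+src[1]+src[2]+src[3] to *dst.
   Assumes src is 16-byte aligned; dst only needs float alignment. */
static void sum4(const float *src, float *dst)
{
#if defined(__ALTIVEC__)
    vector float v;
    assert(((uintptr_t)src & 15) == 0);  /* vec_ld ignores the low 4 address bits */
    v = vec_ld(0, src);                  /* {a, b, c, d} */
    v = vec_add(v, vec_sld(v, v, 8));    /* rotate by two lanes and add */
    v = vec_add(v, vec_sld(v, v, 4));    /* rotate by one lane and add: total in all lanes */
    vec_ste(v, 0, dst);                  /* store the lane selected by dst's address */
#else
    /* Scalar stand-in with the same grouping of additions. */
    *dst = (src[0] + src[2]) + (src[1] + src[3]);
#endif
}
```

The result never passes through a scalar register on the AltiVec path; 
the caller reads it from memory later, when the store has long since 
completed.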

Ian
_______________________________________________
scitech mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/scitech
Do not post admin requests to the list. They will be ignored.





Copyright © 2007 Apple Inc. All rights reserved.