Mailing Lists: Apple Mailing Lists


[apple scitech] Re: Accelerated Cartesian Vector Struct (2)




< part 2 of 3 >

This doesn't mean that 3D geometry can't be accelerated efficiently in the vector unit -- even small dot products! A speedup of 4x (or better!) is readily achievable. You "just" need to organize your code to take advantage of economies of scale: that is, process a bunch of vertices at once. Unfortunately, that usually means taking a giant wrecking ball to your application's core data structures in order to fix the structural problems that are holding the vector unit back.

Namely, replace packed data structures such as this:

/*
	NON-ACCELERATED EXAMPLE

	A packed Cartesian vector structure: the three components
	of each vertex are stored next to each other.
*/
typedef struct _Vector
{
	float i;
	float j;
	float k;
} Vector;

...with planar array representations something like this:

#define kMyVectorSize	16	/* should be a multiple of 4 */

/* vFloat is declared by <Accelerate/Accelerate.h> */
typedef union VectorOfVertices
{
	/* scalar view: one array per component */
	struct
	{
		float	i[ kMyVectorSize ]	__attribute__ ((__aligned__ (16)));
		float	j[ kMyVectorSize ]	__attribute__ ((__aligned__ (16)));
		float	k[ kMyVectorSize ]	__attribute__ ((__aligned__ (16)));
	};
	/* vector view of the same storage: 4 floats per vFloat */
	struct
	{
		vFloat	vi[ kMyVectorSize/4 ];
		vFloat	vj[ kMyVectorSize/4 ];
		vFloat	vk[ kMyVectorSize/4 ];
	};
} VectorOfVertices;

This means grouping many vertices together in the same structure. This can cause its own problems in some cases. For example, common optimizations like calculating only the points that fall in the view frustum might have to be thrown out. You'll need to proceed judiciously here. On the other hand, it can often do wonderful things for your cache organization (cf. Judy trees), if you can identify sets of vertices that "go together", for example, the set of vertices in an avatar's leg, or five consecutive amino acids in a protein. These are likely to be found near each other, and are therefore likely subject to similar sets of operations, so they can usually be treated as a single unit.
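
Getting from the packed layout to the planar one is mostly a matter of transposing the data. Here is a minimal sketch of that repacking in plain C, using only the scalar view so it builds without Accelerate. PlanarVertices and PackedToPlanar are names made up for illustration, not anything from the original code, and zero-filling the unused slots is just one assumption about how to pad a partial group.

```c
#include <assert.h>
#include <stddef.h>

#define kMyVectorSize	16	/* should be a multiple of 4 */

/* Packed layout, as in the non-accelerated example. */
typedef struct _Vector
{
	float i;
	float j;
	float k;
} Vector;

/* Planar layout, scalar view only (hypothetical name). */
typedef struct PlanarVertices
{
	float i[ kMyVectorSize ];
	float j[ kMyVectorSize ];
	float k[ kMyVectorSize ];
} PlanarVertices;

/*
	Copy up to kMyVectorSize packed vertices into planar form.
	Slots past count are zero-filled (an assumption, see above).
*/
void PackedToPlanar( PlanarVertices *dst, const Vector *src, size_t count )
{
	size_t n;

	assert( count <= kMyVectorSize );

	for( n = 0; n < kMyVectorSize; n++ )
	{
		if( n < count )
		{
			dst->i[n] = src[n].i;
			dst->j[n] = src[n].j;
			dst->k[n] = src[n].k;
		}
		else
		{
			dst->i[n] = 0.0f;
			dst->j[n] = 0.0f;
			dst->k[n] = 0.0f;
		}
	}
}
```

Zero padding keeps the unused slots harmless for a length calculation (they come out as 0), but other operations may want a different filler.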

In any case, once you have your {i,j,k} or {x,y,z} or what-have-yous in separate arrays, the vector arithmetic starts to look a lot like the scalar arithmetic done wider, and should speed up by approximately a factor of 4 on G5/Core 2.

#include <Accelerate/Accelerate.h>

// Calculate the distance of 4 vertices from the origin
vFloat VectorLength( vFloat vi, vFloat vj, vFloat vk )
{
	return vsqrtf( vi * vi + vj * vj + vk * vk );
}

or maybe like this for more than four vertices at a time (usually somewhat more efficient):


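// Calculate the distance from the origin of 4 * vec128Count vertices
void VectorLength( vFloat * restrict results, const vFloat * restrict vi, const vFloat * restrict vj, const vFloat * restrict vk, int vec128Count )
{
	int i;

	// Sum of squares, four vertices per iteration
	for( i = 0; i < vec128Count; i++ )
		results[i] = vi[i] * vi[i] + vj[i] * vj[i] + vk[i] * vk[i];

	// Square roots of all the sums in one call; vvsqrtf takes
	// float pointers and a count of floats, not of vFloats
	i = vec128Count * 4;
	vvsqrtf( (float*) results, (const float*) results, &i );
}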
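
When restructuring like this, it helps to have a scalar check to run the vector results against. A small sketch follows, assuming the results have been viewed as plain floats. LengthsLookRight and its relTol parameter are hypothetical, not part of Accelerate. It compares squared values so it needs no square root of its own, and it uses a tolerance on the assumption that the vector and scalar paths may not round identically.

```c
#include <assert.h>
#include <stddef.h>

/*
	Hypothetical helper (not part of Accelerate).
	Returns 1 if every lengths[n] is non-negative and its square is
	within a relative tolerance of i[n]^2 + j[n]^2 + k[n]^2, else 0.
*/
int LengthsLookRight( const float *lengths, const float *i, const float *j, const float *k, size_t count, float relTol )
{
	size_t n;

	assert( relTol >= 0.0f );

	for( n = 0; n < count; n++ )
	{
		float expected = i[n] * i[n] + j[n] * j[n] + k[n] * k[n];
		float actual = lengths[n] * lengths[n];
		float diff = actual > expected ? actual - expected : expected - actual;

		/* Written as !(a <= b) so that a NaN result also fails */
		if( lengths[n] < 0.0f || !( diff <= relTol * expected ) )
			return 0;
	}
	return 1;
}
```

Running something like this over a few thousand vertices after each change to the data layout is a cheap way to catch a transposed index or a miscounted array length.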

< to be continued >

Copyright © 2007 Apple Inc. All rights reserved.