What's worked fairly well for me is defining two data structures along
these lines:
struct Cart3 { float x, y, z; };
struct VCart3 { vFloat x, y, z; };
Then I have a packVCart3() routine which takes an array of Cart3 and
repackages it as an array of VCart3. In other words, it takes each
group of 4 Cart3 elements and shuffles them around into one VCart3,
i.e. 3 vFloat elements holding the four x's, four y's and four z's.
(It also pads out the last group with zeros if necessary to form a
multiple of four.)
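
To make the layout concrete, here is a minimal scalar sketch of what such a pack routine might look like. It is not the routine described above, only an illustration under assumptions: vFloat is stood in for by a GCC/Clang vector-extension typedef so the code builds without Accelerate, the signature and the vcart3Count() helper are hypothetical, and lanes are copied one at a time instead of using shufps/vperm.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for Accelerate's vFloat, using the GCC/Clang
   vector extension so this sketch compiles on any platform. */
typedef float vFloat __attribute__((vector_size(16)));

typedef struct { float x, y, z; } Cart3;
typedef struct { vFloat x, y, z; } VCart3;

/* Hypothetical helper: number of VCart3 groups needed for n points. */
size_t vcart3Count(size_t n)
{
    return (n + 3) / 4;
}

/* Hypothetical signature: the caller supplies room for vcart3Count(n)
   output elements. Each group of 4 Cart3 becomes one VCart3; unused
   lanes in the last group are left at zero. */
void packVCart3(const Cart3 *in, size_t n, VCart3 *out)
{
    size_t groups = vcart3Count(n);
    assert(groups == 0 || (in != NULL && out != NULL));

    for (size_t g = 0; g < groups; ++g) {
        vFloat x = {0.0f, 0.0f, 0.0f, 0.0f};
        vFloat y = {0.0f, 0.0f, 0.0f, 0.0f};
        vFloat z = {0.0f, 0.0f, 0.0f, 0.0f};

        for (size_t lane = 0; lane < 4; ++lane) {
            size_t i = 4 * g + lane;
            if (i < n) {            /* lanes past the end stay zero */
                x[lane] = in[i].x;
                y[lane] = in[i].y;
                z[lane] = in[i].z;
            }
        }
        out[g].x = x;
        out[g].y = y;
        out[g].z = z;
    }
}
```

Packing 5 points this way yields 2 VCart3 elements, with the last three lanes of the second one zeroed.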
The nice thing about this arrangement is that you don't have to change
your scalar code all that much to get it vectorized. You're mostly
just substituting VCart3 for Cart3, using the vector library function
counterparts from Accelerate, and reducing your loop iterations by a
factor of four. There is, of course, a cost in shuffling your arrays
around like this (each grouping can be rearranged by 6 shufps/vperm
ops, btw), but in cases I have seen, the pre-shuffling loop is
typically O(N) while the calculation loop that follows is O(N^2) or
higher, so it doesn't really matter.
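
As a rough illustration of how little the loop changes, here is a sketch that sums the squared lengths of a set of points, first in scalar form and then over the packed layout. The function names are made up for the example, and vFloat is again a GCC/Clang vector-extension stand-in rather than the Accelerate type. Zero-padded lanes contribute nothing to this particular sum, which is why the padding is harmless here; other calculations may need to mask them out.

```c
#include <stddef.h>

/* Assumption: stand-in for Accelerate's vFloat (GCC/Clang extension). */
typedef float vFloat __attribute__((vector_size(16)));

typedef struct { float x, y, z; } Cart3;
typedef struct { vFloat x, y, z; } VCart3;

/* Scalar version: one point per iteration. */
float sumSquaredLengthsScalar(const Cart3 *p, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += p[i].x * p[i].x + p[i].y * p[i].y + p[i].z * p[i].z;
    return sum;
}

/* Vector version: same arithmetic, four points per iteration, so the
   loop runs over groups = (n + 3) / 4 packed elements. */
float sumSquaredLengthsVector(const VCart3 *p, size_t groups)
{
    vFloat acc = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t g = 0; g < groups; ++g)
        acc += p[g].x * p[g].x + p[g].y * p[g].y + p[g].z * p[g].z;

    /* Horizontal add of the four partial sums at the very end. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```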
Be sure to run Shark and figure out where you need to vectorize before
you dive into this. Last week, I boosted the speed of a modelling
program by a factor of three after touching only two functions, so it
can really save you a lot of effort.
-Ted
On 25-Nov-07, at 4:33 AM, Ian Ollmann wrote:
< part 2 of 3 >
This doesn't mean that 3D geometry can't be accelerated efficiently
in the vector unit -- even small dot products! Naturally scaling by
4x (or better!) is easily possible. You "just" need to organize your
code to take advantage of economies of scale. That is, process a
bunch of vertices at once. Unfortunately, that usually means taking
a giant wrecking ball to your application core data structures in
order to solve the structural problems that are holding the vector
unit back.
Namely, replace packed data structures such as this:
/*
NON-ACCELERATED EXAMPLE
A Cartesian vector structure and convenience make function.
Let's also add a simple function to calculate the length
of the vector.
*/
typedef struct _Vector
{
    float i;
    float j;
    float k;
} Vector;
...with planar array representations something like this:
#define kMyVectorSize 16 /* should be a multiple of 4 */
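
The structure that went with this define is not reproduced in the quoted text. Purely as an illustrative guess at the general shape, and not the original code, a planar layout might look like the sketch below. The VectorBlock name, its fields and the helper function are all hypothetical, and vFloat is a GCC/Clang vector-extension stand-in for the Accelerate type.

```c
#include <stddef.h>

#define kMyVectorSize 16 /* should be a multiple of 4 */

/* Assumption: stand-in for Accelerate's vFloat (GCC/Clang extension). */
typedef float vFloat __attribute__((vector_size(16)));

/* Hypothetical planar layout: one array per component, so each vFloat
   holds the same component of four different vertices. */
typedef struct {
    vFloat i[kMyVectorSize / 4];
    vFloat j[kMyVectorSize / 4];
    vFloat k[kMyVectorSize / 4];
} VectorBlock;

/* Hypothetical helper: squared length of every vertex in the block,
   four vertices per loop iteration. */
void VectorBlockLengthSquared(const VectorBlock *b,
                              vFloat out[kMyVectorSize / 4])
{
    for (size_t n = 0; n < kMyVectorSize / 4; ++n)
        out[n] = b->i[n] * b->i[n]
               + b->j[n] * b->j[n]
               + b->k[n] * b->k[n];
}
```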
This means grouping many vertices together in the same structure.
This can cause its own problems in some cases. For example, common
optimizations like calculating only the points that fall in the view
frustum might have to be thrown out. You'll need to proceed
judiciously here. On the other hand, it can often do wonderful
things for your cache organization (cf. Judy trees), if you can
identify sets of vertices that "go together", for example, the set
of vertices in an avatar's leg, or 5 consecutive amino acids in a
protein. These are likely to be found near each other, and are
therefore likely subject to similar sets of operations, so can
usually be treated as a single unit.
In any case, once you have your {i,j,k} or {x,y,z} or what-have-yous
in separate arrays, the vector arithmetic starts to look a lot like
the scalar arithmetic done wider, and should speed up by
approximately a factor of 4 on G5/Core 2.
#include <Accelerate/Accelerate.h>
// Calculate the distance of 4 vertices from the origin
vFloat VectorLength( vFloat vi, vFloat vj, vFloat vk )
{
    return vsqrtf( vi * vi + vj * vj + vk * vk );
}
or maybe like this for more than four vertices at a time (usually
somewhat more efficient):
< to be continued >
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Scitech mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/scitech/email@hidden
This email sent to email@hidden
________________________________________________________________
//////////////////
// LAMONTAGNE // GEOPHYSICS LTD
////////////////// GEOPHYSIQUE LTEE
115 Grant Timmins Dr.
Kingston ON Canada K7M 8N3