With a little work, you might be able to get it to go much faster than
4x scalar speed, especially where you can put the reciprocal square
root estimate to good use. There are some utils in Accelerate/vForce.h
to help you with math library type functions, but largely you're going
to need to write your own vector code in this problem area. There is
also an array-of-vertices (actually PlanarF pixels) matrix multiply
function in vImage that might work well. (Check to see if overhead is
a problem. It does some introspection for matrix zeros, which might
take time. It is really expecting to be handed a million pixels, not
24 vertices.) If not, then writing your own is fairly simple.
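
To give a feel for how simple "writing your own" can be once the data is
planar, here is a minimal sketch in plain C. The PlanarVerts struct, the
transform_planar name, and the row-major 3x4 matrix layout are all
hypothetical choices for illustration, not anything from vImage or
Accelerate. It is scalar code, but because each coordinate lives in its
own contiguous array, the loop maps directly onto 4-wide vector
multiply-adds, whether you write those by hand or let the compiler try.

```c
#include <stddef.h>

/* Planar (structure-of-arrays) vertex storage: every x is contiguous,
   then every y, then every z. Hypothetical layout for illustration. */
typedef struct {
    float *x;
    float *y;
    float *z;
    size_t count;
} PlanarVerts;

/* Apply a row-major 3x4 affine matrix m to each vertex.
   Row r of the matrix is m[4*r] .. m[4*r + 3]; the last column is the
   translation. Each vertex is read fully before it is written, so out
   may point at the same arrays as in. */
static void transform_planar(const float m[12],
                             const PlanarVerts *in,
                             PlanarVerts *out)
{
    for (size_t i = 0; i < in->count; i++) {
        float x = in->x[i];
        float y = in->y[i];
        float z = in->z[i];

        out->x[i] = m[0] * x + m[1] * y + m[2]  * z + m[3];
        out->y[i] = m[4] * x + m[5] * y + m[6]  * z + m[7];
        out->z[i] = m[8] * x + m[9] * y + m[10] * z + m[11];
    }
}
```

With interleaved xyzxyz... data the same arithmetic needs shuffles
before every multiply, which is where most of the hand-written vector
code (and most of the lost speed) tends to go.
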
...and that is the crux of the problem. Once you go through the
monumental headache of rearchitecting your data structures to do it
the "Apple" way (that is, the SIMD-friendly way), the vector code is
really quite trivial to write. You can probably take advantage of the
quirks of your problem space to save a little work that we (Apple)
might otherwise have to do for a "general" library. Do you really need
the full precision reciprocal square root here, or is the estimate
just fine? We'd probably err on the side of correctness, if we didn't
provide two APIs. So, we ask ourselves, "How many people are really
going to use this?" (due to the hefty data reorganization
requirements), and if we provide it, is the solution really much
better than what you can do yourself with minimal additional effort?
I think the answer is generally "No", though there might be some case
to be made for saving some developer frustration on hard problems like
quats or small matrix inversion.
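
On the estimate vs. full precision question, the usual middle ground is
to take the hardware estimate and polish it with a Newton-Raphson step
only when you need the extra bits. The sketch below is plain C and
assumes the estimate comes from somewhere else (on real hardware that
would be something like vec_rsqrte on AltiVec or _mm_rsqrt_ps on SSE);
the rsqrt_refine name is hypothetical. Each step roughly doubles the
number of correct bits, so you can decide per call site whether zero,
one, or two steps is enough.

```c
/* One Newton-Raphson refinement step for y ~= 1/sqrt(x).
   Given an estimate est, the improved value is
       est * (1.5 - 0.5 * x * est * est).
   The caller supplies the estimate; how it is obtained is outside the
   scope of this sketch. Assumes x > 0 and a reasonably close est. */
static float rsqrt_refine(float x, float est)
{
    return est * (1.5f - 0.5f * x * est * est);
}
```

If the estimate alone is good enough for your problem (normalizing
vectors for lighting, say), you skip the refinement entirely, which is
exactly the kind of shortcut a general-purpose library can't take for
you.
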
That is the story for AltiVec/SSE/Accelerate.framework, anyway...
A better choice in this problem space is often OpenGL, for which 3D
geometry is a core competency. Video cards are generally much better
at dealing with interleaved data formats, and rolling your own vertex
program is a lot less time consuming than writing vector code,
something like 10x. (This is just based on my observations about
productivity of engineers writing CoreImage filters vs. vImage
filters. I don't have any first hand experience with writing vertex
programs myself.) Certainly, having seen both, while the hand tuned
vector function might end up being 200-2000 lines long, the vertex
function for the same thing might be a dozen lines.
The risks are: You might not be able to tolerate non-standard (non-
IEEE-754) floating point arithmetic commonly found on GPUs. You might
not be able to tolerate different GPUs returning different results.
You may need a fully accurate vector math library and you don't want
to write one yourself. (OpenGL ARB, section 2.14.5 says that some
vertex program math instructions are only approximations, and need
only be accurate to 10 bits.) You might need more than single
precision. A potentially significant problem is transfer time: it
could take so long to move the data to the video card (and maybe back)
that even slow vector code or scalar code beats waiting for the data
to arrive on the card. RAM can also be a bit limited on the card,
so there may be cases where large problem sizes are not well suited to
video cards.
In any case, there are no doubt a number of people here familiar with
the GPGPU / streams-computing scene who can better explain this avenue
than I can.
Ian
*though, there are exceptions. I'd like to say we inherited all of
them from externally defined libraries, but sadly sometimes the pit
traps of your own making are the hardest to see. Unfortunately, once
they are in, we can't remove them due to binary compatibility. We even
get to port them forward to new architectures and tune them as best we
can in the name of cross architecture uniformity, so that developers
can continue to fall into the same trap just like they did on PowerPC.