Re: AltiVec optimization in Cocoa?
- Subject: Re: AltiVec optimization in Cocoa?
- From: Mike Vannorsdel <email@hidden>
- Date: Sun, 19 May 2002 20:53:25 -0600
On 5/19/02 8:12 PM, "Timothy Larkin" <email@hidden> wrote:
>> To prevent this, beware of code like this:
>>
>> void computeData(void * someData)
>> {
>>     if (hasAltiVec)
>>         doAltiVecCode(someData);
>>     else
>>         doScalarCode(someData);
>> }
>
> Actually, this is the standard method of handling the situation where code
> could be run on either the G3 or the G4, assuming that no inlining is being
> done. As you say, what absolutely must be avoided is the situation where any
> function that will execute on the G3 has any AltiVec instructions, since the
> compiler usually emits VRSAVE instructions at the start of that function,
> which will cause the G3 to crash.
Right, assuming no inlining. If the compiler does automatic inlining, it's
best to call the two code paths through function pointers.
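
A minimal sketch of the function-pointer approach, in case it helps. The
names selectComputeImpl, computeImpl and ComputeFn are made up for this
example, and the two do...Code bodies are stand-ins that only tag the buffer.
In a real project doAltiVecCode would live in its own source file, the only
one compiled with AltiVec enabled, and the hasAltiVec flag would come from
whatever runtime check you already use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature shared by both implementations. */
typedef void (*ComputeFn)(void *someData);

/* Stand-in for the scalar path: tags the buffer with 1. */
void doScalarCode(void *someData) { *(int *)someData = 1; }

/* Stand-in for the AltiVec path: tags the buffer with 2.
   The real version belongs in a separate AltiVec-enabled file. */
void doAltiVecCode(void *someData) { *(int *)someData = 2; }

/* Chosen once at startup; every later call goes through it. */
static ComputeFn computeImpl = NULL;

/* hasAltiVec is assumed to come from the caller's own CPU check. */
void selectComputeImpl(int hasAltiVec)
{
    computeImpl = hasAltiVec ? doAltiVecCode : doScalarCode;
}

void computeData(void *someData)
{
    /* Catch a missing selectComputeImpl() call early. */
    assert(computeImpl != NULL);
    computeImpl(someData);
}
```

Call selectComputeImpl once at launch and then use computeData as before. An
indirect call is much harder for the compiler to inline, so the AltiVec
prologue should stay out of any function the G3 executes.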
> While optimization at the cache level can be done in AltiVec, it is very
> tricky because the details depend on the version of the G4 executing the
> code, as well as other factors that are difficult to predict. Most
> cache-line optimizations are done by trial and error.
Right, some G4s have different cache behavior. I believe the 7440/7450 use
the L2 cache for fetches and not as a victim cache. This is different from
earlier revisions.
> However, my experience has been that code can be made to execute 2 or 3
> times faster even without worrying about cache lines, or any of the other
> nasty hardware details. It is well worth exploring the AltiVec instruction
> set if your application has compute-bound bottlenecks that are suitable for
> parallel processing.
You can even use the vector prefetch streams to feed scalar code. There's
a lot there that can be used. However, you must always watch out for
alignment problems. For instance, in this trivial piece of code:
    float vals[4] = {1, 2, 3, 4};

    // some code

    vector float doWork(void)
    {
        vector float vf1 = vec_ld(0, &vals[0]);
        // work with vf1
        return vf1;
    }
vf1 may actually contain (vals[-2], vals[-1], vals[0], vals[1]) because vals
is not guaranteed to be aligned on a 16-byte boundary, and vec_ld ignores the
low four bits of the address it is given.
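
Here is a rough, portable sketch of how one might check for this. It does not
use AltiVec at all; it only does the same address arithmetic that vec_ld
performs. The helper names (isVectorAligned, vecLoadAddress, leadingFloats)
are made up for the example, and the aligned(16) attribute assumes a
GCC-compatible compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC-style attribute to force 16-byte alignment. */
float alignedVals[4] __attribute__((aligned(16))) = {1, 2, 3, 4};

/* Nonzero if p sits on a 16-byte boundary. */
int isVectorAligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* The address a vec_ld(0, p) would really load from:
   p with its low four bits cleared. */
const void *vecLoadAddress(const void *p)
{
    return (const void *)((uintptr_t)p & ~(uintptr_t)15);
}

/* How many floats in front of p would end up in the loaded vector. */
size_t leadingFloats(const float *p)
{
    return ((uintptr_t)p & 15u) / sizeof(float);
}
```

With the array declared like alignedVals, vec_ld(0, &alignedVals[0]) gets the
four values you expect. For a pointer you do not control, leadingFloats tells
you how far off it is, which is the case where you need an unaligned-load
sequence instead of a single vec_ld.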