Re: Multi Channel FFT Audio Anaylzer + Question
- Subject: Re: Multi Channel FFT Audio Anaylzer + Question
- From: Ian Ollmann <email@hidden>
- Date: Tue, 26 Jun 2007 14:02:27 -0700
A question to the list about the vDSP routines: How do these perform/
behave with Intel Macs? I learned they don't use the AltiVec
Processor anymore.
Does that result in any disadvantages, I mean in terms of processing
power? Thanks for any clues.
Just to provide a more direct version of the Accelerate.framework story:
On Intel, the functions should be vectorized to use SSE and later
revisions, as available on your hardware. Very little has changed on
the PowerPC side. Accelerate.framework still uses AltiVec on machines
that support AltiVec. We still support AltiVec in
Accelerate.framework, and add AltiVec code as new APIs are added.
We did not add many new APIs for Leopard this time around. The
vast majority of our time for the last 3 years has gone into retuning
for Intel, to make sure that our APIs meet performance expectations. (Most
parts of the OS can get away with making a few fixes and throwing a
compiler switch when transitioning to a new chip. We usually have to
rewrite from scratch, which takes a while with 7000 entrypoints.)
Happily, the PowerPC story hasn't changed much in that time, so there
hasn't been much to retune there. We did add a handful of new APIs to
vImage. These were vectorized for both SSE and AltiVec.
The vast majority of the Leopard work has been along several fronts:
1) Fixing Intel implementations to more closely match what PowerPC did
2) Rewriting Intel code for Intel Core 2. (Many factor-of-2
performance wins here.)
3) Fixing various bugs that affected usability on either platform
4) Some small amount of retuning for G5, in a couple of cases where
our Intel implementation happened to work better on G5 than the old G5
implementation did.
5) 64-bit tidy-up (particularly the vDSP section, which wasn't 64-bit
for Tiger)
The most common reason to fall into scalar code is that your data is
not aligned properly. Check that first. Different parts of
Accelerate.framework require different levels of alignment.
Documentation on the specific part/function that you are working with
should say something about what is required to land in the vector
path. A few functions are vectorized for PowerPC and not Intel.
Usually, this happens because the algorithm requires some fancy
permute work (e.g. three channel RGB buffers) that can't be done
cheaply on Intel at the moment due to inadequate hardware permute
support. Finally, we have a lot of APIs, so there might also be one or
two left over on Intel that we missed. If you find one, file a bug.
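As a rough illustration of the alignment check suggested above, here is a
minimal sketch in portable C. The 16-byte figure is an assumption (it is the
width of one SSE or AltiVec register); the requirement for a given function
is whatever its documentation states. The names VECTOR_ALIGNMENT,
is_vector_aligned and alloc_aligned_floats are hypothetical helpers, not part
of Accelerate.framework.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* exposes posix_memalign on strict compilers */
#endif
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 16 bytes, the width of one SSE or AltiVec register.
   Check the documentation of the function you call for its real requirement. */
#define VECTOR_ALIGNMENT 16

/* Returns 1 if ptr sits on a VECTOR_ALIGNMENT-byte boundary, else 0. */
static int is_vector_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % VECTOR_ALIGNMENT) == 0;
}

/* Allocates room for count floats on a VECTOR_ALIGNMENT-byte boundary.
   Returns NULL on failure; release the buffer with free(). */
static float *alloc_aligned_floats(size_t count)
{
    void *buffer = NULL;
    if (posix_memalign(&buffer, VECTOR_ALIGNMENT, count * sizeof(float)) != 0)
        return NULL;
    return (float *)buffer;
}
```

Note that a pointer into the middle of an aligned buffer (for example, one
float past the start) generally fails this check, so it is worth testing the
exact pointers you pass in, not just the allocation they came from.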
In most cases, Intel Core 2 performance should be in the neighborhood
of G5 AltiVec performance. Intel Core is often half as fast as Intel
Core 2. The story changes depending on how closely the particular
function matches the instructions available in the ISA, where AltiVec
often has an edge, but not always. Some functions benefit on Intel
from larger caches. Either platform can benefit from other
difficult-to-predict factors like how much sleep a particular engineer
got on a particular day.
Ian Ollmann
Vector & Numerics Group
Core OS