Re: long FIR filters
- Subject: Re: long FIR filters
- From: Ian Ollmann <email@hidden>
- Date: Fri, 2 Feb 2007 11:02:21 -0800
With a 1 GHz processor I will need about one multiply-add per clock.
I suppose the Intel processor of the Apple computers has some kind
of AltiVec (MMX?) architecture to do several
multiply-adds each clock? Am I correct?
Does anybody know how to work this out? Preferably without writing
assembly instructions.
The Intel architecture has a separate multiplier and adder that can
work concurrently. So even with scalar code you can (in theory!) get
one multiply and one add per clock. Of course the difference between
theory and practice is that in practice theory doesn't always work.
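
For reference, the scalar case discussed above can be sketched in plain C. This is a minimal illustration, not code from this thread: the names fir_sample and fir_scalar are hypothetical, and the coefficients are assumed to be stored already reversed, so the inner loop is a simple dot product. Each tap costs one multiply and one add, which is the pair of operations the separate multiplier and adder could overlap in the best case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: one output sample as a dot product of ntaps
 * input samples with ntaps coefficients (coefficients assumed to be
 * stored in the order that matches the input window). */
float fir_sample(const float *x, const float *h, size_t ntaps)
{
    assert(ntaps > 0);
    float acc = 0.0f;
    for (size_t k = 0; k < ntaps; ++k) {
        acc += h[k] * x[k];   /* one multiply and one add per tap */
    }
    return acc;
}

/* Hypothetical driver: produce nout output samples.
 * The input buffer must hold at least nout + ntaps - 1 samples. */
void fir_scalar(const float *in, const float *h, size_t ntaps,
                float *out, size_t nout)
{
    for (size_t n = 0; n < nout; ++n) {
        out[n] = fir_sample(in + n, h, ntaps);
    }
}
```

Note that every add in the inner loop depends on the previous one, so the add latency, not just the issue rate, can limit how close this gets to one multiply-add per clock.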
The Intel Core 2 vector unit has a theoretical throughput limit of 4
multiplies and 4 adds per cycle and should be able to keep up with
AltiVec (in theory!). The older Intel Core vector unit is much weaker
because it does its 128-bit operations in two chunks, 64 bits at a
time. This means it runs half as fast, in addition to its other
limitations (reduced decode bandwidth, reduced issue bandwidth,
etc.). Thus, the theoretical max for vector code on an Intel Core
vector unit is (2 multiplies + 2 adds) per cycle for single
precision. For an Intel Core 2, it would be (4 multiplies + 4 adds)
per cycle. However, from experience, I will tell you that you are
still going to need to do some serious work to actually realize
anywhere near that sort of throughput because of the cost of
accessing the data, which is non-trivial in this case.
...which is why we recommend calling the library routine.
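
To make the "4 multiplies + 4 adds" figure concrete, here is a rough portable C sketch of the loop shape a 4-wide single-precision vector unit wants. It is an illustration under stated assumptions, not the library's implementation: the name fir_sample_4wide is hypothetical, ntaps is assumed to be a multiple of 4, and whether a compiler actually turns this into vector instructions is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: dot product of ntaps samples and coefficients
 * using four independent accumulators, mirroring the four float lanes
 * of a 128-bit vector register. Assumes ntaps is a multiple of 4. */
float fir_sample_4wide(const float *x, const float *h, size_t ntaps)
{
    assert(ntaps % 4 == 0);
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    for (size_t k = 0; k < ntaps; k += 4) {
        /* Four multiplies and four adds with no dependency between lanes */
        acc0 += h[k + 0] * x[k + 0];
        acc1 += h[k + 1] * x[k + 1];
        acc2 += h[k + 2] * x[k + 2];
        acc3 += h[k + 3] * x[k + 3];
    }
    /* Combine the four partial sums once, at the end */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Because the partial sums are combined in a different order than in a straight scalar loop, the result can differ from the scalar one in the last bits. The sketch also does nothing about alignment or about getting the data into registers fast enough, which is the hard part described above.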
Ian
Coreaudio-api mailing list (email@hidden)