Re: vDSP_conv very slow?
- Subject: Re: vDSP_conv very slow?
- From: Chris Johnson <email@hidden>
- Date: Thu, 8 Feb 2007 15:47:13 -0500
On Feb 8, 2007, at 3:15 AM, Cor Jansen wrote:
// add input samples to end of bufSignal
// first move bufIn contents
for (i=0 ; i<ROOMCORRECTION_FILTSIZE-1 ; i++) {
    bufSignal[i] = bufSignal[nSampleFrames+i];
}
This is very bad: it copies ROOMCORRECTION_FILTSIZE-1 samples around
on every single call. Ian Kemmish suggests:
3) Instead of making bufSignal[] just large enough, and copying
samples around on every call, make it a ring buffer twice (or more)
as big as necessary, and gradually march through it. Then, you'll
only need to copy samples from the end to the beginning when it wraps
around. You may need to experiment to find the best size here - the
bigger the buffer is, the less copying you do, but the greater the
chance of taking a performance hit from cache misses.
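A minimal sketch of the scheme Ian describes, in C. The names are assumptions, not anything from the thread: FILT_SIZE stands in for ROOMCORRECTION_FILTSIZE, and HIST_CAPACITY, History, history_init and history_append are hypothetical. New samples are appended to an oversized buffer, and the last FILT_SIZE-1 samples are copied back to the front only when the next block would run off the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILT_SIZE     4    /* hypothetical kernel length */
#define HIST_CAPACITY 16   /* hypothetical; bigger = fewer copies, more cache pressure */

typedef struct {
    float  data[HIST_CAPACITY];
    size_t end;            /* one past the newest sample; always >= FILT_SIZE-1 */
} History;

void history_init(History *h)
{
    memset(h->data, 0, sizeof h->data);
    h->end = FILT_SIZE - 1;            /* start with FILT_SIZE-1 zeros of history */
}

/* Append n new samples and return a pointer to a contiguous window of
 * (FILT_SIZE-1) old samples followed by the n new ones. */
const float *history_append(History *h, const float *in, size_t n)
{
    assert(n <= HIST_CAPACITY - (FILT_SIZE - 1));

    if (h->end + n > HIST_CAPACITY) {
        /* About to run off the end: carry the last FILT_SIZE-1 samples
         * to the front. This is the only copy of old samples, and it
         * happens once per wrap instead of once per call. */
        memmove(h->data, h->data + h->end - (FILT_SIZE - 1),
                (FILT_SIZE - 1) * sizeof(float));
        h->end = FILT_SIZE - 1;
    }

    memcpy(h->data + h->end, in, n * sizeof(float));
    h->end += n;
    return h->data + h->end - n - (FILT_SIZE - 1);
}
```

The returned window holds FILT_SIZE-1+n contiguous samples, which is the shape of input a block convolution producing n outputs needs. HIST_CAPACITY is the knob to experiment with.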
I find you can either do a single looping buffer, the same size you'd
expect it to be:
count = your offset within the buffer;
if (count < 0 || count >= ROOMCORRECTION_FILTSIZE) {count = ROOMCORRECTION_FILTSIZE-1;}
//stay within the buffer size- sanity checks BEFORE accessing any array value
buf[count] = newsample;
do stuff with buf[(count+whatever) % ROOMCORRECTION_FILTSIZE];
//stops things from running off the end of the buffer
count--;
but then you still have to check every operation in the loop for
overflow, which is where 'twice as big as necessary' comes in.
count = your offset within the buffer;
if (count < 0 || count >= ROOMCORRECTION_FILTSIZE) {count = ROOMCORRECTION_FILTSIZE-1;}
//stay within single buffer size- sanity checks BEFORE accessing any array value
buf[count+ROOMCORRECTION_FILTSIZE] = buf[count] = newsample;
do stuff with buf[count+whatever];
//now if you overflow it's the same as if it had wrapped- notice that
//instead of accessing an array value of count+whatever and then having
//to mod it to stay in the array, you just add count+whatever, knowing
//that the overflow area's going to be correct data. One extra data
//assign, twice the buffer, but one less operation for every single
//sample in the kernel.
//Actually, I do a lot of stuff just straight-up hardcoded, not even
//using loops for my kernels
count--;
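To make the difference concrete, here is a rough C sketch of both variants side by side. It is an illustration under assumptions, not the code I actually ship: FILT_SIZE stands in for ROOMCORRECTION_FILTSIZE, and SingleLine, MirroredLine, single_process and mirrored_process are hypothetical names. The sketch clamps count to FILT_SIZE-1 so the write index always stays inside the array.

```c
#include <assert.h>
#include <stddef.h>

#define FILT_SIZE 4   /* hypothetical kernel length */

/* Variant 1: one buffer of FILT_SIZE samples; every tap needs a modulo. */
typedef struct {
    float buf[FILT_SIZE];
    int   count;                      /* where the next sample is written */
} SingleLine;

float single_process(SingleLine *d, const float *kernel, float in)
{
    assert(d != NULL && kernel != NULL);
    /* sanity check BEFORE touching the array */
    if (d->count < 0 || d->count >= FILT_SIZE) d->count = FILT_SIZE - 1;
    d->buf[d->count] = in;

    float acc = 0.0f;
    for (int k = 0; k < FILT_SIZE; k++)   /* tap k is the sample k steps old */
        acc += kernel[k] * d->buf[(d->count + k) % FILT_SIZE];

    d->count--;
    return acc;
}

/* Variant 2: buffer twice as big; each sample is stored twice, FILT_SIZE
 * apart, so reading past the first half lands on valid mirrored data. */
typedef struct {
    float buf[2 * FILT_SIZE];
    int   count;
} MirroredLine;

float mirrored_process(MirroredLine *d, const float *kernel, float in)
{
    assert(d != NULL && kernel != NULL);
    if (d->count < 0 || d->count >= FILT_SIZE) d->count = FILT_SIZE - 1;
    d->buf[d->count + FILT_SIZE] = d->buf[d->count] = in;

    float acc = 0.0f;
    for (int k = 0; k < FILT_SIZE; k++)   /* no modulo in the inner loop */
        acc += kernel[k] * d->buf[d->count + k];

    d->count--;
    return acc;
}
```

Both structs should start zeroed (for example `MirroredLine m = {0};`). Fed the same input and kernel, the two functions should produce the same output; the mirrored one trades one extra store per sample for dropping the modulo from every tap.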
Does that help? I do time-domain convolution kernels this way, and
it's hard to get more CPU-intensive than that. I'm going to have to
learn FFT code to do larger kernels, but these small kernels sound
extra nice with convolution-the-hard-way. You do have to think like a
game programmer to get the stuff to run efficiently, though: I've seen
side-scroller games use similar tricks for background images.
Chris Johnson
airwindows