Re: Any hints on optimizing my IO Proc for 128 channels ?
- Subject: Re: Any hints on optimizing my IO Proc for 128 channels ?
- From: Michael Thornburgh <email@hidden>
- Date: Thu, 13 Oct 2005 10:03:44 -0700
hi Mark.
these things are obvious and you've probably done them already, but
Just In Case... :)
1) have you used "Shark" to find, on a per-line/per-instruction basis,
where you're *really* spending the most time? Shark is an awesome
tool, and the best place to start optimizing is where the profiler
tells you you're spending too much time.
2) do you have the compiler's optimizer enabled for "really i mean
totally the fastest" (-O3)?
-mike
On Oct 13, 2005, at 9:55 AM, Mark Gilbert wrote:
Folks.
I have a Core Audio app with which we are experimenting with
128-channel input (2x MADI cards under device aggregation).
Whilst the system is working fine with 64 channels, the extra load
of 128 channels is pushing our IO proc to its breaking point, and we
are sometimes taking too long, resulting in a lost input IOProc call.
Before you ask, I have tested the 128-channel aggregate with
another piece of software, and it's working fine. They are
obviously doing things more efficiently than I am.
My Core Audio IO Proc is not especially complex. Here is what it does:
1) It pages through the various buffers and flattens them into a
single 'composite' buffer (for this IO call, so 512 samples or
whatever) which is interleaved 128-channel Float32s. This involves
a line of array-mapping assignment which executes for every sample,
assigning a sample from each buffer in turn to a place in our
composite buffer. We do this so we can have a uniform interleaved
approach for all audio devices, even ones with unusual buffer sets
with different numbers of channels.
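(Not Mark's actual code, but step 1 amounts to something like the sketch below, assuming each device buffer is itself interleaved and its channel count is known; all names here are made up for illustration.)

```c
typedef float Float32;        /* as in CoreAudio */
typedef unsigned int UInt32;

/* Sketch of step 1: flatten several interleaved device buffers into one
   interleaved composite buffer.  bufs[b] holds numFrames frames of
   bufChannels[b] channels; totalChannels is the sum of bufChannels[]. */
static void interleave_buffers(const Float32 *const *bufs,
                               const UInt32 *bufChannels,
                               UInt32 numBufs, UInt32 numFrames,
                               Float32 *composite, UInt32 totalChannels)
{
    UInt32 chOffset = 0;
    for (UInt32 b = 0; b < numBufs; b++) {
        const Float32 *src = bufs[b];
        UInt32 nch = bufChannels[b];
        for (UInt32 f = 0; f < numFrames; f++)
            for (UInt32 c = 0; c < nch; c++)
                /* one assignment per sample, exactly the hot line
                   the post describes */
                composite[f * totalChannels + chOffset + c] =
                    src[f * nch + c];
        chOffset += nch;
    }
}
```

The index arithmetic in the innermost assignment is the per-sample cost mentioned above; it is the kind of line Shark would flag.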
2) It then copies from the 512 (typical) sample/ch composite buffer
into our main (large) circular buffer, which may be as big as 30
seconds/ch. This copy is also done sample by sample, since we are
cross-mapping (routing) channels from one input to a different
channel slot in the circular buffer. During this process we also
compare each level with a floating maximum to do our peak level
metering.
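(Again only a sketch, not the original code: step 2 as described, assuming a routing table mapping each source channel to a destination slot and a per-destination running peak; names are illustrative.)

```c
typedef float Float32;
typedef unsigned int UInt32;

/* Sketch of step 2: copy one interleaved block into a channel-routed
   circular buffer while tracking a running peak per destination channel.
   routing[src] is the destination channel slot for source channel src;
   writePos is the circular buffer's current write frame. */
static void route_and_meter(const Float32 *composite,
                            UInt32 numFrames, UInt32 numChannels,
                            const UInt32 *routing,
                            Float32 *circ, UInt32 circFrames,
                            UInt32 writePos, Float32 *peaks)
{
    for (UInt32 f = 0; f < numFrames; f++) {
        UInt32 destFrame = (writePos + f) % circFrames;
        for (UInt32 ch = 0; ch < numChannels; ch++) {
            Float32 s = composite[f * numChannels + ch];
            UInt32 dest = routing[ch];
            circ[destFrame * numChannels + dest] = s;
            Float32 a = (s < 0.0f) ? -s : s;   /* |s| for metering */
            if (a > peaks[dest])
                peaks[dest] = a;
        }
    }
}
```

Laid out this way, the per-sample work (modulo, routing-table lookup, compare-and-store for the meter) all sits in the innermost loop, which is consistent with part 2 dominating the profile.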
Part 2 is taking two-thirds of the total time. If I bypass part 2,
we stop missing buffers, since we go back into the comfort zone on
IOProc execution time.
That is basically all we are doing. As far as C code goes, it's
probably pretty clean. There is nothing obvious I can see to
optimize, so I was looking for pointers from those who know the
secrets of Float32 efficiency (all my buffers are Float32 *).
I have experimented a little with setting the device buffer size to
a larger value, but this doesn't seem to have helped much, and it
seems fairly hit and miss on reliability. Although the device
claims it will support a buffer of size x, I sometimes see the
device die when I set it, requiring a power-down to fix, or the
system gets unstable and I might see a kernel panic. Is a larger
buffer the answer, or is this a waste of time?
Does anyone have any general efficiency hints on IO Proc handling
in Core audio with large track counts?
Cheers
Mark Gilbert
_______________________________________________
Coreaudio-api mailing list (email@hidden)