Re: audio computation and feeder threads


  • Subject: Re: audio computation and feeder threads
  • From: Bill Stewart <email@hidden>
  • Date: Sat, 1 Mar 2003 19:29:23 -0800

Kurt,

On Friday, February 28, 2003, at 08:10 PM, Kurt Bigler wrote:
I am trying to glean from all the information which has gone by on this list
what I need for a general plan for thread design in my app. I would like to
focus for the moment on the case of audio output only, when the audio is
being synthesised algorithmically. I would specifically like to include the
possibility of multiple IOProcs being active to support multiple output
devices running at once.

The topic of feeder threads has come up a lot, although I believe usually
this has been in connection with playing audio files.

I am trying to decide when it is a worthy thing to use a feeder thread in
connection with an IOProc thread. The following thoughts come to mind as I
try to put together my ideas on this, and I would appreciate feedback.

First of all, the mere fact of outputting synthesized audio does not in
itself appear to constitute a reason for having a feeder thread. I am
assuming (though maybe I am wrong) that Apple's audio units do not have any
multi-thread decoupling/buffering going on in them - audio units that do
synthesis from MIDI would particularly be the issue here. Can I assume that
the DLS Synth (which I know _absolutely_ nothing about, yet need to use here
as an example) does its synthesis right in the IOProc thread? If yes, can I
then assume that this is therefore an "ok thing"?

Just to be clear on usage here (and to give some more details about the synth itself):

The DLS Synth (like, in fact, all of our audio units) does all of its work when AudioUnitRender is called, and on the thread that AudioUnitRender is called on.

This typically is an I/O proc, but can be any thread - for instance when you want to write the data out to a file.

One of the features of the DLS Synth is that it gives you the ability either to run the reverb internally within the synth, or to generate two busses of output, where one bus is the mix of audio that is "dry" and the other is the mix that is "wet" and should be taken through an external reverb. This is actually how we support multiple SoundFont or DLS files in a QTMusic type of playback - one reverb with a mixed input, regardless of how many different instances of a synth we have.

As the DLS Synth can output on two busses, it does all of the work the first time one of these busses is called (using the sample count and numFrames in the AudioUnitRender call to figure this out) - then when the second bus is called (say, for the "wet" mix), it has already done that work and just passes the results out.
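
To make that concrete, here is a minimal sketch of a host pulling both busses for the same time slice. (The function and buffer names are made up for illustration, the dry/wet bus numbering is an assumption, and error handling is trimmed.)

#include <AudioUnit/AudioUnit.h>

// Pull both output busses of a dual-bus synth for one time slice.
// The synth does the actual synthesis on the first call; the second call
// (same timestamp and frame count) just hands back the other mix.
static OSStatus RenderDryAndWet(AudioUnit synth,
                                const AudioTimeStamp *timeStamp,
                                UInt32 numFrames,
                                AudioBufferList *dryBuffers,   // bus 0: "dry" mix (assumed)
                                AudioBufferList *wetBuffers)   // bus 1: "wet" mix (assumed)
{
    AudioUnitRenderActionFlags flags = 0;
    OSStatus err = AudioUnitRender(synth, &flags, timeStamp, 0, numFrames, dryBuffers);
    if (err) return err;

    flags = 0;
    return AudioUnitRender(synth, &flags, timeStamp, 1, numFrames, wetBuffers);
}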

The DLS Synth also uses 2 properties (which we now support in the Generic UI unit as of 10.2.3) to allow the user to manage the CPU/Quality tradeoff that he/she wants to make.

CPULoad
- This is a value from 0 to 1, where 0 means no limitation - typically used when rendering to a file...
The synth calculates this usage by noting the time at which it is called and calculating the time that the numFrames and sample rate imply; it is then free to do work within that time. As it gets near the end of that time period it will decide to drop any remaining notes in its queue. (Those notes have previously been ordered by "newness" and loudness criteria.)

It's not a perfect system of course, because it doesn't take account of other rendering activity that might be going on, but the user can use this number to adjust for the synth's position in the rendering process and for how much should be left for other activities, so I think it's sufficient.
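
The arithmetic behind that budget is roughly the following (a sketch with made-up names, using the HostTime utilities; the DLS Synth's actual implementation may differ):

#include <CoreAudio/HostTime.h>

// Compute the host-time deadline implied by this render slice and the
// CPULoad setting: (numFrames / sampleRate) seconds, scaled by cpuLoad.
static UInt64 RenderDeadline(UInt32 numFrames, Float64 sampleRate, Float32 cpuLoad)
{
    if (cpuLoad <= 0.0f)             // 0 means "no limitation" (e.g. rendering to a file)
        return ~(UInt64)0;

    Float64 sliceSeconds  = (Float64)numFrames / sampleRate;
    Float64 budgetSeconds = sliceSeconds * cpuLoad;
    UInt64  budgetNanos   = (UInt64)(budgetSeconds * 1.0e9);
    return AudioGetCurrentHostTime() + AudioConvertNanosToHostTime(budgetNanos);
}

// Inside the voice loop (voices already ordered by newness/loudness):
//   if (AudioGetCurrentHostTime() >= deadline) { /* drop the remaining notes */ }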

RenderQuality
- Chris described this previously - it potentially does two things: it changes the quality of an internal reverb (if present), and it decides that a note should end at different dB levels. (This property is also published and supported by both the AUMatrixReverb and the AU3DMixer.)

I think you should consider supporting both of those properties - this then allows the user to make decisions about your synth's usage based on a particular situation.
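
From the host side, setting the two properties looks something like this (a sketch; the chosen values are arbitrary and error handling is minimal):

#include <AudioUnit/AudioUnit.h>

static OSStatus ConfigureSynthLoad(AudioUnit synth)
{
    // Leave roughly a quarter of each render cycle for other processing.
    Float32 cpuLoad = 0.75f;
    OSStatus err = AudioUnitSetProperty(synth, kAudioUnitProperty_CPULoad,
                                        kAudioUnitScope_Global, 0,
                                        &cpuLoad, sizeof(cpuLoad));
    if (err) return err;

    // RenderQuality is a UInt32 in the 0..127 range; 127 asks for maximum quality.
    UInt32 quality = 127;
    return AudioUnitSetProperty(synth, kAudioUnitProperty_RenderQuality,
                                kAudioUnitScope_Global, 0,
                                &quality, sizeof(quality));
}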


So, I can think of several reasons to use a feeder thread (together with
appropriate buffering and consequent additional latency) to feed synthesized
audio to an IOProc thread:

Most, if not all, major hosting apps distinguish between those tasks that have a reason to respond in real time and those that don't (for instance, the difference between responding to real-time events versus playing back events that are already recorded in a sequence).

A good example of this is Logic 6's new "freeze" function - when using it you care most about quality and very little about how much time you take to generate the audio.

A good (perhaps the only) reason an AU should consider using its own threads is if there are processes within the AU that can be parallelised AND you have more than one CPU to execute on. Some also like to use threads as a way to maintain complex stack states, etc., so that's another good reason.


(1) to keep blocking calls out of the IOProc

You cannot assume that you are being called on the I/O proc - in many real situations you may not be. That said, I do not think it is a good idea to be doing allocations in your AudioUnitRender (because they can block), file I/O (because it can block), or any other blocking type of activity where you don't have a complete understanding of the potential duration of your waiting. Sometimes these are unavoidable, but they should be totally understood and very fine-grained. I know many people like to use the atomic calls to avoid overly fine-grained blocking. But yes, I agree: nothing in your AudioUnitRender call should have an unbounded locking potential.
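
One common pattern for keeping the waiting bounded is to try the lock and fall back if you can't get it - a sketch (hypothetical names; pthreads is used purely as an illustration):

#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;   // taken briefly by the control/UI thread when editing
    float           gain;   // some shared parameter
} SynthState;

// Called from the render path. Either we get the lock immediately or we
// reuse the last known value - the render thread never waits an unbounded time.
static float RenderSafeGain(SynthState *state, float lastKnownGain)
{
    float gain = lastKnownGain;
    if (pthread_mutex_trylock(&state->lock) == 0) {
        gain = state->gain;
        pthread_mutex_unlock(&state->lock);
    }
    return gain;
}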


(2) to buffer against irregularities in the synthesis process, possibly
allowing early detection of a processing lag, allowing corrective responses
(e.g. reduced precision) to be applied gracefully rather than having to be
applied instantly (e.g. note dropping)

Provided that you can actually be running in real time (or are expected to be usable within that context), of course. I think that's an admirable and probably an additional reason to use a second thread (so I'll add that to my list!)... (Nobody expects the Spanish Inquisition, we have...:)
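
A sketch of that kind of decoupling, in modern C for brevity (made-up names; a single feeder thread produces into a lock-free FIFO, and a low-water mark gives early warning so quality can be reduced before an underrun forces note dropping):

#include <stdatomic.h>
#include <string.h>

#define RING_FRAMES 8192        // power of two
#define LOW_WATER   2048        // "falling behind" threshold, in frames

typedef struct {
    float        samples[RING_FRAMES];   // mono, for simplicity
    atomic_uint  writePos;               // advanced only by the feeder thread
    atomic_uint  readPos;                // advanced only by the render callback
    atomic_int   reducePrecision;        // feeder polls this and lowers quality
} FeederRing;

static unsigned RingFill(FeederRing *r)
{
    return atomic_load(&r->writePos) - atomic_load(&r->readPos);
}

// Called from the IOProc / render callback.
static void RingRead(FeederRing *r, float *dst, unsigned frames)
{
    unsigned avail = RingFill(r);
    unsigned n = (frames < avail) ? frames : avail;
    unsigned rp = atomic_load(&r->readPos);

    for (unsigned i = 0; i < n; i++)
        dst[i] = r->samples[(rp + i) & (RING_FRAMES - 1)];
    memset(dst + n, 0, (frames - n) * sizeof(float));   // underrun: pad with silence
    atomic_store(&r->readPos, rp + n);

    // Early warning: ask the feeder to back off on quality before we underrun.
    atomic_store(&r->reducePrecision, RingFill(r) < LOW_WATER ? 1 : 0);
}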

(3) to buffer against irregularities in system performance, such as some
"randomness" in the scheduler together with the unpredictable nature of the
demands put on the scheduler

Hmmm... I think that's a bad reason for the AU at least. This really is a responsibility of the host and ultimately of the OS. It also implies an arbitrary introduction of latency that is generally undesirable.

(4) to buffer against much nastier things like thread-performance
inter-dependencies caused by caching issues. For example, suppose two
memory-hungry threads (possibly even two different IOProcs) happen to get
scheduled in parallel on 2 cpus, with the result that performance drops
sharply in both because of collisions in use of memory. It might have been
better if they had been scheduled serially on the same cpu. But I assume
there is nothing in the scheduler that could recognize and avoid these kinds
of situations, even heuristically.

Hmmm... I think this is a misunderstanding of how your AU *would* or *should* be used in that context.

We do not allow fan-out connections in the AUGraph (i.e. the situation where the same output bus of an AU is going to 2 or more different destinations).

We also do not allow more than one output unit in an AUGraph - so by definition an AUGraph will only be processing audio for one device (and, further, we don't have an output unit that will talk to more than one device).

So, we've made our life easier!

That said - we have considered introducing a buffering type of AU that could potentially deal with this type of situation - which of course is essentially what you are describing.

That kind of AU (which would introduce at least a buffer's worth of latency) would get its data from its sources through a single bus, in a single thread. It could have multiple output busses (say, in the case you describe above) that would be providing data on those threads. The whole raison d'être for this unit would be to deal with these kinds of threaded and buffering situations, with minimal need to use locking semantics.

Though we've considered this, we haven't completed and shipped an implementation of it, because it's never quite been finished and the usage scenario is pretty arcane and specialised. If we get a lot of requests for this functionality we'd consider it.

(5) perhaps - to make good use of multiple processors on the system when
they are available. In this case I am not so much thinking of things like
cache problems, but rather how to balance the synthesis computations across
all available processors, while not interfering with IOProc performance.
For example I could spawn a bunch of synthesis feeder threads, possibly more
threads than there are processors, so that the scheduler is left with lots
of flexibility - in case other system loads can not be distributed evenly
between the processors, my own threads can take up the slack wherever it
exists.

Sure - this certainly holds out some good benefits - though here, too, the host apps can to some extent take care of this kind of thing for you... I guess my main concern with doing this would be that in many common usage scenarios you won't be the only processing that is going on, so the host may be in a better position to make this decision than a particular AU... I could imagine a situation where a user might be able to split their usage of your synth into different sources of control events, and thus instantiate multiple instances of your synth, which the host could then target to different CPUs...
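
If you do go down that road, the mechanics are simple enough - a sketch (hypothetical structure, one worker per CPU and one pre-built context per worker; sysctl is one way to count CPUs on Mac OS X):

#include <pthread.h>
#include <sys/types.h>
#include <sys/sysctl.h>

static int NumberOfCPUs(void)
{
    int ncpu = 1;
    int mib[2] = { CTL_HW, HW_NCPU };
    size_t len = sizeof(ncpu);
    sysctl(mib, 2, &ncpu, &len, NULL, 0);
    return ncpu;
}

// Each worker synthesizes its own group of voices into its own feeder buffer;
// the IOProc(s) then only mix buffers that have already been filled.
static void StartSynthesisWorkers(void *(*workerMain)(void *), void *contexts[])
{
    int ncpu = NumberOfCPUs();
    for (int i = 0; i < ncpu; i++) {
        pthread_t tid;
        pthread_create(&tid, NULL, workerMain, contexts[i]);
        pthread_detach(tid);
    }
}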

(6) to get back that extra 10% of the 90% that the HAL will allow, as per
Jeff Moore's message of 2/26/03 6:02 PM:

The HAL has a hard limit of roughly 90% CPU usage as measured against
the deadlines it calculates internally before it will put you in the
penalty box and issue kAudioDevicePropertyOverload notifications (this

Yes - you have to give time for the drivers, etc. to do their work. You can't really avoid leaving some time for this work to happen somewhere on some CPU.
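
If you want to know when you've blown that budget, you can listen for the HAL's overload notification - a sketch using the device property listener API (the constant is declared as kAudioDeviceProcessorOverload in the headers):

#include <CoreAudio/AudioHardware.h>

static volatile int gOverloaded = 0;

static OSStatus OverloadListener(AudioDeviceID device, UInt32 channel,
                                 Boolean isInput, AudioDevicePropertyID property,
                                 void *clientData)
{
    // The HAL decided we used too much of the cycle; react on the next
    // render cycle (reduce quality, drop voices, etc.) rather than here.
    gOverloaded = 1;
    return noErr;
}

static OSStatus WatchForOverloads(AudioDeviceID outputDevice)
{
    return AudioDeviceAddPropertyListener(outputDevice,
                                          0,        // master channel
                                          false,    // output side
                                          kAudioDeviceProcessorOverload,
                                          OverloadListener, NULL);
}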

Am I basically on the right track in my thinking here? Is that just about
it? Are there any other compelling reasons for using a feeder thread?

Item (5) is particularly of interest to me right now. I first tried
improving performance on a 2-processor machine with 2 output devices active
by doing synthesis independently in each IOProc thread. I found that in
some cases I get a 50% increase in performance, and in other cases no
reliable improvement at all. In particular my altivec-optimized synthesis
gets no reliable increase, and in fact sometimes a sharp drop. This is in
spite of attempts to keep memory use to a minimum, although I can't prove
that there weren't many scattered memory accesses that happened to collide
badly in their timing on the 2 processors. Anyway, not to dwell on these
details here.


So, if I am basically on the right track with my thinking, is it fair to say
that optimized audio synthesis directed towards multiple IOProcs should
probably always use feeder threads, if the goal is to be able to get as
close as possible to saturating the CPU?

And if so... would it be useful - and possible - to create one or more audio
units whose sole purpose is to decouple their pulling callback from their
rendering callback? Would such an audio unit make it possible for audio
developers to deal with a much broader set of requirements without having to
develop so much in-house expertise on thread management? I envision an
audio unit that permitted sufficiently flexible control over buffering and
latency (and thread priority?) that almost all audio application thread
management could be vastly simplified.

I'll take that as at least one vote for the "AUBufferUnit" - you certainly raise some interesting questions that are worth more thought.
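
In the meantime, a host can get much of the same decoupling itself by registering an input render callback on its output unit that only reads from a feeder-filled FIFO. A sketch (made-up names; RingRead stands in for a reader like the one sketched earlier, and the FIFO is mono for simplicity):

#include <AudioUnit/AudioUnit.h>
#include <string.h>

// Reader for the feeder-filled FIFO (see the earlier ring-buffer sketch).
void RingRead(void *ring, float *dst, unsigned frames);

static OSStatus FifoRenderCallback(void *inRefCon,
                                   AudioUnitRenderActionFlags *ioActionFlags,
                                   const AudioTimeStamp *inTimeStamp,
                                   UInt32 inBusNumber,
                                   UInt32 inNumberFrames,
                                   AudioBufferList *ioData)
{
    // All synthesis happens on the feeder thread(s); here we only copy.
    float *first = (float *)ioData->mBuffers[0].mData;
    RingRead(inRefCon, first, inNumberFrames);

    // Duplicate the mono signal into any additional (deinterleaved) channels.
    for (UInt32 i = 1; i < ioData->mNumberBuffers; i++)
        memcpy(ioData->mBuffers[i].mData, first, inNumberFrames * sizeof(float));
    return noErr;
}

static OSStatus AttachFifoToOutputUnit(AudioUnit outputUnit, void *ring)
{
    AURenderCallbackStruct cb = { FifoRenderCallback, ring };
    return AudioUnitSetProperty(outputUnit, kAudioUnitProperty_SetRenderCallback,
                                kAudioUnitScope_Input, 0, &cb, sizeof(cb));
}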

Bill

Note that this technology would also make it possible to build an output
audio unit that could span any number of output devices, possibly running at
different sample rates. Obviously I'm glossing over a lot of detail here -
but I'm actually hoping that someone at Apple will do this, in which case
all the details will be taken care of!


Thanks,
Kurt Bigler


-- mailto:email@hidden
tel: +1 408 974 4056

__________________________________________________________________________
"Much human ingenuity has gone into finding the ultimate Before.
The current state of knowledge can be summarized thus:
In the beginning, there was nothing, which exploded" - Terry Pratchett
__________________________________________________________________________
_______________________________________________
coreaudio-api mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/coreaudio-api
Do not post admin requests to the list. They will be ignored.