Re: audio computation and feeder threads
- Subject: Re: audio computation and feeder threads
- From: Bill Stewart <email@hidden>
- Date: Sat, 1 Mar 2003 19:29:23 -0800
Kurt,
On Friday, February 28, 2003, at 08:10 PM, Kurt Bigler wrote:
I am trying to glean from all the information which has gone by on this list what I need for a general plan for thread design in my app. I would like to focus for the moment on the case of audio output only, when the audio is being synthesised algorithmically. I would specifically like to include the possibility of multiple IOProcs being active to support multiple output devices running at once.
The topic of feeder threads has come up a lot, although I believe usually this has been in connection with playing audio files.
I am trying to decide when it is a worthy thing to use a feeder thread in connection with an IOProc thread. The following thoughts come to mind as I try to put together my ideas on this, and I would appreciate feedback.
First of all, the mere fact of outputting synthesized audio does not in itself appear to constitute a reason for having a feeder thread. I am assuming (though maybe I am wrong) that Apple's audio units do not have any multi-thread decoupling/buffering going on in them - audio units that do synthesis from MIDI would particularly be the issue here. Can I assume that the DLS Synth (which I know _absolutely_ nothing about, yet need to use here as an example) does its synthesis right in the IOProc thread? If yes, then can I assume that this is therefore an "ok thing"?
Just to be clear on usage here (and to add some more details about the synth itself):
The DLS Synth (like, in fact, all of our audio units) does all of its work when AudioUnitRender is called, and on the thread that AudioUnitRender is called from. This is typically an I/O proc, but it can be any thread - for instance, when you want to write the data out to a file.
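For instance, an offline render loop on an ordinary thread looks roughly like this. (This is only a sketch, not code from the synth or any sample: error handling is left out, and the unit, buffer list and frame counts are assumed to have been set up by the caller.)

    #include <string.h>
    #include <AudioUnit/AudioUnit.h>

    // Sketch: pulling a synth AudioUnit from an ordinary (non-I/O) thread,
    // e.g. to write its output to a file. The synthesis happens right here,
    // on whatever thread calls AudioUnitRender.
    static void RenderOffline(AudioUnit synthUnit, UInt32 framesPerSlice,
                              AudioBufferList *bufferList, UInt32 totalFrames)
    {
        AudioTimeStamp ts;
        memset(&ts, 0, sizeof(ts));
        ts.mFlags = kAudioTimeStampSampleTimeValid;
        ts.mSampleTime = 0;

        for (UInt32 done = 0; done < totalFrames; done += framesPerSlice) {
            UInt32 frames = totalFrames - done;
            if (frames > framesPerSlice) frames = framesPerSlice;

            AudioUnitRenderActionFlags flags = 0;
            // All of the synth's work is done inside this call.
            AudioUnitRender(synthUnit, &flags, &ts, 0 /* output bus */,
                            frames, bufferList);

            // ... hand bufferList off to the file writer here ...
            ts.mSampleTime += frames;
        }
    }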
One of the features of the DLS Synth is that it gives you the ability either to run the reverb internally within the synth, or to generate two busses of output, where one of the busses is the mix of audio that is "dry" and the other is the mix of audio that is "wet" and should be taken through an external reverb. This is actually how we support multiple SoundFont or DLS files in a QTMusic type of playback - one reverb with a mixed input, regardless of how many different instances of a synth we have.
As the DLS Synth can output on two busses, it does all of the work the first time one of these busses is called (using the sample count and numFrames in the AudioUnitRender call to figure this out) - then when the second bus is called (say, for the "wet" mix), it has already done that work and just passes the results out.
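In host terms that just means pulling both busses with the same time stamp and frame count. A rough sketch (the buffer lists are assumed to be owned by the caller, and which bus number carries the wet mix is an assumption here, not something to rely on):

    #include <AudioUnit/AudioUnit.h>

    // Sketch: one render cycle pulling both output busses of a two-bus synth.
    // The synth does the real work on whichever bus is pulled first for a
    // given (sampleTime, numFrames) pair; the second pull just copies out.
    static void RenderDryAndWet(AudioUnit synthUnit, const AudioTimeStamp *ts,
                                UInt32 numFrames,
                                AudioBufferList *dryBuffers,   // assumed bus 0
                                AudioBufferList *wetBuffers)   // assumed bus 1
    {
        AudioUnitRenderActionFlags flags = 0;
        AudioUnitRender(synthUnit, &flags, ts, 0, numFrames, dryBuffers);

        flags = 0;
        AudioUnitRender(synthUnit, &flags, ts, 1, numFrames, wetBuffers);
        // wetBuffers then goes through the shared external reverb.
    }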
The DLS Synth also uses two properties (which we now support in the Generic UI unit as of 10.2.3) to allow the user to manage the CPU/quality tradeoff that he/she wants to make.
CPULoad
- this is a value from 0 to 1, where 0 means no limitation - typically
used when rendering to a file...
The synth calculates this usage by noting the time at which it is called and calculating the duration that the numFrames and sample rate imply; it is then free to do work within that time. As it gets near the end of that time period it will decide to drop any remaining notes in its queue. (Those notes have previously been ordered by "newness" and loudness criteria.)
It's not a perfect system of course, because it doesn't take account of other rendering activity that might be going on, but the user can use this number to adjust for the synth's position in the rendering process and how much should be left for other activities, so I think it's sufficient.
RenderQuality
- Chris described this previously. It potentially does two things: it changes the quality of an internal reverb (if present), and it decides that a note should end at different dB levels. (This property is also published and supported by both the AUMatrixReverb and the AU3DMixer.)
I think you should consider supporting both of those properties - this then allows the user to make decisions about your synth's usage based on a particular situation.
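From the host's side, using them is just two AudioUnitSetProperty calls in the global scope. A sketch (the property constants are the ones in AudioUnitProperties.h; the particular values here are only for illustration):

    #include <AudioUnit/AudioUnit.h>

    // Sketch: a host constraining a synth's CPU usage and render quality.
    // CPULoad is a Float32 from 0 (no limit) to 1; RenderQuality is a UInt32
    // whose exact range and named values are given in AudioUnitProperties.h.
    static void ConfigureSynthLoad(AudioUnit synthUnit)
    {
        Float32 cpuLoad = 0.7f;   // leave roughly 30% for everything else
        AudioUnitSetProperty(synthUnit, kAudioUnitProperty_CPULoad,
                             kAudioUnitScope_Global, 0,
                             &cpuLoad, sizeof(cpuLoad));

        UInt32 quality = 64;      // a middling quality setting
        AudioUnitSetProperty(synthUnit, kAudioUnitProperty_RenderQuality,
                             kAudioUnitScope_Global, 0,
                             &quality, sizeof(quality));
    }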
So, I can think of several reasons to use a feeder thread (together with appropriate buffering and consequent additional latency) to feed synthesized audio to an IOProc thread:
Most, if not all, major hosting apps discriminate between those tasks that have a reason to respond in real time and those that don't (for instance, the difference between responding to real-time events versus those that are already recorded in a sequence).
A good example of this is Logic 6's new "freeze" function - using this, you really care MOST about quality and very little about how much time you take to generate the audio.
A good (perhaps the only) reason an AU should consider using its own threads is if there are processes within the AU that can be parallelised AND you have more than one CPU to execute on. Some also like to use threads as a way to maintain complex stack states, etc., so that's another good reason.
(1) to keep blocking calls out of the IOProc
You cannot assume that you are being called on the I/O proc - in many real situations you may not be. That said, I do not think it is a good idea to be doing allocations in your AudioUnitRender (because they can block), file I/O (because it can block) or any other blocking type of activity where you don't have a complete understanding of the potential duration of your waiting. Sometimes these are unavoidable, but they should be totally understood and very fine grained. I know many people like to use the atomic calls to avoid overly fine-grained blocking. But yes, I agree: nothing in your AudioUnitRender call should have an unbounded locking potential.
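One common shape for that is a try-lock (or an atomic swap) around parameter changes, so the render call skips a contended update rather than waiting on it. A sketch only - the structure and field names here are made up for illustration:

    #include <pthread.h>

    // Sketch: picking up pending parameter changes inside a render call
    // without any unbounded blocking. If the UI thread happens to hold the
    // lock right now, the render thread simply keeps the previous values
    // for this buffer instead of waiting.
    typedef struct {
        pthread_mutex_t lock;
        float           pendingGain;   // written by the UI thread
        int             dirty;         // set by the UI thread under the lock
        float           currentGain;   // read/written only by the render thread
    } ParamState;

    static void PickUpParameters(ParamState *state)
    {
        if (pthread_mutex_trylock(&state->lock) == 0) {
            if (state->dirty) {
                state->currentGain = state->pendingGain;
                state->dirty = 0;
            }
            pthread_mutex_unlock(&state->lock);
        }
        // else: try again next render cycle; never block here.
    }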
(2) to buffer against irregularities in the synthesis process, possibly allowing early detection of a processing lag, allowing corrective responses (e.g. reduced precision) to be applied gracefully rather than having to be applied instantly (e.g. note dropping)
Provided that you can actually be running in real time (or are expected to be usable within that context), of course. I think that's an admirable and probably an additional reason to use a second thread (so I'll add that to my list!)... (Nobody expects the Spanish Inquisition, we have...:)
(3) to buffer against irregularities in system performance, such as some "randomness" in the scheduler together with the unpredictable nature of the demands put on the scheduler
Hmmm... I think that's a bad reason for the AU at least. This really is
a responsibility of the host and ultimately of the OS. It also implies
an arbitrary introduction of latency that is generally undesirable.
(4) to buffer against much nastier things like thread-performance inter-dependencies caused by caching issues. For example, suppose two memory-hungry threads (possibly even two different IOProcs) happen to get scheduled in parallel on 2 CPUs, with the result that performance drops sharply in both because of collisions in use of memory. It might have been better if they had been scheduled serially on the same CPU. But I assume there is nothing in the scheduler that could recognize and avoid these kinds of situations, even heuristically.
Hmmm... I think this is a misunderstanding of how your AU *would* or *should* be used in that context.
We do not allow fan-out connections in the AUGraph (i.e. the situation where the same output bus of an AU is going to two or more different destinations).
We also do not allow for more than one output unit in an AUGraph - so by definition an AUGraph will only be processing audio for one device (because, further, we also don't have an output unit that will talk to more than one device).
So, we've made our life easier!
That said - we have considered introducing a buffering type of AU that could potentially deal with this type of situation - which of course is essentially what you are describing.
That kind of AU (which would introduce at least a buffer's worth of latency) would get its data from its sources through a single bus, in a single thread. It could have multiple output busses (say, in the case you describe above) that would be providing data on those threads. The whole raison d'être for this unit would be to deal with these kinds of threaded and buffering situations, with minimal need to use locking semantics.
Though we've considered this, we haven't completed and shipped an implementation of it - it's never quite been finished, and the usage scenario is pretty arcane and specialised. If we get a lot of requests for this functionality we'd consider it.
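In the meantime, an application can build the same decoupling itself with a feeder thread and a single-producer/single-consumer ring buffer. A rough sketch of the idea, assuming one synthesis thread feeding one IOProc (memory-ordering details and a real back-pressure policy are glossed over here):

    #include <string.h>
    #include <CoreAudio/CoreAudioTypes.h>

    // Sketch: lock-free single-producer/single-consumer ring of float samples.
    // The feeder thread synthesises ahead of the IOProc; the IOProc only
    // copies out, so it never waits on synthesis. Capacity is a power of two
    // so wrap-around can be done with a mask.
    #define RING_CAPACITY 16384   /* frames */

    typedef struct {
        float           samples[RING_CAPACITY];
        volatile UInt32 writePos;   // advanced only by the feeder thread
        volatile UInt32 readPos;    // advanced only by the IOProc
    } SampleRing;

    // Feeder thread: called after synthesising 'count' frames into 'src'.
    static UInt32 RingWrite(SampleRing *ring, const float *src, UInt32 count)
    {
        UInt32 space = RING_CAPACITY - (ring->writePos - ring->readPos);
        if (count > space) count = space;            // never block; just write less
        for (UInt32 i = 0; i < count; i++)
            ring->samples[(ring->writePos + i) & (RING_CAPACITY - 1)] = src[i];
        ring->writePos += count;                     // publish after the data
        return count;
    }

    // IOProc: fills 'dst' from the ring, padding with silence if the feeder
    // has fallen behind (better a dropout than a blocked I/O thread).
    static void RingRead(SampleRing *ring, float *dst, UInt32 count)
    {
        UInt32 avail = ring->writePos - ring->readPos;
        UInt32 n = (count < avail) ? count : avail;
        for (UInt32 i = 0; i < n; i++)
            dst[i] = ring->samples[(ring->readPos + i) & (RING_CAPACITY - 1)];
        ring->readPos += n;
        if (n < count)
            memset(dst + n, 0, (count - n) * sizeof(float));
    }

The amount the feeder is allowed to run ahead is exactly the extra latency you pay for the decoupling.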
(5) perhaps - to make good use of multiple processors on the system when they are available. In this case I am not so much thinking of things like cache problems, but rather how to balance the synthesis computations across all available processors, while not interfering with IOProc performance. For example I could spawn a bunch of synthesis feeder threads, possibly more threads than there are processors, so that the scheduler is left with lots of flexibility - in case other system loads can not be distributed evenly between the processors, my own threads can take up the slack wherever it exists.
Sure - this certainly holds out some good benefits - though here too, the host apps can to some extent take care of this kind of thing for you... I guess my main concern with doing this would be that in many common usage scenarios, you won't be the only processing that is going on, so the host may be in a better position to make this decision than a particular AU... I could imagine a situation where a user might be able to split their usage of your synth into different sources of control events, and thus instantiate multiple instances of your synth, which the host could then target to different CPUs..
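If the app itself does go down that road, sizing the pool of feeder threads to the machine is at least the easy part - a sketch (SynthWorker is a made-up placeholder for whatever carves up your synthesis work):

    #include <pthread.h>
    #include <sys/sysctl.h>

    // Sketch: spawn one synthesis feeder thread per CPU. As noted above, the
    // application (not an individual AU) is usually the right place to make
    // this kind of decision.
    static void *SynthWorker(void *arg)
    {
        /* ... synthesise into a ring buffer, as in the earlier sketch ... */
        return 0;
    }

    static void SpawnFeederThreads(void)
    {
        int ncpu = 1;
        size_t len = sizeof(ncpu);
        sysctlbyname("hw.ncpu", &ncpu, &len, NULL, 0);

        for (int i = 0; i < ncpu; i++) {
            pthread_t t;
            pthread_create(&t, NULL, SynthWorker, NULL);
            pthread_detach(t);
        }
    }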
(6) to get back that extra 10% of the 90% that the HAL will allow, as per Jeff Moore's message of 2/26/03 6:02 PM:
The HAL has a hard limit of roughly 90% CPU usage as measured against the deadlines it calculates internally before it will put you in the penalty box and issue kAudioDevicePropertyOverload notifications (this
Yes - you have to give time for the drivers, etc. to do their work. You can't really avoid leaving some time for this work to happen somewhere on some CPU.
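Listening for that overload notification is also cheap, and it gives an app a concrete trigger for backing off (dropping quality, shedding notes) rather than glitching repeatedly. A sketch using the AudioHardware.h property-listener API - the constant for the overload notification is kAudioDeviceProcessorOverload in the headers, and the flag-based handoff here is just one way to do it:

    #include <CoreAudio/AudioHardware.h>

    // Sketch: be told when the HAL decides we have blown the time budget.
    // Do as little as possible in the listener itself; just set a flag that
    // the feeder/synthesis code checks on its next pass.
    static OSStatus OverloadListener(AudioDeviceID inDevice, UInt32 inChannel,
                                     Boolean isInput,
                                     AudioDevicePropertyID inPropertyID,
                                     void *inClientData)
    {
        *(volatile int *)inClientData = 1;
        return noErr;
    }

    static void WatchForOverload(AudioDeviceID device, volatile int *overloadFlag)
    {
        AudioDeviceAddPropertyListener(device, 0, false,
                                       kAudioDeviceProcessorOverload,
                                       OverloadListener, (void *)overloadFlag);
    }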
Am I basically on the right track in my thinking here? Is that just about it? Are there any other compelling reasons for using a feeder thread?
Item (5) is particularly of interest to me right now. I first tried improving performance on a 2-processor machine with 2 output devices active by doing synthesis independently in each IOProc thread. I found that in some cases I get a 50% increase in performance, and in other cases no reliable improvement at all. In particular my AltiVec-optimized synthesis gets no reliable increase, and in fact sometimes a sharp drop. This is in spite of attempts to keep memory use to a very small minimum, although I can't prove that there were not many scattered accesses to memory that happened to collide badly in their timing on 2 processors. Anyway, not to dwell on these details here.
So, if I am basically on the right track with my thinking, is it fair to say that optimized audio synthesis directed towards multiple IOProcs should probably always use feeder threads, if the goal is to be able to get as close as possible to saturating the CPU?
And if so... would it be useful - and possible - to create one or more audio units whose sole purpose is to decouple their pulling callback from their rendering callback? Would such an audio unit make it possible for audio developers to deal with a much broader set of requirements without having to develop so much in-house expertise on thread management? I envision an audio unit that permitted sufficiently flexible control over buffering and latency (and thread priority?) so that almost all audio application thread management could be vastly simplified.
I'll take that as at least one vote for the "AUBufferUnit" - you
certainly raise some interesting questions that are worth more thought.
Bill
Note that this technology would also make possible an output audio unit that could span any number of output devices, possibly running at different sample rates. Obviously I'm glossing over a lot of detail here - but I'm actually hoping that someone at Apple will do this, in which case all the details will be taken care of!
Thanks,
Kurt Bigler
--
mailto:email@hidden
tel: +1 408 974 4056
__________________________________________________________________________
"Much human ingenuity has gone into finding the ultimate Before.
The current state of knowledge can be summarized thus:
In the beginning, there was nothing, which exploded" - Terry Pratchett
__________________________________________________________________________
_______________________________________________
coreaudio-api mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/coreaudio-api
Do not post admin requests to the list. They will be ignored.