Re: audio computation and feeder threads
- Subject: Re: audio computation and feeder threads
- From: Kurt Bigler <email@hidden>
- Date: Fri, 28 Feb 2003 22:57:35 -0800
on 2/28/03 8:24 PM, Lionel Woog <email@hidden> wrote:

> Typically, an IOProc will be called with a 512 frames request. If you can
> generate that much in or close to real time you are fine. If you cannot
> always do it (i.e. you generate 8K frames at a time, so the lag to provide
> the first 512 frames is large), then you need a feeder thread that
> pre-computes and buffers the data.

The code can generate as many frames as you like at a time. As it stands,
the synthesis code runs in each IOProc thread, and generates all the frames
requested. So this kind of lag is not the issue currently.
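
(For concreteness, here is roughly what that looks like. This is only a
sketch: a single sine oscillator stands in for my real oscillator bank, the
SynthState struct is hypothetical, and I'm assuming the default Float32
interleaved stream format.)

    #include <CoreAudio/CoreAudio.h>
    #include <math.h>

    /* Hypothetical synthesis state - one sine oscillator stands in
       for the real oscillator bank. */
    typedef struct {
        double phase;
        double phaseIncrement;   /* 2 * pi * frequency / sampleRate */
    } SynthState;

    /* All synthesis happens right here in the IOProc thread: we render
       every frame the HAL asks for, on demand. */
    static OSStatus MyIOProc(AudioDeviceID inDevice,
                             const AudioTimeStamp *inNow,
                             const AudioBufferList *inInputData,
                             const AudioTimeStamp *inInputTime,
                             AudioBufferList *outOutputData,
                             const AudioTimeStamp *inOutputTime,
                             void *inClientData)
    {
        SynthState *state = (SynthState *)inClientData;
        for (UInt32 b = 0; b < outOutputData->mNumberBuffers; b++) {
            AudioBuffer *buf = &outOutputData->mBuffers[b];
            float *out = (float *)buf->mData;
            UInt32 channels = buf->mNumberChannels;
            UInt32 frames = buf->mDataByteSize / (channels * sizeof(float));
            for (UInt32 i = 0; i < frames; i++) {
                float sample = (float)sin(state->phase);
                state->phase += state->phaseIncrement;
                for (UInt32 ch = 0; ch < channels; ch++)
                    out[i * channels + ch] = sample;
            }
        }
        return noErr;
    }

The proc gets registered with AudioDeviceAddIOProc() and started with
AudioDeviceStart(), and everything it touches has to be reachable from
inClientData, since it runs on the HAL's I/O thread.
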
The issue is that I can use up an arbitrary amount of CPU time in the
synthesis: the better the synthesis, the more CPU time it takes. I want to
have the best synthesis possible that still allows an adequate number of
simultaneous oscillators, and I don't want any part of the process to be
avoidably "wasteful", nor do I want to be using any less than the maximum
fraction of the total CPU that is conceivably available. Other apps running
are not an issue, because the system is dedicated to this application,
except to the extent that there are processes I cannot stop which will
always use up a little of the CPU.

So for example, doing all my synthesis without any special effort to "make
use of" the second CPU in my own code leaves me able to synthesize 300 or so
voices (oscillators) simultaneously, at a quality level that I have
currently chosen as "adequate". This is "ok", but 500 voices would be much
better. So I want to rethread my synthesis so the scheduler will be more
likely to run it in whatever time might otherwise be idle on both CPUs.

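Roughly the kind of rethreading I have in mind is sketched below. The names
are all hypothetical - RenderVoices() stands in for the actual synthesis -
and spawning threads per render pass is shown only for brevity; a real
version would keep a persistent worker pool woken by semaphores.

    #include <pthread.h>
    #include <string.h>

    #define kNumWorkers 2        /* e.g. one per CPU on a dual machine */
    #define kMaxFrames  512      /* frames per pass must not exceed this */

    /* Hypothetical: renders voices [firstVoice, firstVoice + voiceCount)
       into out[].  Stands in for the real synthesis code. */
    extern void RenderVoices(int firstVoice, int voiceCount,
                             float *out, int frames);

    typedef struct {
        int   firstVoice;
        int   voiceCount;
        int   frames;
        float out[kMaxFrames];
    } RenderSlice;

    static void *RenderThread(void *arg)
    {
        RenderSlice *s = (RenderSlice *)arg;
        memset(s->out, 0, s->frames * sizeof(float));
        if (s->voiceCount > 0)
            RenderVoices(s->firstVoice, s->voiceCount, s->out, s->frames);
        return NULL;
    }

    /* Splits the voice list across worker threads so the scheduler can
       run them on whichever CPUs are idle, then mixes the partial
       buffers once all workers are done. */
    void RenderAllVoices(int totalVoices, float *mix, int frames)
    {
        pthread_t   threads[kNumWorkers];
        RenderSlice slices[kNumWorkers];
        int perWorker = (totalVoices + kNumWorkers - 1) / kNumWorkers;

        for (int w = 0; w < kNumWorkers; w++) {
            slices[w].firstVoice = w * perWorker;
            slices[w].voiceCount = perWorker;
            if (slices[w].firstVoice + slices[w].voiceCount > totalVoices)
                slices[w].voiceCount = totalVoices - slices[w].firstVoice;
            slices[w].frames = frames;
            pthread_create(&threads[w], NULL, RenderThread, &slices[w]);
        }
        memset(mix, 0, frames * sizeof(float));
        for (int w = 0; w < kNumWorkers; w++) {
            pthread_join(threads[w], NULL);
            for (int i = 0; i < frames; i++)
                mix[i] += slices[w].out[i];
        }
    }
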
Doing the synthesis for two devices in their respective IOProcs did not end
up giving me a significant advantage, and I need to find out whether there
is something I can do to improve this, or whether, for example, I have hit
some limit of memory/cache bandwidth for the algorithm I am using. (The
answer is not to buy a new computer, because there is no fixed limit on what
is enough. Any efforts to improve things now will greatly improve the value
of the next computer I buy.)

So in short, I need to use all of the CPU I can get hold of, given the
equipment I currently have. There are several possible target areas for
improvement, and one of them is to make better use of threads. Any efforts
put there that "solve the problem" in a general way make it possible to
avoid doing specific optimizations of specific synthesis methods that I
might abandon next month.

I figure getting the thread management right is part of getting my basic
development platform right - it opens up all the other possibilities.

The other main point in using a feeder thread approach is to smooth out the
rough spots. If I can generate X oscillators in real time "on the average",
but various "random" factors (such as scheduler behavior and other running
processes) make it impossible to achieve that rate continuously, then I have
to add some buffering slop so that the potential "average" rate can be
realized in a glitch-free way.

Come to think of it, that kind of buffering can be done without introducing
another thread (i.e. still within the IOProc), so the compelling reasons for
a feeder thread may be the issues of multiple CPUs, multiple IOProcs, and
the HAL 90% limit.

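To make the feeder-thread version concrete, here is a minimal sketch of the
decoupling structure: a lock-free single-producer/single-consumer ring
buffer that the feeder thread fills and the IOProc drains. All the names
are hypothetical, and I've used <stdatomic.h> for the index handoff just to
make the cross-thread publication explicit.

    #include <stdatomic.h>

    #define kRingFrames 8192  /* power of two; about 186 ms at 44.1 kHz */

    /* Single-producer/single-consumer FIFO: the feeder thread is the
       only writer and the IOProc the only reader, so no locks (and
       hence no blocking calls) are needed on the IOProc side.  The
       positions count frames monotonically; masking maps them into
       the array, which works because kRingFrames is a power of two. */
    typedef struct {
        float            samples[kRingFrames];
        _Atomic unsigned writePos;   /* advanced only by the feeder */
        _Atomic unsigned readPos;    /* advanced only by the IOProc */
    } RingBuffer;

    /* Feeder side: frames that may be written without overwriting
       unread data. */
    static unsigned RingWritable(RingBuffer *rb)
    {
        return kRingFrames -
               (atomic_load(&rb->writePos) - atomic_load(&rb->readPos));
    }

    static void RingWrite(RingBuffer *rb, const float *src, unsigned n)
    {
        unsigned w = atomic_load(&rb->writePos);
        for (unsigned i = 0; i < n; i++)
            rb->samples[(w + i) & (kRingFrames - 1)] = src[i];
        atomic_store(&rb->writePos, w + n);   /* publish after the copy */
    }

    /* IOProc side: returns the frames actually read; any shortfall is
       an underrun, which the caller zero-fills (the audible "glitch"
       the buffering slop is meant to prevent). */
    static unsigned RingRead(RingBuffer *rb, float *dst, unsigned n)
    {
        unsigned w = atomic_load(&rb->writePos);
        unsigned r = atomic_load(&rb->readPos);
        unsigned avail = w - r;
        if (n > avail) n = avail;
        for (unsigned i = 0; i < n; i++)
            dst[i] = rb->samples[(r + i) & (kRingFrames - 1)];
        atomic_store(&rb->readPos, r + n);
        return n;
    }

The ring size is exactly the "buffering slop" knob: a bigger ring rides out
longer scheduling hiccups at the cost of added latency.
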
Hope that clarifies some.
-Kurt Bigler

> I believe that you will find core audio very able at keeping itself fed.
>
> > I am trying to glean from all the information which has gone by on this list
> > what I need for a general plan for thread design in my app. I would like to
> > focus for the moment on the case of audio output only, when the audio is
> > being synthesized algorithmically. I would specifically like to include the
> > possibility of multiple IOProcs being active to support multiple output
> > devices running at once.
> >
> > The topic of feeder threads has come up a lot, although I believe usually
> > this has been in connection with playing audio files.
> >
> > I am trying to decide when it is a worthy thing to use a feeder thread in
> > connection with an IOProc thread. The following thoughts come to mind as I
> > try to put together my ideas on this, and I would appreciate feedback.
> >
> > First of all, the mere fact of outputting synthesized audio does not in
> > itself appear to constitute a reason for having a feeder thread. I am
> > assuming (though maybe I am wrong) that Apple's audio units do not have
> > any multi-thread decoupling/buffering going on in them - particularly audio
> > units that do synthesis from midi would be the issue here. Can I assume
> > that the DLS Synth (which I know _absolutely_ nothing about, yet need to use
> > here as an example) does its synthesis right in the IOProc thread? If yes,
> > then can I assume that this is therefore an "ok thing"?
> >
> > So, I can think of several reasons to use a feeder thread (together with
> > appropriate buffering and consequent additional latency) to feed synthesized
> > audio to an IOProc thread:
> >
> > (1) to keep blocking calls out of the IOProc
> >
> > (2) to buffer against irregularities in the synthesis process, possibly
> > allowing early detection of a processing lag, allowing corrective responses
> > (e.g. reduced precision) to be applied gracefully rather than having to be
> > applied instantly (e.g. note dropping)
> >
> > (3) to buffer against irregularities in system performance, such as some
> > "randomness" in the scheduler together with the unpredictable nature of the
> > demands put on the scheduler
> >
> > (4) to buffer against much nastier things like thread-performance
> > inter-dependencies caused by caching issues. For example, suppose two
> > memory-hungry threads (possibly even two different IOProcs) happen to get
> > scheduled in parallel on 2 CPUs, with the result that performance drops
> > sharply in both because of collisions in use of memory. It might have been
> > better if they had been scheduled serially on the same CPU. But I assume
> > there is nothing in the scheduler that could recognize and avoid these kinds
> > of situations, even heuristically.
> >
> > (5) perhaps - to make good use of multiple processors on the system when
> > they are available. In this case I am not so much thinking of things like
> > cache problems, but rather how to balance the synthesis computations across
> > all available processors, while not interfering with IOProc performance.
> > For example I could spawn a bunch of synthesis feeder threads, possibly more
> > threads than there are processors, so that the scheduler is left with lots
> > of flexibility - in case other system loads cannot be distributed evenly
> > between the processors, my own threads can take up the slack wherever it
> > exists.
> >
> > (6) to get back that extra 10% of the 90% that the HAL will allow, as per
> > Jeff Moore's message of 2/26/03 6:02 PM:
> >
> > > The HAL has a hard limit of roughly 90% CPU usage as measured against
> > > the deadlines it calculates internally before it will put you in the
> > > penalty box and issue kAudioDevicePropertyOverload notifications (this
> >
> > Am I basically on the right track in my thinking here? Is that just about
> > it? Are there any other compelling reasons for using a feeder thread?
> >
> > Item (5) is particularly of interest to me right now. I first tried
> > improving performance on a 2-processor machine with 2 output devices active
> > by doing synthesis independently in each IOProc thread. I found that in
> > some cases I get a 50% increase in performance, and in other cases no
> > reliable improvement at all. In particular my altivec-optimized synthesis
> > gets no reliable increase, and in fact sometimes a sharp drop. This is in
> > spite of attempts to keep memory use to a minimum, although I can't prove
> > that there were not many scattered accesses to memory that happened to
> > collide badly in their timing on the 2 processors. Anyway, not to dwell
> > on these details here.
> >
> > So, if I am basically on the right track with my thinking, is it fair to say
> > that optimized audio synthesis directed towards multiple IOProcs should
> > probably always use feeder threads, if the goal is to be able to get as
> > close as possible to saturating the CPU?
> >
> > And if so... would it be useful - and possible - to create one or more audio
> > units whose sole purpose is to decouple their pulling callback from their
> > rendering callback? Would such an audio unit make it possible for audio
> > developers to deal with a much broader set of requirements without having to
> > develop so much in-house expertise on thread management? I envision an
> > audio unit that permitted sufficiently flexible control over buffering and
> > latency (and thread priority?) that almost all audio application thread
> > management could be vastly simplified.
> >
> > Note that this technology would also make possible an output audio unit
> > that could span any number of output devices, possibly running at
> > different sample rates. Obviously I'm glossing over a lot of detail here -
> > but I'm actually hoping that someone at Apple will do this, in which case
> > all the details will be taken care of!
> >
> > Thanks,
> > Kurt Bigler

_______________________________________________
coreaudio-api mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/coreaudio-api
Do not post admin requests to the list. They will be ignored.