Re: multithreaded mixer
- Subject: Re: multithreaded mixer
- From: Jeff Moore <email@hidden>
- Date: Wed, 4 Feb 2004 11:57:54 -0800
On Feb 3, 2004, at 11:00 PM, Philippe Wicker wrote:
On Wednesday, February 4, 2004, at 12:56 AM, Jeff Moore wrote:
Philippe presents an excellent case, but unfortunately, it isn't going
to work too well when you push the machine hard.
Sad news :((
The reason why is that there is no "politically correct" way to block
the IO thread. No matter how you do it, you are still suspending the
IO thread. This mangles the scheduler's notion of the behavior of the
thread. It will appear like it is running way more often than the
scheduling parameters indicate it should be running.
This can cause things to not be pre-emptible when they should be and
will cause timing problems with other real time threads, such as the
IO thread of another device or the threads the MIDI server and its
clients use.
I can understand that a thread which is declared as "periodic"
(thread_time_constraint_policy_data_t.period != 0) may "confuse" the
scheduler if it does not wake up periodically (which is how the IO
thread will behave if it is blocked). I can also understand that if
one IO thread is blocked, the scheduler may yield the CPU to another
IO thread which may lead to problems. On the other hand I cannot
figure why it could have a negative impact on the MIDI threads. Could
you tell us more about this?
The way it works is that the IO thread will not be pre-emptible for a
portion of its cycle right after it wakes up. Then it becomes
pre-emptible by other real time threads. By constantly blocking the IO
thread, you are essentially putting it into a state where it won't ever
be pre-emptible. This will prevent the real time MIDI threads from
being able to sneak in for a few microseconds to do their thing, and
will result in a degradation of MIDI timing that naturally makes MIDI
and audio go out of sync.
Also, let's not forget about the cost of constantly transitioning
between all these threads. This doesn't take a trivial amount of time,
especially since so much state (think about all the FPU and Altivec
registers that need to be saved on a G5) needs to be saved for each
transition. Depending on how you do things, you could easily wipe out
all your gains with the overhead of managing it all.
In short, don't ever block the IO thread. It is number one on the
list of "bad things" you can do.
Again, sad news ;((
The question then is: is this scheduler the best one for the kind of
applications we are all writing?
It is and the fact that lots of apps out there (like Logic, DP, etc)
have already solved this problem is proof that things work if you
cooperate with the system and are flexible with how you approach the
solution to the problem.
The way it works - combined with the pull audio flow model - prevents
the efficient use of a multi-processor machine because the DSP work
will always be executed in one thread and therefore by one processor
at any given time. I know that sometimes the work can be delegated to
other threads (and other processor) without blocking the IO thread
(the feeder thread of a "ping-pong" file reader such as PlayAudioFile
is an example). This kind of "delegation" implies that the audio can
be fetched some amount of time before it will be processed by the IO
thread (one chunk ahead in the case of the file reader). This is
obviously not applicable when low latency is needed (e.g. for live
playing).
I disagree with this statement a bit. One doesn't have to add lots of
latency in order to do this. One can break the signal processing up in
a variety of ways in order to do the load balancing. For instance, in
most engines, not all the DSP really requires low latency.
Consequently, you can put just the bits that need low latency in the IO
thread and do the rest on an auxiliary thread. You can even vary this
stuff dynamically as the user changes focus in the UI. Another
technique is to run the IO thread at a rate much faster than the rate
at which you are doing the DSP. This will allow for shorter starting
latency for new events as they don't have to wait so long before they
can be mixed in. I'm sure there are lots of other techniques, too.
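The second technique above comes down to simple arithmetic. Here is a
minimal sketch (the numbers and the `start_latency` helper are purely
illustrative, not from any Core Audio API) of why smaller IO quanta
shorten the worst-case wait before a new event can be mixed in:

```c
/* Worst-case number of frames an event arriving at frame `t` must wait
 * before it can be mixed, when mixing decisions are only made on
 * `quantum`-frame boundaries. Hypothetical helper for illustration. */
static unsigned start_latency(unsigned t, unsigned quantum) {
    unsigned r = t % quantum;
    return r == 0 ? 0 : quantum - r;
}

/* An event arriving at frame 130 waits 382 frames if the DSP only runs
 * every 512 frames, but only 126 frames if the IO thread pulls (and can
 * mix new events in) every 128 frames:
 *   start_latency(130, 512) -> 382
 *   start_latency(130, 128) -> 126
 */
```

At a 44.1 kHz sample rate that is the difference between roughly 8.7 ms
and 2.9 ms of worst-case starting latency, without changing how much DSP
work is done per second.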
I also want to suggest a nightmare scenario: what if everybody's
AudioUnit did what you are suggesting? This is not an unreasonable
suggestion at first blush considering that some of the synths and
effects are pretty darn complicated and do a lot of work that wouldn't
be too hard to parallelize. This would bring the system to its knees
with all the real time thread traffic and no one would get anything
done.
Another big wrench in the works is that if you have an engine that
shares its effects, say through a send/return or side chain type
construct, you build up a data dependency that will quickly prevent you
from ever breaking things up into more than one subgraph.
Also, let's not forget that in most applications, there is a whole lot
more going on than just DSP. There's UI to update, data to read from
the disk or the network, bookkeeping to do, MIDI data to read and
write, etc. Trying to split your DSP tasks up to distribute to all the
processors can easily get you into a situation where you're just not able
to respond to the user or get the data from the disk that you need in a
timely fashion. This is another facet of why doing this won't
necessarily have the beneficial effect you think it will.
To go back to the original question (split the DSP work between
different threads), the only solution I can see which satisfies the
"don't ever block the IO thread" constraint is to make both subgraphs
run in their own thread and "push" the audio into a private ring
buffer. The code executed in the context of the IO thread has then to
pull the audio from both ring buffers, mix it and return it to the
HAL. When the IOProc is called, the code here could pass the IOProc
parameters to these two threads using 2 non blocking queues (one for
each thread). This solution adds a latency which may be as much as an
IOProc buffer size. The other bad news is that synchronizing the
worker threads with the IOProc needs a mutex/condvar pair or a Mach
semaphore. In both cases, a lock must be taken either by the IO thread
or by the worker thread, which may end up blocking the IO thread
(although for a very short amount of time). It is true however that in
this particular case the probability of such a blocking is very low
and may be neglected (???).
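For reference, the private ring buffer described above can be sketched
as a single-producer/single-consumer queue that needs no lock at all on
either side. This is a generic illustration, not code from PlayAudioFile
or the HAL; `ring_t`, `ring_push`, and `ring_pop` are invented names,
and a real engine would copy whole buffers rather than loop over floats:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Minimal single-producer/single-consumer ring buffer: the worker thread
 * pushes rendered audio, the IOProc pops it, and neither side ever takes
 * a lock. Capacity must be a power of two; head and tail are free-running
 * counters, so head - tail is always the number of frames queued. */
#define RING_FRAMES 1024u

typedef struct {
    float buf[RING_FRAMES];
    _Atomic uint32_t head;  /* written only by the producer */
    _Atomic uint32_t tail;  /* written only by the consumer */
} ring_t;

/* Producer (worker thread) side: returns the frames actually written. */
static uint32_t ring_push(ring_t *r, const float *src, uint32_t n) {
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    uint32_t space = RING_FRAMES - (head - tail);
    if (n > space) n = space;
    for (uint32_t i = 0; i < n; i++)
        r->buf[(head + i) & (RING_FRAMES - 1)] = src[i];
    atomic_store_explicit(&r->head, head + n, memory_order_release);
    return n;
}

/* Consumer (IOProc) side: returns the frames actually read. */
static uint32_t ring_pop(ring_t *r, float *dst, uint32_t n) {
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    uint32_t avail = head - tail;
    if (n > avail) n = avail;
    for (uint32_t i = 0; i < n; i++)
        dst[i] = r->buf[(tail + i) & (RING_FRAMES - 1)];
    atomic_store_explicit(&r->tail, tail + n, memory_order_release);
    return n;
}
```

Because each index is written by exactly one thread, the IOProc's pop
never blocks: if the worker has fallen behind, it simply gets fewer
frames than it asked for and can fill the remainder with silence.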
This is a common misconception about how pthreads work. One doesn't
have to own the mutex associated with a condvar to signal it. I don't
use mach semaphores, so I can't comment on them. Also, there is no
reason why you have to use a signal to do this. You could just as
easily wake the thread up on the decrementer and thus save a bit of
overhead. After all, if it's good enough for the main IO thread, it
should be good enough for any auxiliaries, right?
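To make that concrete, here is a minimal pthread sketch (illustrative
names, not from any Apple sample) in which the predicate is updated
under the lock but pthread_cond_signal() is called only after the mutex
has been released — which POSIX explicitly permits — so the woken thread
never stalls on a mutex the signaler still holds:

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
static bool work_ready = false;
static int  work_done  = 0;

/* Auxiliary thread: waits until work is handed off, then does it. */
static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mtx);
    while (!work_ready)              /* predicate loop guards spurious wakeups */
        pthread_cond_wait(&cv, &mtx);
    work_done = 1;
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Hand off one unit of work; returns 1 once the worker has run. */
static int run_demo(void) {
    pthread_t t;
    work_ready = false;
    work_done  = 0;
    pthread_create(&t, NULL, worker, NULL);

    pthread_mutex_lock(&mtx);
    work_ready = true;               /* predicate changes under the lock... */
    pthread_mutex_unlock(&mtx);
    pthread_cond_signal(&cv);        /* ...but the signal needs no lock */

    pthread_join(t, NULL);
    return work_done;
}
```

No wakeup can be lost here: the worker holds the mutex from the moment
it checks the predicate until pthread_cond_wait() atomically releases it,
so the signaler cannot flip the flag in that window.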
Finally, there is no such thing as blocking for a short amount of time.
Even if you make the timeout of your wait a single host clock tick,
the scheduler won't get back to that thread until at least its minimum
scheduling quantum has expired (what this is depends on the threads
that are running and their scheduling parameters) and it will be even
longer if there are other things that can't be pre-empted (like other
IO threads) going on. This can end up being a really long time on a
busy system.
--
Jeff Moore
Core Audio
Apple
_______________________________________________
coreaudio-api mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/coreaudio-api
Do not post admin requests to the list. They will be ignored.