Darwin-kernel mailing list (Darwin-kernel@lists.apple.com)

On Aug 20, 2008, at 6:26 AM, alejandro <alejandro@openstudionetworks.com> wrote:

> Dear Friends,
>
> I am running very low-latency real-time threads (for audio purposes)
> on 6 CPUs, and other admin threads, including disk I/O threads, on the
> other 2 CPUs. When reading audio from disk, the real-time threads get
> stuck at random times. I believe this happens because the disk read
> command launches a lower-half kernel thread that can run on any of the
> 8 CPUs, and the memory movement from disk to RAM invalidates their
> caches and breaks their real-time behavior. Is there any solution? For
> example, prohibiting the kernel threads from running on particular
> CPUs, or binding them to a CPU?

Basic queueing theory would suggest that you buffer more data, so that your pool retention time exceeds any possible latency. The problem then goes away on its own, since there is then zero latency between when you ask for data and when it becomes available to you.

In general, Mac OS X does not run interrupts in IOAPIC virtual wire mode, so no matter what you do, your I/O completion interrupts from your disk I/O are going to come in to the BSP rather than one of the APs. The more APs you have, the more likely it is that you will take the low-level completion interrupt on the BSP (the initial boot CPU) and do the upper-level processing of the interrupt on an AP (one of the other CPUs).
It turns out that this will typically have no effect on performance at all: the data will have been DMAed directly into main memory, not into a CPU cache, and the only thing that will be "hot" in the cache on the BSP will be the low-level interrupt routine's data, which is typically limited to device data space (I/O space is, or should be, uncached for obvious reasons), and the upper-level device driver will/should never touch that. So any latency isn't going to be related to cache busting at that level, and you've got a bad assumption about the source of your problem.

The most common cause of problems like this is trying to feed an RT thread data from a non-RT or lower-priority thread, with the higher-priority thread causing the lower-priority one to starve. Increasing the pool size and reducing the priority to reasonable (or at least non-starvation) levels works there, too.

Another common cause is thinking your code is the only thing running, and thinking that because you have 8 CPUs you should have 8 worker threads. Besides the interrupt processing and the upper-level device driver, the system also has to think about your UI, etc., so sometimes reducing the number of worker threads actually improves performance.

These are just possibilities, of course; if in doubt, use Shark to see what's actually happening instead of what you think should be happening.

PS: JIT (Just In Time) is great for inventory control and for interpreted languages, not so good for RT tasks on a deadline. Your best bet is to line up all your ducks before any possible deadline.

-- Terry