Re: Pinning IO Thread to a particular processor
- Subject: Re: Pinning IO Thread to a particular processor
- From: Herbie Robinson <email@hidden>
- Date: Tue, 31 Jan 2006 19:22:27 -0500
We are observing that on a Quad G5 we get glitches or dropouts when we
lower the buffer size under high audio load (of course); but with all
four cores enabled, we get these glitches far earlier (with a higher
buffer size) than with only one core enabled.
1) Why is this?
It could be cache thrashing, but it could also mean you have lock
contention or priority inversion problems.
Cache thrashing happens when you have data from two or more threads
in the same cache line. If the threads are running at the same time,
the cache line can end up bouncing back and forth between CPUs every
time the data is accessed.
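As a minimal sketch of what that looks like in code (the counter struct and thread functions here are made up for illustration, nothing to do with the HAL): two per-thread counters packed into one cache line will make that line bounce between CPUs on every increment, even though neither thread ever reads the other's counter.

    #include <pthread.h>
    #include <stdint.h>

    /* Both counters live in the same cache line, so every increment by one
       thread invalidates the line in the other CPU's cache ("false sharing"),
       even though neither thread ever reads the other's counter. */
    struct counters {
        volatile uint64_t thread_a_count;   /* written only by thread A */
        volatile uint64_t thread_b_count;   /* written only by thread B */
    };

    static struct counters g_counters;

    static void *bump_a(void *arg)
    {
        (void)arg;
        for (uint64_t i = 0; i < 100000000ULL; i++)
            g_counters.thread_a_count++;
        return NULL;
    }

    static void *bump_b(void *arg)
    {
        (void)arg;
        for (uint64_t i = 0; i < 100000000ULL; i++)
            g_counters.thread_b_count++;
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, bump_a, NULL);
        pthread_create(&b, NULL, bump_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }

Pad each counter out to its own cache line (as in the layout sketch further down) and the same two loops stop fighting over the line.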
You can rule out interrupt latency.
2) Is there anything we can do about this? (Short of telling our users
to turn off three of their four cores.)
Cache thrashing can be cut down significantly by carefully organizing
shared data structures based on cache lines. Data for a particular
thread should be kept in a separate cache line (or set of lines).
Shared data that is read only should be in a separate set of cache
lines. Shared data that is modified frequently should be in its own
cache line. I/O buffers should be in separate cache lines. Maybe
FIFOs should be defined such that there is always at least one cache
line between the insert and remove points. In the past, I have seen
these techniques yield 10-20% improvements in aggregate performance
when applied to things like device drivers and I/O buffers. I don't
know how it affects latency. The same techniques applied to the OS
scheduler data structures improved performance by several orders of
magnitude.
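To make that layout discipline concrete, here is a rough sketch assuming the G5's 128-byte cache line and an invented audio_fifo_t type (none of these names come from the HAL): the producer-side index, the consumer-side index, the read-mostly configuration, and the sample data each get their own line, so there is always at least one line between the insert and remove points.

    #include <stdint.h>

    #define CACHE_LINE 128   /* PowerPC G5 cache line size; adjust per CPU */

    /* Hypothetical ring buffer shared between a producer thread and the
       HAL IO thread.  The write index (touched only by the producer) and
       the read index (touched only by the consumer) sit on separate cache
       lines, so updates on one side never invalidate the other side's line. */
    typedef struct {
        volatile uint32_t write_index;                     /* producer only */
        uint8_t           pad0[CACHE_LINE - sizeof(uint32_t)];

        volatile uint32_t read_index;                      /* consumer only */
        uint8_t           pad1[CACHE_LINE - sizeof(uint32_t)];

        uint32_t          capacity_frames;                 /* read-mostly */
        uint32_t          channels;                        /* read-mostly */
        uint8_t           pad2[CACHE_LINE - 2 * sizeof(uint32_t)];

        float             samples[];   /* sample data starts on its own line */
    } __attribute__((aligned(CACHE_LINE))) audio_fifo_t;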
Jeff has described other techniques.
We were thinking if it were possible to "pin" the HAL IO thread to a
particular processor/core, maybe that would help. Is this possible?
I would hope the OS is already doing that. Even with a small 512K
cache, this helps a lot -- and not just for real-time processes.
I haven't looked at OS X, but the typical algorithm is to try and
keep threads on the same CPU as long as the CPU is available within a
short period of time and the thread has run recently enough for it to
matter. A really cool algorithm would do static CPU allocation for
real-time threads based on the scheduling parameters -- such a thing
wouldn't be much use on a 2 CPU system, but it would definitely help
on a quad.
As far as providing an API to pin threads to a specific CPU, it is
useful for debugging and performance investigations, but very
dangerous to use on a production basis -- it is surprisingly easy to
wedge a system.
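For what it's worth, later versions of Mac OS X (10.5 and up) expose an affinity *hint* rather than a hard pin: threads that share a non-zero affinity tag are scheduled to share an L2 cache where possible, and threads with different tags tend to be kept apart. A sketch of that call looks like the following; per the caveat above, it belongs in performance experiments, not shipping code.

    #include <mach/mach.h>
    #include <mach/thread_policy.h>
    #include <pthread.h>

    /* Ask the scheduler to keep the calling thread near other threads that
       carry the same non-zero affinity tag (sharing an L2 where possible).
       This is a hint that groups threads; it does not pin them to a named CPU. */
    static kern_return_t set_affinity_tag(integer_t tag)
    {
        thread_affinity_policy_data_t policy = { tag };
        mach_port_t thread = pthread_mach_thread_np(pthread_self());

        return thread_policy_set(thread,
                                 THREAD_AFFINITY_POLICY,
                                 (thread_policy_t)&policy,
                                 THREAD_AFFINITY_POLICY_COUNT);
    }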
At 3:21 PM -0800 1/30/06, Shaun Wexler wrote:
Another issue the Quad might be susceptible to could be related to
the shared 2MB cache of each dual-core chip. It's possible that a
thread running on "Core B" could evict data from "Core A's" cache,
forcing its HAL thread to stumble out to main memory for every bit.
That would interfere in the single CPU case, too.
Worst-case scenario, it might also be paging, blocking the IOProc.
Maybe he's not wired-down when/where he needs to be? ;)
Again, that would be a problem on a single CPU, too, unless something in
Mach is hitting contention between threads (pretty unlikely).
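And if paging really were the culprit, the usual fix is to wire the IOProc's working set down ahead of time, outside the real-time thread; a minimal sketch, with a made-up scratch-buffer allocator:

    #include <stdlib.h>
    #include <sys/mman.h>

    /* Hypothetical IOProc scratch buffer: allocate it up front and wire the
       pages so the real-time thread never takes a page fault touching them. */
    static float *alloc_wired_buffer(size_t frames, size_t channels)
    {
        size_t bytes = frames * channels * sizeof(float);
        float *buf = malloc(bytes);

        if (buf != NULL && mlock(buf, bytes) != 0) {
            /* Couldn't wire the pages; don't risk faults in the IOProc. */
            free(buf);
            return NULL;
        }
        return buf;
    }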
--
-*****************************************
** http://www.curbside-recording.com/ **
******************************************
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Coreaudio-api mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden