Re: Semaphored tasks not scheduling efficiently
On May 28, 2008, at 12:04 PM, darwin-dev-request@lists.apple.com wrote:

> I am optimizing a multithreaded app using Shark on an 8-core machine. There is a typical MPCreateSemaphore/MPSignalSemaphore semaphore-based setup that hands out work units to a set of worker threads, and it is working, but when you look carefully with Shark you see that a fair portion of the time only 7 cores are busy.
>
> As the work units are being released, the cores start up, but under close examination sometimes one will run for only 2 usec or less, then stop in semaphore_timedwait_trap. It then goes away for a long time (multiples of the 10 msec OS thread slice) and the Mach kernel shows that there is an idle thread running. Sometimes the "hole" in utilization moves to a different core. It winds up biting twice: not only do you lose a core, but you also wind up with a ragged, inefficient finish that wastes 6 or 7 cores. All this repeats, giving me a swiss-cheese look in what should logically be a very dense Shark system trace. All the work eventually gets done, but not as well as it should.
>
> I saw some notes on some Mach semaphores that don't make the target task immediately runnable. I'm not sure if that's what's happening here, but once the main thread goes to sleep after starting the workers, the eighth runnable worker thread should surely start, I'd think. Can anyone give insight into the MP semaphore and scheduling internals? Thanks.

Russ,
I'm not aware of anything that would cause the symptoms you're seeing that wasn't a result of contention. In the case where you have a core idle, what is your worker thread blocking on? "semaphore_timedwait_trap" is the Mach trap that blocks a thread waiting for a semaphore wakeup, so you are blocking "legitimately".

You might consider instrumenting your blocking points to see whether you spend an unexpectedly long time blocked at any given point. It sounds as though you expect to be running 8 threads non-stop, so it should be fairly straightforward to tell the difference between a legitimate and an illegitimate block.

You should also check to make sure you aren't triggering any page-in/page-out activity, as "tens of ms" is consistent with disk/network I/O timing.

If you are handling multiple work units in a given run, it sounds like you might have a race between the work provider and the consumers that leaves a unit stuck in the queue. In the case where you have a thread asleep, have you checked the state of your work units?

One final note: in a loosely-affine system like Darwin, I would tend to encourage a worker pool of N+(N/4) or so, where N is the number of cores, to ensure maximal saturation in the face of sporadic but protracted contention. This does have a tendency to give you a spiky tail on your saturation plot (if contention is low, you have a chunk of work remaining at the end that runs at about 1/4 saturation), so consider it no more than a starting point. Note that increasing the thread count into the 2-4N range can have useful effects on saturation depending on the nature of your workload, so do try moving both down and up. Obviously, if you expect no contention, N is a good place to be.

= Mike
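
As a rough illustration of the two suggestions above (instrumenting a blocking point and starting from a pool of N+(N/4) workers), here is a minimal sketch in C. It is not the original poster's code and does not use the MP semaphore calls: it swaps in a counting gate built from a pthread mutex and condition variable so that the time spent blocked can be recorded. The names work_gate, gate_init, gate_signal, gate_wait, now_ns and pool_size are all hypothetical, and clock_gettime is assumed to be available on the target system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: starting pool size of N + N/4 workers for N cores. */
static unsigned pool_size(unsigned ncores)
{
    assert(ncores > 0);
    return ncores + ncores / 4;
}

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical counting gate standing in for the work semaphore.
 * It also keeps simple statistics about time spent blocked. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    unsigned        pending;        /* work units signalled but not yet taken */
    uint64_t        waits;          /* number of completed waits */
    uint64_t        total_wait_ns;  /* sum of all wait durations */
    uint64_t        max_wait_ns;    /* longest single wait */
} work_gate;

static void gate_init(work_gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->nonempty, NULL);
    g->pending = 0;
    g->waits = 0;
    g->total_wait_ns = 0;
    g->max_wait_ns = 0;
}

/* Producer side: announce one more work unit and wake one waiter. */
static void gate_signal(work_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->pending++;
    pthread_cond_signal(&g->nonempty);
    pthread_mutex_unlock(&g->lock);
}

/* Worker side: block until a unit is available, then take it.
 * Returns the nanoseconds spent here, including lock acquisition,
 * and folds that figure into the gate's statistics. */
static uint64_t gate_wait(work_gate *g)
{
    uint64_t start = now_ns();

    pthread_mutex_lock(&g->lock);
    while (g->pending == 0)
        pthread_cond_wait(&g->nonempty, &g->lock);
    g->pending--;

    uint64_t waited = now_ns() - start;
    g->waits++;
    g->total_wait_ns += waited;
    if (waited > g->max_wait_ns)
        g->max_wait_ns = waited;
    pthread_mutex_unlock(&g->lock);

    return waited;
}
```

A worker loop would call gate_wait before pulling each unit and could log any return value above some threshold (say, a few milliseconds) together with the state of the work queue at that moment. Comparing max_wait_ns against total_wait_ns / waits at the end of a run is one way to see whether the idle core corresponds to a few very long waits or to many short ones. pool_size(8) gives 10 workers, which is only the suggested starting point before trying smaller and larger counts.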
participants (1)
-
Michael Smith