Re: Poor performance of pthreads across cpu dies.
- Subject: Re: Poor performance of pthreads across cpu dies.
- From: Michael Smith <email@hidden>
- Date: Thu, 30 Aug 2007 21:45:49 -0700
Without wanting to rehash the decades' worth of research and the
hundreds of papers and experiments that comprise the body of
experience on this subject, a couple of points are worth making.
On Aug 30, 2007, at 12:04 PM, email@hidden wrote:
> I've seen similar issues benchmarking 10GbE NICs, and I don't even
> need pthreads. The scheduler tends to run the user-mode application
> on one core, the interrupt-handler kernel thread (the IOKit
> "workloop") on another, and the network stack (dlil) kernel thread
> on yet another.
In and of itself, this is not an issue. More of a problem is that,
in many cases, the threads you have noted above don't stay in the
same cache domain. This *is* an issue.
> I think the fundamental problem is that the scheduler doesn't have
> a clue about CPU affinity, and Mac OS X lacks any APIs or
> command-line interfaces that would allow the app or admin to give
> it a clue (as you can on Linux, Solaris, etc.).
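For reference, the Linux interface being alluded to above is
sched_setaffinity(2), with pthread_setaffinity_np(3) as the
per-thread glibc wrapper; taskset(1) is the command-line equivalent.
A minimal sketch, assuming glibc on Linux (none of this exists on
Mac OS X):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Pin the calling thread to a single CPU (Linux-only glibc
       extension; returns 0 on success). */
    static int pin_to_cpu(int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

From the shell, "taskset -c 2 ./mybench" (mybench being a stand-in
name) pins a whole process the same way.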
It is naive to assume, particularly in the context of a heavily
threaded system like Mac OS X, that thread:CPU affinity will yield
significant improvements here.
The critical issue, as I note above, is not CPU affinity but rather
cache affinity, and more specifically data:cache affinity. The
penalty is not cache load time as the threads move around (the
caches in question are typically large enough to hold the working
set of each thread in play), but snoop/flush/reload time as dirty
data moves (slowly) from one cache domain to another when it is
handed off from one thread to the next.
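That handoff cost is easy to demonstrate with a toy ping-pong: two
threads take turns dirtying a single cache line, so every iteration
forces the line to migrate between whatever caches the two threads
happen to be running in. A minimal sketch (the layout and iteration
count are illustrative, not tuned):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define ITERS 1000000

    /* One cache line of shared state, dirtied on every handoff. */
    struct line {
        _Atomic int turn;     /* whose turn it is: 0 or 1 */
        int payload[15];      /* data written on every pass */
    } __attribute__((aligned(64)));

    static struct line shared;

    static void *bounce(void *arg)
    {
        int me = (int)(long)arg;
        for (int i = 0; i < ITERS; i++) {
            /* Spin until the other thread hands us the dirty line. */
            while (atomic_load_explicit(&shared.turn,
                                        memory_order_acquire) != me)
                ;
            shared.payload[0]++;   /* dirty the line again */
            atomic_store_explicit(&shared.turn, 1 - me,
                                  memory_order_release);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t0, t1;
        pthread_create(&t0, NULL, bounce, (void *)0);
        pthread_create(&t1, NULL, bounce, (void *)1);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        printf("%d handoffs\n", shared.payload[0]);
        return 0;
    }

Time this with the two threads pinned (where pinning is available)
to cores that share a cache, and again with them pinned to cores on
different dies: the difference is almost entirely cache-to-cache
transfer of the dirtied line, not scheduling overhead.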
> Good would be a scheduler with some notion of CPU affinity, and
> better would be a scheduler that allowed the user to give it some
> hints.
History suggests that this sort of hinting is a mixed blessing, often
generating more problems than it solves (cf. NT's stack binding vs.
user comprehension, for example).
The real challenge involves solving, for some reduced set of
circumstances, the very difficult but related questions:
o What is the system going to do next?
  (e.g. should I schedule a thread that has just become runnable, or
  wait in the hope that the current thread will block soon?)
o Who will want this data (that I have not yet examined or begun to
  process) next, and where are they now?
  (e.g. should I move the current thread to a different cache domain
  so that the consumer will not have to snoop it over?)
In some restricted, single-activity cases, brute-force hinting
approaches can help with the above. Sadly, hinting works much less
well once you have more than one source of hints, or more than one
topology in play, or a topology that is not understood by the hinter;
an adaptive and automatic solution is much more attractive in that case.
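One hinting style that sidesteps the topology problem is the
affinity "tag": instead of naming CPUs, threads that share data are
given the same tag, and the kernel maps tags onto whatever cache
domains the machine actually has, so the hinter never needs to know
the topology. A sketch, assuming the Mach thread_policy_set() /
THREAD_AFFINITY_POLICY interface that appeared in Mac OS X 10.5
(which postdates this thread, so treat the specifics as
illustrative):

    #include <mach/mach.h>
    #include <mach/thread_policy.h>
    #include <pthread.h>

    /* Place the calling thread in affinity set 'tag'; threads that
       share a tag are scheduled to share a cache domain where the
       hardware allows it. */
    static kern_return_t join_affinity_set(int tag)
    {
        thread_affinity_policy_data_t policy = { tag };
        return thread_policy_set(pthread_mach_thread_np(pthread_self()),
                                 THREAD_AFFINITY_POLICY,
                                 (thread_policy_t)&policy,
                                 THREAD_AFFINITY_POLICY_COUNT);
    }

A producer/consumer pair that hands dirty buffers back and forth
would call join_affinity_set() with the same tag from both threads
and leave the actual placement to the kernel.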
I don't think there's any disagreement that the current situation is
poor; the point is merely that the proposed 'solutions' fall fairly
well short of fixing it, and that something better is really needed.
= Mike