Michael Smith writes:
On Aug 31, 2007, at 10:42 PM, Terry Lambert wrote:
I'd actually like to see an outside researcher demonstrate benchmarks vs. this approach. Mac OS X doesn't run in virtual wire mode for interrupt delivery, so interrupts are always fielded by the same processor.
This is not necessarily the case.
On some PPC platforms, interrupts are always handled by a single CPU.
On the remainder, and on x86, there are several interrupt distribution schemes in play. The x86 algorithm is (as I recall) roughly to prefer CPUs that are awake, and of those prefer the CPU with the lowest APIC ID that is not currently servicing an interrupt.
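For illustration, the x86 preference order described above could be modeled roughly as follows. This is a minimal sketch, not xnu code: `struct cpu_state`, its fields, and `pick_interrupt_cpu` are hypothetical names, and the fallback behaviour when every awake CPU is already servicing an interrupt (or no CPU is awake) is an assumption, since the description above does not cover it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; field names are illustrative only. */
struct cpu_state {
    unsigned apic_id;      /* local APIC ID of this CPU */
    bool     awake;        /* CPU is not in a sleep/idle power state */
    bool     in_interrupt; /* CPU is currently servicing an interrupt */
};

/*
 * Return the index of the preferred interrupt target.
 * Preference, highest first:
 *   rank 2: awake and not servicing an interrupt
 *   rank 1: awake but busy in an interrupt   (assumed fallback)
 *   rank 0: asleep                           (assumed fallback)
 * Ties within a rank go to the lowest APIC ID.
 */
size_t pick_interrupt_cpu(const struct cpu_state *cpus, size_t ncpus)
{
    assert(cpus != NULL && ncpus > 0);

    size_t best = 0;
    int best_rank = -1;

    for (size_t i = 0; i < ncpus; i++) {
        int rank = cpus[i].awake ? (cpus[i].in_interrupt ? 1 : 2) : 0;

        if (rank > best_rank ||
            (rank == best_rank && cpus[i].apic_id < cpus[best].apic_id)) {
            best = i;
            best_rank = rank;
        }
    }
    return best;
}
```

The point of the sketch is only that the target can change from one interrupt to the next, depending on which CPUs happen to be awake and idle at that moment.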
As Drew points out, not a lot happens in interrupt context - the real work is left to the scheduler invoking the workloop thread - but the scheduler isn't terribly well off either. In the specific case of network data input, the scheduler would need to know which userland thread is currently blocked on (or will shortly read from) the socket to which the data that the network adapter has just received will be delivered. Never mind that no code has yet looked at this data, nor that there may be data for several sockets/threads to be delivered.
The *NIC* has looked at it, and can easily hash different connections to several MSI-X interrupt handlers, which are then each bound to different CPUs (or groups of CPUs).
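A rough sketch of the idea, assuming a TCP/IPv4 4-tuple as the flow key: hash the tuple, use the hash to pick one of a fixed number of receive vectors, and bind each vector to a CPU. Everything here is hypothetical (`struct flow_key`, `NUM_VECTORS`, the `vector_to_cpu` table), and the hash is FNV-1a as a simple stand-in; real hardware typically uses a keyed Toeplitz hash plus an indirection table instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VECTORS 4 /* assumed number of MSI-X receive vectors */

/* Hypothetical flow key: the 4-tuple of a received TCP/IPv4 packet. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

/* Assumed static binding of each receive vector to one CPU. */
static const unsigned vector_to_cpu[NUM_VECTORS] = { 0, 1, 2, 3 };

/* Fold the low nbytes bytes of v into an FNV-1a hash, low byte first. */
static uint32_t fnv1a_mix(uint32_t h, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u; /* 32-bit FNV prime */
    }
    return h;
}

/* Hash a flow. Stand-in for the NIC's hardware hash, not Toeplitz. */
uint32_t flow_hash(const struct flow_key *k)
{
    assert(k != NULL);

    uint32_t h = 2166136261u; /* 32-bit FNV offset basis */
    h = fnv1a_mix(h, k->src_ip, 4);
    h = fnv1a_mix(h, k->dst_ip, 4);
    h = fnv1a_mix(h, k->src_port, 2);
    h = fnv1a_mix(h, k->dst_port, 2);
    return h;
}

/* Every packet of a given flow maps to the same vector... */
unsigned flow_to_vector(const struct flow_key *k)
{
    return flow_hash(k) % NUM_VECTORS;
}

/* ...and therefore to the same CPU. */
unsigned flow_to_cpu(const struct flow_key *k)
{
    return vector_to_cpu[flow_to_vector(k)];
}
```

Because the mapping depends only on the packet headers, all packets for one connection keep landing on the same CPU without anyone having to look at the payload first.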
Once it has worked this out, it needs to know which cache domain currently contains the working set for these thread(s), and assess the relative cost of moving them such that it can pick a domain in which to run the workloop thread as well as any network-stack internal threads or callouts. Sadly, Apple has not yet worked out a good interface between the scheduler and commonly-available crystal balls, and so this remains a difficult thing to do.
Microsoft has, though. It is called "receive side scaling" or RSS. The cleverness is that the NIC and the host hash the connections to the same (sets of) CPUs. Even in the absence of RSS, multiple MSI-X interrupt handlers, etc., you can get a decent approximation of a crystal ball by having the ability to statically bind a single interrupt and a set of hot threads to a set of CPUs. I.e., the administrator is the crystal ball.

Drew