Michael Smith writes:
It is naive to assume, particularly in the context of a heavily
threaded system like MacOS, that thread:CPU affinity will yield
significant improvements here.
It is perhaps more naive to presume that it won't.
My 10GbE driver work makes me pretty familiar with the behavior
under load of the network stacks in MacOSX, Linux, Solaris,
and to a lesser extent Windows, all of which allow hinting.
If you take another OS (say Linux), and install it on a Mac Pro, the
only way to make it perform badly (like MacOSX) is to supply totally
*incorrect* hints (bind the irq handler to one socket, and the
benchmark to the other). And even then it does not perform as badly
as MacOSX does on average :( If I then boot into OSX and apply the
only sort of hinting I'm aware of -- disabling the second CPU package
-- I get "reasonable" performance (2x the CPU utilization of Linux,
and a hair less bandwidth).
If I had a way to bind the ithread, the dlil thread, and
the application to a CPU or set of CPUs, I could coerce
MacOSX into getting decent performance without the drastic step
of disabling a CPU package.
Since you guys are fond of Solaris these days (Dtrace, ZFS, etc), I
encourage you to take a look at the tools and APIs that Solaris
provides (psrset(1M), processor_bind(2), pset_bind(2)).
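
For concreteness, the kind of binding I mean looks roughly like the
sketch below. Treat it as an illustration only: bind_self_to_cpu() is
a made-up helper name, not an existing API on any of these systems.
On Linux it uses sched_setaffinity(2), on Solaris it uses
processor_bind(2) from the list above, and on anything else it simply
fails with ENOSYS. The Solaris branch is written from the man page
and should be considered untested.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc: exposes sched_setaffinity() and CPU_* macros */
#endif
#include <errno.h>

#if defined(__linux__)
#include <sched.h>
#elif defined(__sun)
#include <sys/types.h>
#include <sys/processor.h>
#include <sys/procset.h>
#endif

/*
 * Hypothetical helper: bind the calling thread to one logical CPU.
 * Returns 0 on success, -1 with errno set on failure.
 */
int bind_self_to_cpu(int cpu)
{
#if defined(__linux__)
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
#elif defined(__sun)
    /* P_LWPID + P_MYID binds only the calling LWP */
    return processor_bind(P_LWPID, P_MYID, (processorid_t)cpu, NULL);
#else
    (void)cpu;
    errno = ENOSYS;   /* no binding interface assumed on this platform */
    return -1;
#endif
}
```

A benchmark would call this once at startup with the same CPU (or a
CPU on the same package) that services the NIC interrupt, which is
exactly the knob I cannot find for the ithread and dlil thread.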