Mailing Lists: Apple Mailing Lists


Re: Discover multiple processors programmatically?



On Dec 20, 2005, at 4:25 PM, Ivan S. Kourtev wrote:
Thanks, Terry. That's all very good advice and I'll take it into consideration.

So now my code knows how to find out that the system it is running on has N > 1 processors. Say, for example, it is running on a system with N = 2 processors so it wants to use a two-thread version of some heavily computational routine. This is only going to be efficient if the two threads actually use different physical processes.

I think you meant processors?

The answer to that is actually more tricky than you'd expect. It really comes down to your cache locality, and how independently or interdependently your threads operate. For some applications, you would want negaffinity - what you are describing here as wanting to run on physically separate CPUs - while for others this would result in significant cache busting and IPIs for extensive TLB shootdown, if two or more threads were modifying data in the same locality. For those, you'd want strong affinity, maybe even deciding to run them on the same ALU in an SMT system in an idealized control-all-aspects-of-scheduling implementation.


So here a few interesting questions arise:

1. Is there any way to specify that threads run on physically different processors/cores? I know the OS is supposed to be smart but maybe not that smart because the thread-creating code isn't aware of the context of the program being computed.

The OS has several ways to do this, but they are only used internally, not exported for use by users. The main reasoning behind this is (1) power management makes it a requirement that we be able to bring up and shut down resources as necessary to meet minimum requirements for system load and (2) there are a lot of RT and RT-like tasks that MacOS X must support, and in order to do that, the system needs to be in charge of resource allocation (consider binding a thread to a processor that the system decides needs to be shut down for whatever reason).


In addition, the central routine is not generally well protected against shooting your foot off, hence it is not exported.

It's possible for you to write a KEXT which would allow you to call the routine yourself (it's pretty obvious what it is, if you look through the scheduler code on OpenDarwin.org), but it's highly inadvisable. This is truly one place where the system probably knows better than you. Minimally, you would need to recompile your KEXT every point release - effectively, every software update - to keep it from going stale, since to get at the symbol set, you'd need to link directly instead of using a subset, which would tie you pretty tightly to a given kernel.


2. Particularly if the answer to the above question is YES, is it guaranteed that a thread will spend its entire life within the physical processor it first started on? I am not terribly familiar with the low-level stuff but, when a thread has exhausted its time slice, could it be scheduled on a different physical processor the next time around?

If you force the affinity, the affinity sticks until you force it off. Period. But as I said, it's definitely not recommended, and will likely shoot your foot off.



3. Depending on what the answers to 1. and 2. are, it seems to me it would be useful to have a mechanism for "locking" a thread to a processor? Sort of to make the maximum use of the available hardware? Particularly in the case when a process wants to start M threads where M <= N available processors.


Not really.

If you are using system frameworks, you really can't guarantee your locality, and if you are writing your own stuff, even then, you really can't guarantee your locality well enough for a given piece of hardware, without tying yourself to that particular piece of hardware forever.

Consider the case where you tune your code for 64M vs. 128M vs. whatever of cache, or for 32 or 64 or whatever TLB slots; that code really isn't going to run very well if you end up running it on any other hardware and go over some local hardware limit that wasn't on the machine you brought it up on.

Most people who use threads instead of finite state automatons as their programming model tend not to get the separation between threads strong enough that they never contend between themselves for resources, never need to use thread IPC, never need to use a mutex, etc.


I don't think my specific situations are very complicated but am trying to use the opportunity to teach myself a programming style which is certainly very new to me. My situations involve mostly high-complexity matrix and graph computations that can be partitioned and parallelized easily. Nothing extra fancy (like data races, and the necessity for communication, locking, etc.). It's pretty safe to say that the N parallel threads can proceed full speed ahead -- depending on what hardware is available -- I basically need a lot of computational cycles available.

This is really the exception, rather than the rule.

Obviously, you can build a KEXT and experiment with calling it to make the calls by proxy to establish particular CPU affinity for your threads, but I don't think you are going to get a significant performance win in doing so. It may be that you don't have an iTunes or a CD burner or a DVD player or other RT task running on this system at the same time as your calculation which could be damaged by binding threads to particular CPUs (and you've turned down/off power management so it doesn't interfere with your calculations); obviously, experimenting is fine - let us know how things turn out. 8^).
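
For the independent, easily partitioned workload Ivan describes, a minimal sketch of the usual approach is to give each thread its own slice of the data and its own result slot, and to let the scheduler place the threads wherever it likes. The names `parallel_sum`, `sum_slice`, `struct slice` and the `MAX_WORKERS` cap are hypothetical, invented for this illustration; summing an array stands in for the real matrix or graph kernel.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 64   /* arbitrary cap chosen for this sketch */

struct slice {
    const double *data;  /* first element of this worker's slice */
    size_t count;        /* number of elements in the slice */
    double sum;          /* result, written only by the owning worker */
};

/* Thread entry point: sums one slice without touching shared state. */
static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    double acc = 0.0;    /* accumulate locally, publish once at the end */
    for (size_t i = 0; i < s->count; i++)
        acc += s->data[i];
    s->sum = acc;
    return NULL;
}

/* Sums n doubles using `workers` threads.
 * Returns 0 on success and stores the total in *out; returns -1 on failure. */
int parallel_sum(const double *data, size_t n, int workers, double *out)
{
    pthread_t tid[MAX_WORKERS];
    struct slice part[MAX_WORKERS];

    if (workers < 1 || workers > MAX_WORKERS)
        return -1;

    size_t chunk = n / (size_t)workers;
    size_t extra = n % (size_t)workers;  /* first `extra` slices get one more */
    size_t start = 0;
    int started = 0;
    int rc = 0;

    for (int i = 0; i < workers; i++) {
        size_t len = chunk + ((size_t)i < extra ? 1 : 0);
        part[i].data = data + start;
        part[i].count = len;
        part[i].sum = 0.0;
        start += len;
        if (pthread_create(&tid[i], NULL, sum_slice, &part[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Join every thread that was started, even after a failure. */
    double total = 0.0;
    for (int i = 0; i < started; i++) {
        if (pthread_join(tid[i], NULL) != 0)
            rc = -1;
        total += part[i].sum;
    }
    if (rc == 0)
        *out = total;
    return rc;
}
```

Because no thread reads or writes another thread's slice, no mutex is needed; the only synchronization is the join. Depending on the toolchain, building this may require the `-pthread` flag.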


Finally, I am particularly interested in knowing the answers for Mac OS X, but it would be lovely to know what the POSIX ways (if any) are of doing these things.


IMO, POSIX has consistently and specifically gone out of its way to avoid addressing this issue. The closest it gets to anything that could be abused to get this behaviour is that it leaves the scheduling policy SCHED_OTHER implementation-defined, and permits the implementation to define other policies that can be passed as the second parameter to sched_setscheduler().

In effect, there's no standards-conformant way of doing this that isn't implementation defined. I think this is because the jury is still out on whether a program can be built to run on a general purpose OS, and still understand the hardware and system load characteristics enough to make effective informed decisions that are enough of a performance win to want the APIs everywhere. If someone was demonstrating a 50% performance improvement, you could bet some set of APIs would be implemented everywhere in short order to take advantage of the win.

I expect that at some point POSIX will revisit the issue, and maybe introduce *optional* APIs that let you deal with these things; I expect that they would likely go into sched_setscheduler() and sched_setparam(), etc., rather than adding new entry points into the system.
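
To make the point about the POSIX scheduling interface concrete, here is a small sketch that asks which policy the calling process runs under. It assumes the optional POSIX Process Scheduling feature, tested through the standard `_POSIX_PRIORITY_SCHEDULING` macro; on systems that do not advertise it the function simply reports that. The function name `current_policy_name` is hypothetical. Note that everything here is about policy: nothing in this part of the interface names a processor.

```c
#include <sched.h>
#include <unistd.h>

/* Returns a printable name for the calling process's scheduling policy,
 * or "unsupported" where the POSIX Process Scheduling option is absent. */
const char *current_policy_name(void)
{
#if defined(_POSIX_PRIORITY_SCHEDULING) && _POSIX_PRIORITY_SCHEDULING > 0
    int policy = sched_getscheduler(0);   /* 0 means "the calling process" */
    if (policy == -1)
        return "error";
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    default:          return "implementation-defined";
    }
#else
    return "unsupported";
#endif
}
```

Any value that lands in the `default` branch is one of the extra, implementation-defined policies mentioned above, which is exactly where a vendor could hang non-portable behaviour.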


-- Terry


Thanks for any pointers.

Cheers,

--
ivan


On Dec 19, 2005, at 10:10 PM, Terry Lambert wrote:

On Dec 18, 2005, at 9:58 PM, Eric Albert wrote:
On Dec 18, 2005, at 5:45 PM, Ivan S. Kourtev wrote:

First, I tried sysconf() but it doesn't seem capable of doing what I need under Mac OS? The two variables _SC_NPROCESSORS_CONF and _SC_NPROCESSORS_ONLN are undefined in unistd.h -- am I missing something?

They don't seem to be defined on Mac OS X. I'd suggest filing a bug report with Apple (<http://bugreport.apple.com>) if you'd like to see them added.

They are non-standard extensions to the sysconf namespace. They are unlikely to be included even if a bug report is filed, since that particular namespace belongs to the standards committee; if they came up with the same name but meant something else by it, we wouldn't be able to implement it correctly because of binary backward compatibility issues, so it's better if we don't add it.


The reason it's in the man page is that our man page is cribbed from FreeBSD, and FreeBSD implements them. Our manual page there is fairly out of date, but man page fixes are unlikely to make it into a software update, for various reasons. The authoritative reference is the contents of unistd.h.

The only ones you can actually use portably between platforms are the ones defined by POSIX (assuming your other platforms are POSIX compliant).

See also <http://www.opengroup.org/onlinepubs/009695399/functions/sysconf.html>.


I also looked into sysctl as per Daniel's and Eric's suggestions -- I noticed even the sysconf manpage suggests that the sysctl interface is much richer. On Mac OS X, I got some code working right away (attached at end) but I haven't figured out how to get it to go under redhat (everything I do must work under both Mac OS X and redhat at least). redhat has a sys/sysctl.h but it only contains the declaration of sysctl() and none of the keywords. I realize this may be a little off-topic, but any clues?

This sounds like a great job for a configure script. This is hardly the only difference between Mac OS X and Linux. :) Another alternative is to do something like
#ifdef HW_NCPU
...do the sysctl thing...
#elif defined(_SC_NPROCESSORS_CONF)
...do the sysconf thing...
#else
#error Uh oh.
#endif

sysctl is the way it should be done. As otherwise noted in this thread already, these particular sysctl entries are generally portable between 4.4BSD based systems.
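
Filled in, the #ifdef sketch above might look roughly like the following. This is only an illustration: the function name `cpu_count` is made up here, and the platform test that guards `<sys/sysctl.h>` is an assumption (the header is not available on every Linux system, so it is included only where it is known to exist).

```c
#include <stddef.h>
#include <unistd.h>

/* Assumption: only pull in <sys/sysctl.h> on systems known to ship it,
 * since including it unconditionally can break the build elsewhere. */
#if defined(__APPLE__) || defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Returns the number of configured processors, or -1 if the query fails. */
int cpu_count(void)
{
#if defined(HW_NCPU)
    /* 4.4BSD-style lookup by numeric OID. */
    int mib[2] = { CTL_HW, HW_NCPU };
    int ncpu = 0;
    size_t len = sizeof(ncpu);
    if (sysctl(mib, 2, &ncpu, &len, NULL, 0) == 0 && ncpu > 0)
        return ncpu;
    return -1;
#elif defined(_SC_NPROCESSORS_CONF)
    /* Non-standard sysconf extension, available on Linux among others. */
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return n > 0 ? (int)n : -1;
#else
#error No known way to count processors on this platform.
#endif
}
```

Calling `cpu_count()` from `main` and printing the result is enough to try it on each platform. A configure script could replace the hard-coded platform test with a proper header check.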



By the way, in the code below, what are the HW_ keywords (if any) that correspond to the commented out entries in the data[] array? I really only need HW_NCPU and HW_AVAILCPU for now but just out of curiosity?

Not all sysctl entries have numeric items to go along with their names. Sometimes you just have to use sysctlbyname.

And in fact you should use names everywhere you can, rather than OIDs, for forward code compatibility. We are likely to change things in the future, particularly in this area of sysctl, and sysctlbyname will be less fragile. I expect the current values won't change (i.e. suddenly stop working for already compiled code), but the sysctlbyname() is the preferred interface going forward.
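
As a rough sketch of the by-name interface, the helper below reads an integer-valued entry such as "hw.ncpu" (the name that corresponds to HW_NCPU). The helper name `sysctl_int_by_name` and the `HAVE_SYSCTLBYNAME` macro are hypothetical, and the platform test is an assumption; on systems without sysctlbyname() the helper just reports failure.

```c
#include <stddef.h>

#if defined(__APPLE__) || defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#define HAVE_SYSCTLBYNAME 1   /* assumption: these platforms provide sysctlbyname() */
#endif

/* Looks up an integer-valued sysctl by name, e.g. "hw.ncpu".
 * Returns 0 on success and stores the value in *out; returns -1 otherwise
 * (including on platforms without sysctlbyname). */
int sysctl_int_by_name(const char *name, int *out)
{
#ifdef HAVE_SYSCTLBYNAME
    int value = 0;
    size_t len = sizeof(value);
    if (sysctlbyname(name, &value, &len, NULL, 0) != 0)
        return -1;
    if (len != sizeof(value))   /* entry is not a plain int */
        return -1;
    *out = value;
    return 0;
#else
    (void)name;
    (void)out;
    return -1;
#endif
}
```

The length check guards against entries that are not plain integers, where copying into an `int` would silently give a wrong answer.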


-- Terry



_______________________________________________ Do not post admin requests to the list. They will be ignored. Unix-porting mailing list (email@hidden) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/unix-porting/email@hidden
