Re: Process Signal Bug On Intel Dual Core Machines?
- Subject: Re: Process Signal Bug On Intel Dual Core Machines?
- From: Markus Hanauska <email@hidden>
- Date: Wed, 30 Aug 2006 10:48:13 +0200
On 06-08-30, at 03:45, Terry Lambert wrote:
> Whether or not you are permitted to send a signal to a given
> process depends on your credentials vs. the target process's
> credentials, and whether or not the process has already zombied at
> the time you attempt to send the signal.
No, the process has definitely not zombied, since it can still
process network packets.
Regarding permissions: the daemon runs as root, that is true, but the
control app belongs to root:wheel and has the SUID bit set, which
means it also always runs as root (you can't run it as another user).
Here is something else I have found out: the HUP signal (SIGHUP) is
the most reliable one. It is *always* delivered to the process. Since
re-reading the config is not supported at the moment anyway, I
changed my daemon to use this signal for shutdown, and now it works
*every time*. SIGHUP is always delivered, the signal handler is
always called, and the app shuts down as expected.
I found out that SIGQUIT (which the app does not handle in its signal
handler) also works occasionally, but very seldom, and it kills the
process as expected. SIGTERM still never works. SIGINT works, but not
reliably (fine from the command line with kill and killall, but only
on every second call from within my controller application).
Considering that the app works as expected using SIGHUP on both Intel
and PPC (and SIGHUP is handled exactly the way SIGTERM used to be
handled), while SIGTERM only works on PPC, I rather wonder how this
could be the fault of my application code.
> Effectively, you should *always* look at your result code; if you
> get a -1, and errno is ESRCH, it means the process does not exist
> (as far as the system can tell); if you get an EPERM, then this
> means that the process exists, but you have insufficient privilege
> to send the signal to it. You can also get an EINVAL if the signal
> is out of range, but that does not look like the case.
The result code of the ineffective kill/killall calls from the
command line (as root!) is always 0, which means the signal was
delivered successfully, yet nothing happens. Same in my app; I never
get any error back.
See my code example:

    if (result == 0 && (result = kill(pid, SIGINT)) != 0) {
        perror("kill");
    }

result is always 0, yet no signal arrives at the destination process.
I verified that pid is correct.
> You can also effectively mask signals on a per thread basis, as
> well as just a process basis, except for unmaskable signals - so
> you might want to try SIGKILL from your process, rather than
> SIGINT, to see if it's not delivered, or if it was just masked
> (perhaps by a library routine you didn't know was masking it).
But doesn't GDB ignore the masking? Even if I mask the signal,
wouldn't GDB show the arriving signal anyway and just the process
ignore it? And why would different signals be masked on Intel than on
PPC machines?
> You should also be aware that a signal sent to a process is
> delivered via a Mach AST, and that gdb traps these ASTs, and can
> block, redirect, or otherwise cause them to not be acted upon.
Yes, I know that. The problem is that GDB never sees any signals
arriving. If they'd arrive in GDB and just not at the process, I
wouldn't complain, but they are never shown in GDB to begin with. And
even without GDB running they don't arrive at the process.
> Also, since ASTs only fire on the way out of a system call (or
> cancellation point, if the call can be precancelled), the signal
> will not be delivered until one or more of your threads run up
> through the trap handler into the trampoline code in user space
> (and then call back into the kernel to return to user space not on
> the signal context). So it's possible for you to, for example,
> start a read, send a signal to the process, and not see the read
> interrupted (i.e. it could complete prior to the signal handler
> firing). This is the same thing DEC Tru64 UNIX, and a number of
> other OSes with signals built on AST-style implementations, will
> do: you get the signal, but the operation completes before the
> signal delivery actually happens.
This is all a very nice theory, but why then is SIGHUP always
delivered (also from my controller app) and always interrupts the
running select, while SIGTERM never does, SIGQUIT only on rare
occasions, and SIGINT only on every second call from my controller
app? This all sounds extremely nondeterministic, and nondeterministic
behavior of the same piece of code sounds a lot like a system bug to
me.
> One common problem with debugging signals is that you don't want
> the signals sent to a subshell used to invoke your program; the
> default in gdb on Mac OS X is "start-with-shell" set to "on". If
> you plan on debugging signals, you will want to turn this off, e.g.:
>
> (gdb) set start-with-shell off
>
> before attempting to debug anything to do with signals.
I'll try that, though I doubt it will explain the SIGHUP/SIGTERM
riddle.
I'll keep you updated. I have also filed a bug report for this issue.
--
Best Regards,
Markus Hanauska
_______________________________________________
Darwin-kernel mailing list (email@hidden)