Re: Process Signal Bug On Intel Dual Core Machines?
- Subject: Re: Process Signal Bug On Intel Dual Core Machines?
- From: Markus Hanauska <email@hidden>
- Date: Wed, 30 Aug 2006 10:48:13 +0200
On 06-08-30, at 03:45, Terry Lambert wrote:
> Whether or not you are permitted to send a signal to a given
> process depends on your credentials vs. the target process's
> credentials, and whether or not the process has already zombied at
> the time you attempt to send the signal.
No, the process has definitely not zombied, since it can still
process network packets.
Regarding permissions: the daemon runs as root, that is true, but the
control app belongs to root:wheel and has the SUID bit set, which
means it also always runs as root (you can't run it as another user).
Here is something else I have found out: the HUP signal (SIGHUP) is
the most reliable one. It is *always* delivered to the process. Since
re-reading the config is not supported at the moment anyway, I
changed my daemon to use this signal for shutdown, and now it works
*every time*. SIGHUP is always delivered, the signal handler is
always called, and the app shuts down as expected.
I found out that SIGQUIT (which the app does not handle in its signal
handler) also works occasionally, but very seldom, and it kills the
process as expected. SIGTERM still never works. SIGINT works, but not
reliably (fine from the command line with kill and killall, but only
on every second call from within my controller application).
Considering that the app works as expected using SIGHUP on both Intel
and PPC (and SIGHUP is handled exactly the way SIGTERM used to be
handled), while SIGTERM only works on PPC, I rather wonder how this
could be the fault of my application code.
> Effectively, you should *always* look at your result code; if you
> get a -1, and errno is ESRCH, it means the process does not exist
> (as far as the system can tell); if you get an EPERM, then this
> means that the process exists, but you have insufficient privilege
> to send the signal to it. You can also get an EINVAL if the signal
> is out of range, but that does not look like the case.
The result code of the ineffective kill/killall calls from the
command line (as root!) is always 0, which means the signal was
delivered successfully, yet nothing happens. Same in my app; I never
get any error back.
See my code example:

    if (result == 0 && (result = kill(pid, SIGINT)) != 0) {
        perror("kill");
    }

result is always 0, yet no signal arrives at the destination process.
I verified that pid is correct.
> You can also effectively mask signals on a per thread basis, as
> well as just a process basis, except for unmaskable signals - so
> you might want to try SIGKILL from your process, rather than
> SIGINT, to see if it's not delivered, or if it was just masked
> (perhaps by a library routine you didn't know was masking it).
But doesn't GDB ignore the masking? Even if I mask the signal,
wouldn't GDB show the arriving signal anyway and just the process
ignore it? And why would different signals be masked on Intel than on
PPC machines?
> You should also be aware that a signal sent to a process is
> delivered via a Mach AST, and that gdb traps these ASTs, and can
> block, redirect, or otherwise cause them to not be acted upon.
Yes, I know that. The problem is that GDB never sees any signals
arriving. If they'd arrive in GDB and just not at the process, I
wouldn't complain, but they are never shown in GDB to begin with. And
even without GDB running they don't arrive at the process.
> Also, since ASTs only fire on the way out of a system call (or
> cancellation point, if the call can be precancelled), the signal
> will not be delivered until one or more of your threads run up
> through the trap handler into the trampoline code in user space
> (and then call back into the kernel to return to user space not on
> the signal context). So it's possible for you to, for example,
> start a read, send a signal to the process, and not see the read
> interrupted (i.e. it could complete prior to the signal handler
> firing). This is the same thing DEC Tru64 UNIX, and a number of
> other OSes with signals built on AST-style implementations, will
> do: you get the signal, but the operation completes before the
> signal delivery actually happens.
This is all a very nice theory, but why then is SIGHUP always
delivered (also from my controller app) and always interrupts the
running select, while SIGTERM never does, SIGQUIT only on rare
occasions, and SIGINT only on every second call from my controller
app? This all sounds extremely nondeterministic, and nondeterministic
behavior of the same piece of code sounds a lot like a system bug to
me.
> One common problem with debugging signals is that you don't want
> the signals sent to a subshell used to invoke your program; the
> default in gdb on Mac OS X is "start-with-shell" set to "on". If
> you plan on debugging signals, you will want to turn this off, e.g.:
>
> (gdb) set start-with-shell off
>
> before attempting to debug anything to do with signals.
I'll try that, though I doubt it will explain the SIGHUP/SIGTERM
riddle.
I'll keep you updated. I have also filed a bug report for this issue.
--
Best Regards,
Markus Hanauska
_______________________________________________
Darwin-kernel mailing list (email@hidden)