Re: Process Signal Bug On Intel Dual Core Machines?
On Aug 28, 2006, at 8:46 AM, Markus Hanauska wrote:

Actually, after doing some more testing, the issue seems worse than I thought. While kill or killall from the shell always work with the SIGINT signal, doing the same from within my controller app only works *RANDOMLY*. Very simple code:

if (result == 0) {
    if ((result = kill(pid, SIGINT)) != 0) {

Sometimes this kills the process and sometimes not. If it does not work the first time (the signal is never delivered), it works when I call it a second time. Same code: once the signal is delivered, once not.

On 06-08-28, at 16:46, Markus Hanauska wrote:

Hello!

I have a daemon process that has a signal handler which listens for SIGHUP, SIGINT, SIGTERM, SIGUSR1, SIGUSR2 and SIGPIPE. The problem is that this process never reacts to SIGTERM, which it does handle. Funny thing is, it does not react to SIGQUIT either, although it does not block it (the default action should take place). It does, however, react to SIGINT.

Don't get me wrong: the problem is that the signals are really *not* delivered to the process! I can prove it with GDB. E.g. I start the process; it gets ID 927. Now here's my GDB session:

~ root# gdb
GNU gdb 6.3.50-20050815 (Apple version gdb-609) (Fri Jul 28 05:21:24 UTC 2006)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-apple-darwin".
(gdb) attach 927
Attaching to process 927.
Reading symbols for shared libraries . done
Reading symbols for shared libraries .............. done
0x900f9294 in __select ()
(gdb)

I'll now take a look at the signal table in GDB:

(gdb) info signals
Signal        Stop      Print     Pass to program   Description

SIGHUP        Yes       Yes       Yes               Hangup
SIGINT        Yes       Yes       No                Interrupt
SIGQUIT       Yes       Yes       Yes               Quit
SIGILL        Yes       Yes       Yes               Illegal instruction
SIGTRAP       Yes       Yes       No                Trace/breakpoint trap
SIGABRT       Yes       Yes       Yes               Aborted
SIGEMT        Yes       Yes       Yes               Emulation trap
SIGFPE        Yes       Yes       Yes               Arithmetic exception
SIGKILL       Yes       Yes       Yes               Killed
SIGBUS        Yes       Yes       Yes               Bus error
SIGSEGV       Yes       Yes       Yes               Segmentation fault
SIGSYS        Yes       Yes       Yes               Bad system call
SIGPIPE       Yes       Yes       Yes               Broken pipe
SIGALRM       No        No        Yes               Alarm clock
SIGTERM       Yes       Yes       Yes               Terminated
:

As you can see, it should stop at every signal and print every received signal except SIGALRM. Further, it should pass all signals to the app except SIGINT and SIGTRAP. Now I continue running the app:

(gdb) cont
Continuing.

In another shell I do the following:

~ root# kill -TERM 927

What happens in GDB? Nothing. Is the signal handler called? No. I verified that by setting a breakpoint at the signal handler in a previous test. Even if I had no signal handler for TERM, even if I blocked or ignored that signal, GDB should still *stop* and *print* it. It does not. Why not? Because _no signal_ is delivered. Why not? How can this be?

Ok, let's try QUIT. QUIT is not even handled by my signal handler; it should trigger the default action. Here we go:

~ root# kill -QUIT 927

Again, nothing! Okay, but now, let's try SIGINT:

Program received signal SIGINT, Interrupt.
0x900f9294 in __select ()
(gdb)

Huh? How can this be? How can it be that SIGINT is delivered, but SIGTERM and SIGQUIT are not?!? Wouldn't GDB show the signal regardless of whether my app ignores or blocks it? (It does not. Nowhere in the code do I see anything like this taking place.)

Now you may ask: how is that a possible kernel bug? Very simple: I can't reproduce it on any PPC machine. I also can't reproduce it on my Mac mini Core Solo, but I can reproduce it 100% of the time on an Intel iMac with a dual-core CPU. This bug is driving me really nuts and makes me doubt my sanity. And why only on dual-core Intel? (Not in Rosetta; the daemon is universal.) Can it be that this is some kernel-layer bug in signal delivery?

The workaround for me is to use SIGINT on all machines, which is working fine as far as I can see. But this daemon has existed since 10.2 and has always worked, from 10.2 to 10.4, on any machine, always using SIGTERM. Now, all of a sudden, it fails on the iMac and on many MacBooks and MacBook Pros with Intel dual-core CPUs. It is not always 100% reproducible; for some users it sometimes works and sometimes not, which makes me believe even more that this is a really, really nasty kernel bug.

I can provide you with every debug output from GDB, Shark or any other tool you like. I just can't post any source here. Any help is appreciated.

--
Best Regards,
Markus Hanauska

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (Darwin-kernel@lists.apple.com)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/darwin-kernel/tlambert%40apple.com

This email sent to tlambert@apple.com

Whether or not you are permitted to send a signal to a given process depends on your credentials vs. the target process's credentials, and on whether or not the process has already zombied at the time you attempt to send the signal. So, for example, if your daemon program is started by launchd as root and you are a non-root user (no, it does not matter whether you are an administrator or not), you will not be able to send a signal unless your credentials match, or you are the process group leader and in the same process group, etc.

Effectively, you should *always* look at your result code. If you get -1 and errno is ESRCH, it means the process does not exist (as far as the system can tell). If errno is EPERM, the process exists, but you have insufficient privilege to send the signal to it. You can also get EINVAL if the signal is out of range, but that does not look like the case here.

You can also effectively mask signals on a per-thread basis, not just on a per-process basis, except for unmaskable signals. So you might want to try SIGKILL from your process, rather than SIGINT. That tells you whether the signal is not being delivered at all or was just masked (perhaps by a library routine you didn't know was masking it). If SIGKILL always kills it, then it's pretty likely that someone is masking the other (maskable) signals.

You should also be aware that a signal sent to a process is delivered via a Mach AST. gdb traps these ASTs and can block, redirect, or otherwise cause them not to be acted upon.

Also, ASTs only fire on the way out of a system call (or at a cancellation point, if the call can be precancelled). The signal will therefore not be delivered until one or more of your threads run up through the trap handler into the trampoline code in user space (and then call back into the kernel to return to user space, no longer on the signal context). So it's possible for you to, for example, start a read, send a signal to the process, and not see the read interrupted (i.e. it could complete before the signal handler fires). This is the same thing DEC Tru64 UNIX and a number of other OSes with AST-style signal implementations will do: you get the signal, but the operation completes before the signal delivery actually happens.

One common problem with debugging signals is that you don't want the signals sent to a subshell used to invoke your program. The default in gdb on Mac OS X is "start-with-shell" set to "on". If you plan on debugging signals, you will want to turn this off before attempting to debug anything to do with signals, e.g.:

(gdb) set start-with-shell off

Hope that helps.

-- Terry
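Terry's advice to always look at the result code can be sketched in C as below, assuming a POSIX system. `send_signal_checked` is a hypothetical helper, not part of any Apple API. It only maps the errno values Terry lists (ESRCH, EPERM, EINVAL) to messages. Signal number 0 is the standard way to test whether a process exists and may be signalled, without actually sending anything.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), for callers probing their own process */

/* Hypothetical helper: send `sig` to `pid` and report why it failed.
 * Returns 0 on success, otherwise the errno value kill() set.
 * Success only means the kernel accepted the signal; it does not mean
 * the target's handler has already run. */
int send_signal_checked(pid_t pid, int sig)
{
    if (kill(pid, sig) == 0)
        return 0;

    int err = errno;  /* save errno before stdio calls can change it */
    switch (err) {
    case ESRCH:   /* no such process, as far as the kernel can tell */
        fprintf(stderr, "kill(%ld, %d): no such process\n", (long)pid, sig);
        break;
    case EPERM:   /* process exists, but our credentials don't allow it */
        fprintf(stderr, "kill(%ld, %d): insufficient privilege\n", (long)pid, sig);
        break;
    case EINVAL:  /* signal number out of range */
        fprintf(stderr, "kill(%ld, %d): invalid signal\n", (long)pid, sig);
        break;
    default:
        fprintf(stderr, "kill(%ld, %d): %s\n", (long)pid, sig, strerror(err));
        break;
    }
    return err;
}
```

The controller app could log the returned errno instead of only testing for a nonzero result. That would separate a process that is gone (ESRCH) from one you are not allowed to signal (EPERM).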
Participants (1): Terry Lambert