P_WEXIT

14 Dec 2006


In a previous email, written 15 August 2006 by Terry Lambert:

Frustrated,
Joseph
PS: Best Wishes for the Holidays everyone!
The P_WEXIT flag is set when a process has explicitly called exit(), or
has had exit1() called on it, e.g. as a result of taking a fatal
signal (either a SIGKILL, or another signal whose default behaviour
is to terminate the process), or as a result of proc_shutdown(),
which is called on a reboot. It can also be called on a process
that has protected itself from being traced, if you attempt to
attach a trace to it after it has made itself immune from tracing.
The process remains in this state until all active threads have
drained out of the process, and the thread that called exit()
(or just the last thread, if exit1() was called on behalf of
someone else, or the process was signalled) drains out to the user/
kernel boundary. At that point, if the parent process is not
ignoring SIGCHLD, a zombie structure is allocated and its
contents filled in, and it hangs around until the parent process
reaps it by calling one of the wait() functions (e.g. wait4()). If
the parent process is ignoring SIGCHLD, then the process does not
create a zombie, and is immediately reaped.

I've been reading and re-reading this to try to understand why my
process is getting stuck. As I might have pointed out in my previous
email, by the time my main thread calls exit() (implicitly by
returning from main()), there are NO other threads besides the main
thread since the main thread has joined with all other threads.

Is it possible that the kernel thinks there are still some threads
remaining?

Could it be related to the fact that my threads are exiting via
pthread_testcancel() rather than by pthread_exit()?

I know that thread cancelation on Darwin is a bit broken by default
without specifying POSIX code via the DARWIN_ALIAS macro. Could I
have stumbled onto some part of the kernel that just hasn't received
a lot of testing?

On my test system, I can start and kill my process all day long and
never reproduce this. In production, where the system runs for weeks,
the condition occurs...
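
For anyone trying to reproduce this, the shutdown pattern described
above looks roughly like the sketch below. This is not my production
code: worker(), run_and_cancel_workers() and NUM_WORKERS are made-up
names, and the worker body is just a loop around pthread_testcancel().

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#define NUM_WORKERS 4   /* arbitrary count for the sketch */

/* Worker loop: the thread only ever ends by acting on a cancel
 * request, here at the explicit pthread_testcancel() call. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_testcancel();   /* explicit cancellation point */
        usleep(1000);           /* short sleep so the loop does not spin */
    }
    return NULL;                /* not reached */
}

/*
 * Start the workers, request cancellation of each, and join them all.
 * Returns the number of threads whose join status was
 * PTHREAD_CANCELED, or -1 if any pthread call failed.
 */
int run_and_cancel_workers(void)
{
    pthread_t tids[NUM_WORKERS];
    int canceled = 0;
    int i;

    for (i = 0; i < NUM_WORKERS; i++)
        if (pthread_create(&tids[i], NULL, worker, NULL) != 0)
            return -1;

    for (i = 0; i < NUM_WORKERS; i++)
        if (pthread_cancel(tids[i]) != 0)
            return -1;

    for (i = 0; i < NUM_WORKERS; i++) {
        void *res;

        if (pthread_join(tids[i], &res) != 0)
            return -1;
        if (res == PTHREAD_CANCELED)
            canceled++;
    }
    return canceled;
}
```

After run_and_cancel_workers() returns NUM_WORKERS, main() simply
returns, which is the implicit exit() mentioned above. Every
pthread_join() has succeeded by then, so from user space no other
thread should remain.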

Joseph Oreste Bruni
