Re: P_WEXIT
- Subject: Re: P_WEXIT
- From: Terry Lambert <email@hidden>
- Date: Tue, 15 Aug 2006 02:56:47 -0700
On Aug 14, 2006, at 2:57 PM, Joseph Oreste Bruni wrote:
> Hello all,
> What is the purpose of the P_WEXIT flag? The <sys/proc.h> header simply
> says "Working on exit."
> I have a process that keeps getting stuck in the "E" state (as shown
> by the "ps" command) and I can't figure out what is happening here.
> My process is being started by launchd and is supposed to be kept
> running even if it fails. However, when it gets stuck in the "E"
> state, launchd never sees the process terminate and so won't start
> another instance. The process is multi-threaded, if that has any
> relevance.
P_WEXIT is set when a process has explicitly called exit(), or has
had exit1() called on it, e.g. as a result of taking a fatal signal
(either a SIGKILL, or another signal whose default behaviour is to
terminate the process), or as a result of proc_shutdown(), which is
called on a reboot. exit1() can also be called on a process that has
protected itself from being traced, if you attempt to attach a trace
to it after it has made itself immune from tracing.
The process remains in this state until all active threads have
drained out of it, and the thread that called exit() (or just the
last thread, if exit1() was called on behalf of someone else, or the
process was signalled) drains out to the user/kernel boundary. At
that point, if the parent process is not ignoring SIGCHLD, a zombie
structure is allocated and filled in, and it hangs around until the
parent process reaps it by calling one of the wait() functions (e.g.
wait4()). If the parent process is ignoring SIGCHLD, no zombie is
created and the process is reaped immediately.
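To make the zombie and reaping behaviour above concrete, here is a minimal C sketch, assuming a POSIX system that provides the BSD wait4() call (Darwin and glibc-based Linux both do). The helper names reap_child_exit_code() and wait_with_sigchld_ignored() are hypothetical, for illustration only. The first forks a child that exits and then reaps its zombie with wait4(); the second sets SIGCHLD to SIG_IGN first, in which case no zombie is left behind for the parent and wait4() fails with ECHILD.

```c
#define _DEFAULT_SOURCE /* glibc: expose wait4(); harmless elsewhere */
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then reap
 * it with wait4(). Between the child's exit and the wait4() call the
 * child exists only as a zombie. Returns its exit status, or -1. */
int reap_child_exit_code(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                      /* child: terminate at once */

    int status = 0;
    if (wait4(pid, &status, 0, NULL) != pid)  /* reaps the zombie */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: with SIGCHLD set to SIG_IGN, an exiting child is
 * not kept as a zombie for us, so wait4() has nothing to collect and
 * fails with ECHILD. Returns the errno seen (0 if wait4 succeeded). */
int wait_with_sigchld_ignored(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old) != 0)
        return -1;

    int result = -1;
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);
    if (pid > 0) {
        int status;
        errno = 0;
        result = (wait4(pid, &status, 0, NULL) == -1) ? errno : 0;
    }
    sigaction(SIGCHLD, &old, NULL);       /* restore previous disposition */
    return result;
}
```

A caller would expect reap_child_exit_code(7) to return 7 and wait_with_sigchld_ignored() to return ECHILD. Note that the second helper temporarily changes the process-wide SIGCHLD disposition, so it should not run while other threads are forking and waiting.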
When this happens to one of your programs, it's usually because a
thread in the process is stuck in an uninterruptible wait and can't
drain out: maybe it's blocked on a network resource, maybe it's in a
blocking call that can't be interrupted, or maybe it's stuck in a
device driver for a device you've powered down or unplugged, and the
driver didn't notice and automatically drain all pending requests
because it has a bug or isn't well behaved, etc.
For multithreaded programs, it's generally best to have a clean
shutdown routine for each of the threads, and shut the process down in
an orderly fashion, rather than simply calling exit() (or taking a
SIGKILL or other fatal signal) and terminating things abnormally. The
normal way to deal with this is a pthread_kill() to each thread, with
a pthread_exit() in the handler for that signal, and a pthread_join()
on each thread in the main thread that would otherwise have just
called exit().
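A minimal C sketch of that orderly shutdown, assuming POSIX threads and C11 atomics. One deliberate change from the recipe above: pthread_exit() is not on the POSIX list of async-signal-safe functions, so instead of calling it from the signal handler, this sketch installs a handler that does nothing (its only job is to make a blocking call return early with EINTR) and lets each worker notice a shutdown flag and return on its own, which is equivalent to calling pthread_exit(). The names run_and_shutdown(), worker(), shutting_down and NWORKERS are hypothetical, and nanosleep() stands in for whatever interruptible blocking call a real worker would make.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

#define NWORKERS 4 /* hypothetical worker count */

static atomic_int shutting_down;
static atomic_int workers_finished;

/* Does nothing on purpose: delivery alone interrupts the blocking call. */
static void wake_handler(int sig) { (void)sig; }

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&shutting_down)) {
        /* Stand-in for real work blocked in an interruptible call;
         * returns early with EINTR when SIGUSR1 arrives. If the signal
         * lands just before the call, we re-check within 1 second. */
        struct timespec ts = { 1, 0 };
        nanosleep(&ts, NULL);
    }
    /* Release this thread's resources here, then return, which is
     * the same as calling pthread_exit(NULL). */
    atomic_fetch_add(&workers_finished, 1);
    return NULL;
}

/* Start NWORKERS threads, then shut them down in order.
 * Returns how many workers exited cleanly, or -1 on setup failure. */
int run_and_shutdown(void)
{
    atomic_store(&shutting_down, 0);
    atomic_store(&workers_finished, 0);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESTART: interrupted calls fail with EINTR */
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    pthread_t tids[NWORKERS];
    int started = 0;
    for (; started < NWORKERS; started++)
        if (pthread_create(&tids[started], NULL, worker, NULL) != 0)
            break;

    atomic_store(&shutting_down, 1);       /* 1. ask workers to stop   */
    for (int i = 0; i < started; i++)
        pthread_kill(tids[i], SIGUSR1);    /* 2. interrupt their waits */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);       /* 3. wait for each to drain */

    return atomic_load(&workers_finished); /* exit() is now safe */
}
```

Calling run_and_shutdown() should return NWORKERS once every worker has been joined, and only then would the main thread call exit(). This only helps with interruptible waits: a thread stuck in an uninterruptible wait in the kernel never returns from its blocking call, so pthread_join() would hang on it too.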
If there is a thread stuck in a driver or other uninterruptible
context, there's not really a lot you can do, other than "don't do
that"/"download a newer driver that doesn't have the bug"/etc.
You may want to ktrace the process and watch it in this situation
(assuming it hasn't disabled tracing on itself; if it has, you can
always recompile without that line and trace it anyway, to find out
what's going on). This will give you some idea of where it's stuck.
You can also use various "ps" options to get more info. For example,
"-M" will dump out information for individual threads; if the options
you choose include the "STAT" column and you see a "U" in that column
for a thread, that thread is in an uninterruptible wait.
-- Terry
_______________________________________________
Darwin-kernel mailing list (email@hidden)