Re: P_WEXIT
- Subject: Re: P_WEXIT
- From: email@hidden
- Date: Tue, 15 Aug 2006 22:17:09 -0700
On Aug 15, 2006, at 1:44 PM, Terry Lambert wrote:
On Aug 15, 2006, at 7:38 AM, AgentM wrote:
On Aug 15, 2006, at 10:20 , Joseph Oreste Bruni wrote:
Thank you very much for the explanation. I thought that I had
carefully "joined" all threads, but there might be one that I've
missed. I'll go over everything again.
The way I'm backing out of blocking calls that are not
cancelation points is to have another thread close() the socket
out from under the blocked thread. This results in an EINVAL
error which I use to indicate that the listener thread needs to
terminate.
Is it possible you are opening additional file descriptors between
the time you close the socket and the time the thread decides to
clean up? In that case, because file descriptors can be recycled,
you would not only have a race condition but threads that are
accessing foreign resources.
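
To make the recycling hazard concrete, here is a minimal sketch. It is not taken from the poster's code: the function name fd_number_is_recycled is made up for illustration, and /dev/null is only a convenient thing to open. POSIX hands out the lowest-numbered free descriptor, so a descriptor opened right after a close() normally gets the number that was just released.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a descriptor opened right after a
 * close() reuses the number that was just closed, 0 if it does not,
 * and -1 if open() fails. */
int fd_number_is_recycled(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;
    close(first);                       /* the number 'first' is free again */

    int second = open("/dev/null", O_RDONLY);
    if (second < 0)
        return -1;

    /* open() returns the lowest free number, which is now 'first'. */
    int recycled = (second == first);
    close(second);
    return recycled;
}
```

A thread that still holds the old number after the close() would be operating on whatever was opened in the meantime, which is the "foreign resources" case described above.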
This is very possible; the way listen sockets work is that incoming
packets effectively get to the point of having a socket not
specifically associated with an fd in the input queue for the
process that made the listen call, using the credential off the
listen socket as the credential for the new socket. The actual
accept() call is more or less a formality, as the connection has
effectively been completed, and serves only to (1) associate an fd
in the process open file table, and to (2) deal with the accounting
for the listen queue depth.
It's not clear from the small amount of information so far whether
what Joseph has called "the listen thread" here is blocked in a
blocking accept() call, a select() call, a poll() call, waiting for
a kevent(), or so on. Any of these methods could be used to try to
block the thread until a connection was ready to be completed (and
the fd created by a subsequent call to accept()).
If it's actually a blocking call, but it's interruptible, then he's
going to want an explicit pthread_kill() to wake the thread, rather
than depending on the close of the socket underneath whatever
blocking call is being used resulting in a synchronous notification
via the EINVAL error. Depending on where it's blocked, there may not
*be* a notification until an explicit wakeup happens, or the
blocking call times out or is explicitly interrupted by an AST
(signals in Mac OS X are implemented by ASTs, which requires that
they run up to the user/kernel boundary before they are delivered;
OSF/1 and Tru64 UNIX have signals that act the same way -
operations that you expect to interrupt can run to completion
before you get actual notification of the signal).
A pthread_kill(), if the block is interruptible, will guarantee an
AST delivery to the specific thread, rather than just whatever
thread happens to run next and not have the signal masked.
Also, in general, it's more trouble than it's worth to re-join
threads.
This really depends. By default, if you ignore it, you're going to
leak a mach_port, a stack, a thread structure, any thread local
storage, and you're going to leave it on the list of active
threads. So in general, unless you explicitly set up an attribute
and call pthread_attr_setdetachstate() before creating the thread,
or explicitly call pthread_detach() after creating it, you're going
to leak until you join the thread.
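
For reference, the three ways of not leaking look roughly like this. This is a sketch only; worker and the spawn_* names are hypothetical, and each function returns 0 on success or the error number from the failing pthread call.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical thread body that does nothing. */
static void *worker(void *arg) { (void)arg; return NULL; }

/* Option 1: create the thread already detached, via an attribute. */
int spawn_detached(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Option 2: create it joinable (the default), then detach it. */
int spawn_then_detach(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, NULL);
    if (rc != 0)
        return rc;
    return pthread_detach(tid);
}

/* Option 3: leave it joinable and join it, which releases its resources. */
int spawn_and_join(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, NULL);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

Only the third option lets the creator learn when the thread has actually finished, which is why joining is still worth the trouble in the shutdown case discussed here.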
For applications that tend to "fix" their memory leaks by exiting
(older versions of some software, such as sendmail, are infamous
for this trick), this is probably an OK thing, but for applications
that run for a long time, just letting it leak could be a pretty
bad idea. At the very least, calling pthread_detach() on yourself
to avoid the issue is minimally polite to whoever follows and
either uses your code as an example, or changes it to have much
different assumptions than your own.
In this case, he's likely to want a heap variable with the thread
id and to know whether or not the pthread_kill() took anyway, and
it's not that much more trouble for the information he's going to
be able to get back from doing it.
-- Terry
Before creating any threads, I block some signals with
pthread_sigmask() so that those signals are blocked in any subsequently created
threads. The main thread also creates any needed sockets. After all
threads have been created, the main thread calls sigwait() with the
same set of signals (TERM, INT, HUP), so that only the main thread
actually deals with signals. There are no signal handlers installed
-- I'm only using pthread_sigmask() and sigwait() to deal with
signals. The only signal I explicitly ignore with a sigaction() is PIPE.
When it receives either a TERM or an INT, the main thread breaks from
its loop and proceeds to cancel all threads and close any listener
sockets (one AF_INET and two AF_UNIX sockets). The main thread
performs a join on every thread.
I have a single "listen" thread that blocks on accept() waiting for a
connection. Since accept() is not a cancelation point, I have accept
wrapped in a loop that includes an explicit pthread_testcancel().
There is no way to exit from this loop other than at
pthread_testcancel().
The main thread that is intending to shut everything down first calls
pthread_cancel() on the listener thread's ID to queue up a
cancelation, and then closes the listener socket. The listener thread
receives the EINVAL from accept() and continues around its loop until
it hits the pthread_testcancel(), at which point it exits and is joined
by the main thread.
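
Reduced to a sketch, the listener loop and the shutdown sequence look like this. The names listen_loop and cancel_and_join_listener are hypothetical, and the loopback socket is only there to give accept() something to block on. Two caveats: POSIX lists accept() as a cancellation point, so on systems that implement it that way the cancel is acted on inside accept() and the explicit pthread_testcancel() is just a backstop; and whether close() wakes a thread blocked in accept() differs between platforms.

```c
#include <assert.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Listener body: the only way out of the loop is cancellation. */
static void *listen_loop(void *arg)
{
    int fd = *(int *)arg;
    for (;;) {
        int conn = accept(fd, NULL, NULL);
        if (conn >= 0)
            close(conn);        /* a real server would hand this off */
        /* On error (for example after the socket has been closed)
         * fall through to the explicit cancellation point. */
        pthread_testcancel();
    }
    return NULL;                /* not reached */
}

/* Returns 1 if the listener ended by cancellation, 0 if it ended some
 * other way, -1 if setup or the join failed. */
int cancel_and_join_listener(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;          /* let the kernel pick a port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(fd, 1) != 0) {
        close(fd);
        return -1;
    }

    pthread_t tid;
    if (pthread_create(&tid, NULL, listen_loop, &fd) != 0) {
        close(fd);
        return -1;
    }

    pthread_cancel(tid);        /* queue the cancellation first */
    close(fd);                  /* then close the listener socket */

    void *result = NULL;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return result == PTHREAD_CANCELED;
}
```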
[I thought about using select() on the listener socket but that isn't
a cancelation point either. I'd just be in a spin-loop between
select() and pthread_testcancel().]
After looking through my code, I am joining with every thread that
has ever been created. My program then seems to get stuck after
returning from main(). I did spend a bit of time on this shutdown
code to make sure I got everything right with anywhere from 3 to 100
threads. As far as I can tell, I've never leaked a thread. I keep all my
thread IDs in a vector and join on each one of them. If any of them
didn't come back, my main thread would get stuck on the call to
pthread_join, but that doesn't happen.
I still don't know what is causing me to get stuck in the P_WEXIT
state after I've joined all threads and returned from main(). This
process typically runs for weeks at a time handling around 2000
simultaneous SSL connections until you send it a TERM signal. Most of
the time it shuts down cleanly and does not leave any processes in the
"E" (exiting) state. Once in a while I get one.
But, back to a point you made: rather than closing the socket in the
main thread, you suggested using pthread_kill to wake the listener
thread. Would this result in an EINTR from accept()? Is there a
particular signal that I should use? Is the kernel creating any
threads on my behalf other than the ones from pthread_create() that
might be getting stuck?
Darwin-kernel mailing list (email@hidden)
References:
- P_WEXIT (From: Joseph Oreste Bruni <email@hidden>)
- Re: P_WEXIT (From: Terry Lambert <email@hidden>)
- Re: P_WEXIT (From: Joseph Oreste Bruni <email@hidden>)
- Re: P_WEXIT (From: AgentM <email@hidden>)
- Re: P_WEXIT (From: Terry Lambert <email@hidden>)