Re: P_WEXIT

Subject: Re: P_WEXIT
From: email@hidden
Date: Tue, 15 Aug 2006 22:17:09 -0700


On Aug 15, 2006, at 1:44 PM, Terry Lambert wrote:

On Aug 15, 2006, at 7:38 AM, AgentM wrote:
On Aug 15, 2006, at 10:20 , Joseph Oreste Bruni wrote:
Thank you very much for the explanation. I thought that I had carefully "joined" all threads, but there might be one that I've missed. I'll go over everything again.

The way I'm backing out of blocking calls that are not cancelation points is to have another thread close() the socket out from under the blocked thread. This results in an EINVALID error which I use to indicate that the listener thread needs to terminate.
Is it possible you are opening additional file descriptors between the time you close the socket and the time the thread decides to clean up? In that case, because file descriptors can be recycled, you would not only have a race condition but threads that are accessing foreign resources.
This is very possible. The way listen sockets work is that incoming packets effectively get to the point of having a socket, not yet associated with any fd, in the input queue for the process that made the listen call, using the credential off the listen socket as the credential for the new socket. The actual accept() call is more or less a formality, as the connection has effectively been completed, and serves only to (1) associate an fd in the process open file table, and (2) deal with the accounting for the listen queue depth.

It's not clear from the small amount of information so far whether what Joseph has called "the listen thread" here is blocked in a blocking accept() call, a select() call, a poll() call, waiting for a kevent(), or so on. Any of these methods could be used to try to block the thread until a connection was ready to be completed (and the fd created by a subsequent call to accept()).

If it's actually a blocking call, but it's interruptible, then he's going to want an explicit pthread_kill() to wake the thread, rather than depending on the close of the socket underneath whatever blocking call is being used to produce a synchronous notification via the EINVALID. Depending on where it's blocked, there may not *be* a notification until an explicit wakeup happens, or the blocking call times out or is explicitly interrupted by an AST. (Signals in Mac OS X are implemented by ASTs, which requires that they run up to the user/kernel boundary before they are delivered; OSF/1 and Tru64 UNIX have signals that act the same way: operations that you expect to be interrupted can run to completion before you get actual notification of the signal.)

A pthread_kill(), if the block is interruptible, will guarantee an AST delivery to the specific thread, rather than just whatever thread happens to run next and not have the signal masked.

Also, in general, it's more trouble than it's worth to re-join threads.
This really depends. By default, if you ignore it, you're going to leak a mach_port, a stack, a thread structure, any thread local storage, and you're going to leave it on the list of active threads. So in general, unless you explicitly set up an attribute and call pthread_attr_setdetachstate() before creating the thread, or explicitly call pthread_detach() after creating it, you're going to leak until you join the thread.
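The three ways of not leaking a thread mentioned above can be sketched roughly as follows. The worker function and the spawn_* helper names are hypothetical; only the pthread calls themselves come from the discussion.

```c
#include <pthread.h>

/* Hypothetical worker; stands in for whatever the thread really does. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Option 1: create the thread already detached via an attribute. */
static int spawn_detached(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Option 2: create a joinable thread, then detach it explicitly. */
static int spawn_then_detach(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, NULL);

    if (rc == 0)
        rc = pthread_detach(tid);
    return rc;
}

/* Option 3: leave the thread joinable and join it, which releases
   its resources and also hands back its return value. */
static int spawn_and_join(void)
{
    pthread_t tid;
    void *result;
    int rc = pthread_create(&tid, NULL, worker, NULL);

    if (rc == 0)
        rc = pthread_join(tid, &result);
    return rc;
}
```

A detached thread cannot be joined later, so the first two options only fit threads whose exit status nobody needs.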

For applications that tend to "fix" their memory leaks by exiting (older versions of some software, such as sendmail, are infamous for this trick), this is probably an OK thing, but for applications that run for a long time, just letting it leak could be a pretty bad idea. At the very least, calling pthread_detach() on yourself to avoid the issue is minimally polite to whoever follows and either uses your code as an example, or changes it to have much different assumptions than your own. In this case, he's likely to want a heap variable with the thread id and to know whether or not the pthread_kill() took anyway, and it's not that much more trouble for the information he's going to be able to get back from doing it.

-- Terry

Before creating any threads, I block some signals with pthread_sigmask() so that those signals are blocked in any subsequently created threads. The main thread also creates any needed sockets. After all threads have been created, the main thread calls sigwait() with the same set of signals (TERM, INT, HUP), so that only the main thread actually deals with signals. There are no signal handlers installed -- I'm only using pthread_sigmask() and sigwait() to deal with signals. The only signal I explicitly ignore with a sigaction() is PIPE.

When it receives either a TERM or an INT, the main thread breaks from its loop and proceeds to cancel all threads and close any listener sockets (one AF_INET and two AF_UNIX sockets). The main thread then performs a join on every thread.

I have a single "listen" thread that blocks on accept() waiting for a connection. Since accept() is not a cancelation point, I have accept() wrapped in a loop that includes an explicit pthread_testcancel(). There is no way to exit from this loop other than at pthread_testcancel().

The main thread, when it intends to shut everything down, first calls pthread_cancel() on the listener thread's ID to queue up a cancelation, and then closes the listener socket. The listener thread receives the EINVALID from accept() and continues around its loop until it hits the pthread_testcancel(), at which point it terminates and is joined by the main thread.
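The shutdown sequence just described can be sketched as below. All names are hypothetical and the loopback socket is only there to make the example self-contained. The sketch relies on accept() returning once the socket is closed underneath it, as reported in this thread. On systems where accept() is itself a cancelation point, the queued cancelation takes effect inside accept() instead. A system that provides neither behavior would leave the join waiting.

```c
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical listener: the only way out of the loop is cancelation. */
static void *listener_main(void *arg)
{
    int listen_fd = *(int *)arg;

    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd >= 0)
            close(fd);          /* a real server would hand fd off here */
        pthread_testcancel();   /* acts on a queued pthread_cancel() */
    }
    return NULL;                /* not reached */
}

/* Returns 0 if the listener ended by cancelation and was joined. */
static int run_shutdown_demo(void)
{
    struct sockaddr_in addr;
    pthread_t tid;
    void *result = NULL;
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);

    if (listen_fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;          /* let the kernel pick a free port */
    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(listen_fd, 8) != 0 ||
        pthread_create(&tid, NULL, listener_main, &listen_fd) != 0) {
        close(listen_fd);
        return -1;
    }

    pthread_cancel(tid);        /* 1. queue the cancelation */
    close(listen_fd);           /* 2. close the socket under the listener */
    if (pthread_join(tid, &result) != 0)    /* 3. join */
        return -1;
    return result == PTHREAD_CANCELED ? 0 : -1;
}
```

Queuing the cancelation before the close matters here: if the close came first, the listener could spin on a failing accept() until the cancelation arrived.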

[I thought about using select() on the listener socket but that isn't a cancelation point either. I'd just be in a spin-loop between select() and pthread_testcancel().]

After looking through my code, I am joining with every thread that has ever been created. My program then seems to get stuck after returning from main(). I did spend a bit of time on this shutdown code to make sure I got everything right with anywhere from 3 to 100 threads. I've never leaked a thread that I could tell. I keep all my thread IDs in a vector and join on each one of them. If any of them didn't come back, my main thread would get stuck on the call to pthread_join(), but that doesn't happen.

I still don't know what is causing me to get stuck in the P_WEXIT state after I've joined all threads and returned from main(). This process typically runs for weeks at a time handling around 2000 simultaneous SSL connections until you send it a TERM signal. Most of the time it shuts down cleanly and does not leave any processes in the E state. Once in a while I get an E.

But, back to a point you made: rather than closing the socket in the main thread, you suggested using pthread_kill() to wake the listener thread. Would this result in an EINTR from accept()? Is there a particular signal that I should use? Is the kernel creating any threads on my behalf, other than the ones from pthread_create(), that might be getting stuck?

_______________________________________________
Darwin-kernel mailing list      (email@hidden)