To anyone interested: I've solved the issue. It was very complicated to troubleshoot, because there is a real lack of documentation on how kqueue should be used in a multi-threaded environment. After struggling and studying the kernel, I fixed my issue and ended up with a more concrete knowledge of the topic. What I describe here applies to a Mac OS X 10.6.7 kernel; I really don't know whether other BSD variants and/or other OS X versions behave the same way.
My problem was that, under certain circumstances, I was registering events on the same file descriptor from multiple threads simultaneously. This ended in a race condition that, for a funny reason, shows up only when you close the file descriptor; there is actually no lock during the kevent() call.
So, kqueue listens can be invoked on the same kqueue FD from multiple threads. If you use the ONESHOT flag (EV_ONESHOT) for the events, you'll get just one message per thread, while if you don't use ONESHOT, you may get the same message on each listening thread. Kevents can be registered from multiple threads as long as you protect the calls on the same file descriptor, i.e. if you want to register for a WRITE event from a socket, be sure to wrap the kevent() call in a mutex or something similar. Likewise, if you need to react to an EOF condition and/or an error that tells you the socket needs to be closed, lock out any kevent() call while you close the socket, again to avoid a race condition. Also, unless you use the ONESHOT flag, avoid re-registering already registered events, because it requires further synchronization and it's really useless. I did that because I ported the logic from epoll-based Linux code, which works in a totally different way.
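A minimal sketch of the locking described above, not the original application code. It assumes one process-wide pthread mutex (g_kq_lock, a hypothetical name) that serializes every kevent() registration and every close(); a finer per-descriptor lock would also match the description. The helper names open_queue, register_write_oneshot and close_socket_locked are made up for illustration, and on systems without <sys/event.h> the kqueue calls are compiled out so that the locking pattern still builds.

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__OpenBSD__) || defined(__DragonFly__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#define SKETCH_HAVE_KQUEUE 1
#else
#define SKETCH_HAVE_KQUEUE 0   /* no kqueue here: only the locking is exercised */
#endif

/* Assumption: a single lock shared by every thread that registers or closes. */
static pthread_mutex_t g_kq_lock = PTHREAD_MUTEX_INITIALIZER;

/* Create the kqueue; returns -1 on failure or where kqueue does not exist. */
int open_queue(void)
{
#if SKETCH_HAVE_KQUEUE
    return kqueue();
#else
    return -1;
#endif
}

/* Register a one-shot write-readiness event for sock on kq.
 * Returns 0 on success, -1 on error (errno set by kevent()). */
int register_write_oneshot(int kq, int sock, void *udata)
{
    int rc = 0;

    pthread_mutex_lock(&g_kq_lock);
#if SKETCH_HAVE_KQUEUE
    struct kevent change;
    EV_SET(&change, sock, EVFILT_WRITE, EV_ADD | EV_ONESHOT, 0, 0, udata);
    /* No event list is passed, so kevent() only applies the change. */
    rc = kevent(kq, &change, 1, NULL, 0, NULL);
#else
    (void)kq; (void)sock; (void)udata;
#endif
    pthread_mutex_unlock(&g_kq_lock);
    return rc;
}

/* Close sock while holding the same lock, so that no registration on this
 * descriptor can run concurrently with the close(). */
int close_socket_locked(int sock)
{
    pthread_mutex_lock(&g_kq_lock);
    int rc = close(sock);
    pthread_mutex_unlock(&g_kq_lock);
    return rc;
}
```

A thread that receives EV_EOF would call close_socket_locked(fd) instead of close(fd), and any thread that wants a write notification would go through register_write_oneshot(kq, fd, udata). The waiting kevent() calls that only fetch events are left outside the lock in this sketch.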
Thanks for the support, hope this helps someone else!
Cheers!
Leonardo Bernardini
From: email@hidden
To: email@hidden
Subject: RE: kqueue issue in multithreaded environment
Date: Thu, 9 Jun 2011 11:02:15 +0000
I was finally able to produce a stack trace with kernel symbols to solve this issue. To summarize: certain threads lock up when they try to close FDs after receiving an EV_EOF from kqueue. On a random basis, my threads get stuck in the close() system call. After inspecting the stack trace that I'm copying here, it seems that my 2 locked threads (the ones at the bottom of the stack trace; the other threads are just waiting for the others to terminate before shutting down, so they have no relevance) are stuck in the kevent_register event that "should" remove the file descriptors from the kevent queue fd. Does anybody here have some advice on how to troubleshoot this further? Am I not allowed to trigger kqueue events in a multi-threaded environment while another thread is handling/listening on the same kqueue fd? Thanks a lot!
System stack snapshot (1 iteration(s)) initiate
From: email@hidden
To: email@hidden
Date: Wed, 1 Jun 2011 11:37:02 +0000
Subject: Re: kqueue issue in multithreaded environment
Hi Axel, First of all, thank you for your hints. I've spent several hours trying to address this. I was actually able to produce a dump using stackshot; unfortunately, I'm not able to resolve the symbols of the running kernel. I've tried several methods, after downloading the 10.6.7 symbols, with no success. This is the dump of one thread stuck in the close() system call. Is anybody here able to resolve the kernel stack to function calls that make sense? Thanks a lot!
Kernel stack:
0xffffff800022c888 (0xffffff800022c888)
0xffffff800020990e (0xffffff800020990e)
0xffffff8000209acb (0xffffff8000209acb)
0xffffff8000206270 (0xffffff8000206270)
0xffffff800024e416 (0xffffff800024e416)
0xffffff800024e865 (0xffffff800024e865)
0xffffff800047238c (0xffffff800047238c)
0xffffff80004741ff (0xffffff80004741ff)
0xffffff80004742f6 (0xffffff80004742f6)
0xffffff80004e8158 (0xffffff80004e8158)
0xffffff80002e4874 (0xffffff80002e4874)
User stack:
close (in libSystem.B.dylib) + 10 (0x7fff875ce99a)
MFramework::MSocket::CMSocketStream::Close() (in libVVFramework.dylib) (VVSocket.cpp:459) (0x1004c18a5)
MDispatcher::IMDispClientsLogic::SocketEventClose(MFramework::MSocket::CMSocketStream*, bool) (in libVVDispatcherCore.dylib) (IDisp_ClientLogic.cpp:3296) (0x1000e58bc)
MDispatcher::IMDispClientsLogic::clientsHandlingThread(void*) (in libVVDispatcherCore.dylib) (IDisp_ClientLogic.cpp:3634) (0x1000fb777)
_pthread_start (in libSystem.B.dylib) + 331 (0x7fff875fd4f6)
thread_start (in libSystem.B.dylib) + 13 (0x7fff875fd3a9)
>> Hello Leonardo, Since you didn't provide many details, perhaps this thread could prove somewhat enlightening: http://lists.apple.com/archives/darwin-kernel/2010/Aug/msg00022.html (be sure to read the whole thread, as its subject changed in the middle) HTH, Axel
>> Hi all, I would like to thank anyone who can help me with this matter, because I've been stuck for several days trying to find a solution. I've converted a multi