Re: Kevent/Kqueue causing kernel panics
- Subject: Re: Kevent/Kqueue causing kernel panics
- From: Terry Lambert <email@hidden>
- Date: Fri, 5 Nov 2010 20:20:20 -0700
This is not a support channel, and as Shantonu said, you should file a bug. That said:
0x4edaf8 is in unix_syscall64 (/SourceCache/xnu/xnu-1504.7.4/bsd/dev/i386/systemcalls.c:433).
0x4701dc is in kevent (/SourceCache/xnu/xnu-1504.7.4/bsd/kern/kern_event.c:1225).
0x46fdcc is in kevent_internal (/SourceCache/xnu/xnu-1504.7.4/bsd/kern/kern_event.c:1288).
0x29d59a is in usimple_lock (/SourceCache/xnu/xnu-1504.7.4/osfmk/i386/locks_i386.c:352).
0x21b455 is in panic (/SourceCache/xnu/xnu-1504.7.4/osfmk/kern/debug.c:307).
You are getting a simple lock timeout. Simple locks spin for a very short time; if they cannot be acquired in that time, they cause a panic.
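As a rough illustration of that behavior, here is a user-space analogue in C11. It is a lock that spins on an atomic flag until a deadline and treats failure to acquire it as fatal. It is hypothetical: simple_lock_t, simple_lock_try_for() and simple_lock_or_panic() are made-up names, it is not XNU's usimple_lock code, and abort() stands in for a kernel panic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical spin lock; initialize with { ATOMIC_FLAG_INIT }. */
typedef struct {
    atomic_flag held;
} simple_lock_t;

/* Monotonic time in nanoseconds, used to measure the spin deadline. */
static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Spin until the lock is acquired or timeout_ns nanoseconds have elapsed. */
bool simple_lock_try_for(simple_lock_t *l, long long timeout_ns) {
    assert(timeout_ns >= 0);
    long long deadline = now_ns() + timeout_ns;
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        if (now_ns() >= deadline)
            return false; /* still held by someone else when time ran out */
    }
    return true;
}

/* Kernel-style policy: not getting the lock in time is treated as fatal. */
void simple_lock_or_panic(simple_lock_t *l, long long timeout_ns) {
    if (!simple_lock_try_for(l, timeout_ns)) {
        fprintf(stderr, "panic: simple lock timeout\n");
        abort();
    }
}

/* Release the lock so the next spinner can take it. */
void simple_unlock(simple_lock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The deadline is only as trustworthy as the clock and the scheduling of the lock holder. If the holder stops running for a while, the spinner can hit the timeout even though nothing is actually deadlocked.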
In this case, the simple lock in question is the one taken by the call kqlock(kq) at line 1287 of bsd/kern/kern_event.c, in kevent_internal(), which is called from kevent().
The lock acquisition fails because the lock can't be obtained within the required time. The most probable causes, IMO, are:

- attempting to run the non-server version of Mac OS X under virtualization (virtualization can affect both how long locks are actually held and the accuracy of the clocks used to measure the timeout interval);
- a multithreaded program that closes the kqueue in question out from under the kevent() system call.

There are other possibilities as well.
If I had to guess (and I do, since the only way to get enough information to fix the issue in this case is two-machine debugging and/or setting up a core dump server), I would say that, since this code is being ported from Linux, another thread is closing the queue under the mistaken assumption that because epoll is a cancellation point (i.e. the close will abort the system call), kevent on Mac OS X is as well. It is not.
You should file a bug on this. But unless you include either the output of "showallstacks" from a two-machine debug session or a kernel core dump, OR the sources necessary to reproduce the problem locally (assuming it DOES reproduce locally), it will end up as "Can Not Reproduce". So include the necessary information, or be prepared to be asked for it.
Either way, closing the kqueue won't cancel the outstanding kevent() call, so you should rethink your code. At a minimum, maintain a container structure in user space with a retain/release count on the kq, so that the queue is never closed out from under a thread that is still using it. If your program needs cancellation to function properly, familiarize yourself with two calls. Use pthread_self() to save the thread ID of the thread that calls kevent(). Use pthread_kill() to implement a cancellation method on the container structure, invoked when you want to close the queue.
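Below is one possible shape for that container, sketched in C with POSIX threads. The struct, its fields and the kq_* helpers are hypothetical names. The only real APIs relied on are pthread_self(), pthread_kill(), sigaction() and close().

The descriptor is closed only when the last reference is dropped, so a thread blocked in kevent() keeps the queue alive until it returns. Cancellation signals the saved thread ID. The handler is installed without SA_RESTART, so the expectation is that kevent() returns -1 with errno set to EINTR instead of being restarted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container around a kqueue descriptor (names are illustrative). */
typedef struct kq_container {
    pthread_mutex_t mu;
    int fd;           /* descriptor returned by kqueue() */
    int refcount;     /* users of fd; fd is closed when this reaches 0 */
    bool closing;     /* set once the owner has asked to close the queue */
    bool has_waiter;  /* true while a thread is about to block, or is blocked, in kevent() */
    pthread_t waiter; /* that thread's ID, saved with pthread_self() */
} kq_container;

/* No-op handler: its only purpose is to interrupt a blocked kevent(). */
static void kq_wakeup_handler(int sig) { (void)sig; }

/* Install the handler WITHOUT SA_RESTART so the interrupted call is not restarted. */
int kq_install_wakeup_handler(int sig) {
    struct sigaction sa;
    sa.sa_handler = kq_wakeup_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}

/* Wrap fd; the caller owns the initial reference. */
kq_container *kq_container_new(int fd) {
    kq_container *c = calloc(1, sizeof *c);
    if (c == NULL)
        return NULL;
    pthread_mutex_init(&c->mu, NULL);
    c->fd = fd;
    c->refcount = 1;
    return c;
}

/* Take a reference before using c->fd; refused once closing has been requested. */
bool kq_retain(kq_container *c) {
    pthread_mutex_lock(&c->mu);
    bool ok = !c->closing;
    if (ok)
        c->refcount++;
    pthread_mutex_unlock(&c->mu);
    return ok;
}

/* Drop a reference; the last release closes fd and frees the container. */
void kq_release(kq_container *c) {
    pthread_mutex_lock(&c->mu);
    assert(c->refcount > 0);
    int left = --c->refcount;
    pthread_mutex_unlock(&c->mu);
    if (left == 0) {
        close(c->fd);
        pthread_mutex_destroy(&c->mu);
        free(c);
    }
}

/* Call just before kevent(); saves pthread_self() unless the queue is closing. */
bool kq_begin_wait(kq_container *c) {
    pthread_mutex_lock(&c->mu);
    bool ok = !c->closing;
    if (ok) {
        c->waiter = pthread_self();
        c->has_waiter = true;
    }
    pthread_mutex_unlock(&c->mu);
    return ok;
}

/* Call after kevent() returns, so the saved thread ID is never signalled later. */
void kq_end_wait(kq_container *c) {
    pthread_mutex_lock(&c->mu);
    c->has_waiter = false;
    pthread_mutex_unlock(&c->mu);
}

/* Report whether the owner has asked for the queue to be closed. */
bool kq_is_closing(kq_container *c) {
    pthread_mutex_lock(&c->mu);
    bool closing = c->closing;
    pthread_mutex_unlock(&c->mu);
    return closing;
}

/* Owner only: mark closing, signal any waiter, drop the initial reference.
   fd stays open until every other user has released its own reference. */
void kq_request_close(kq_container *c, int sig) {
    pthread_mutex_lock(&c->mu);
    c->closing = true;
    if (c->has_waiter)
        pthread_kill(c->waiter, sig);
    pthread_mutex_unlock(&c->mu);
    kq_release(c);
}
```

In outline, the thread that calls kevent() first calls kq_retain(). It then loops through three steps: kq_begin_wait(), kevent() with a finite timeout, and kq_end_wait(). It leaves the loop when kevent() fails with EINTR or kq_is_closing() returns true, and then calls kq_release().

The finite timeout covers the window in which the signal arrives after kq_begin_wait() but before the thread actually blocks in kevent(). Without it, that wakeup could be lost.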
-- Terry
On Nov 5, 2010, at 6:07 PM, Travis Athougies wrote:
> I have an application that uses the kevent and kqueue APIs to provide
> asynchronous events to my application. Essentially, my program is
> completely event-driven. I have my own event-handling system, which
> uses a global event queue to distribute events to a number of
> threads (as many as there are processors in the system).
> Periodically, kevent and kqueue are called to collect events on a
> number of sockets (I've only tested with one socket, since I can't go
> any further without a kernel panic), and these events are then added
> to the global event queue. The event queue system works fine (it can
> handle thousands of simultaneous connections on Linux using the epoll*
> functions). However, on the Mac, regardless of what I do, running my
> test program causes a KERNEL PANIC. Application bugs are one thing, but
> they should NEVER cause a kernel panic. I just wanted to know whether
> anyone else has gotten this error and, if so, how they went about
> resolving it. The kernel panic message is below, and I can provide
> source code if necessary.
>
> I'm using the Boehm GC with pthreads, if that means anything to you.
>
> And this isn't a once-in-a-while thing. It happens every time I run
> the application, within a few test client runs.
>
> I've attached the kernel panic log. I can provide sources, but keep in
> mind I'm not ready to make this software open source.
>
> And the panic is always in the same place: locks_i386.c.
>
> --
> Travis Athougies
> <Kernel_2010-11-05-151425_Travis-Athougiess-MacBook-Pro.panic>
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Darwin-kernel mailing list (email@hidden)
> Help/Unsubscribe/Update your Subscription:
>
> This email sent to email@hidden