Re: close() hangs when closing a kqueue in 10.5 only
- Subject: Re: close() hangs when closing a kqueue in 10.5 only
- From: Terry Lambert <email@hidden>
- Date: Mon, 16 Aug 2010 17:38:36 -0700
On Aug 16, 2010, at 12:56 PM, Justin C. Walker wrote:
> On Aug 16, 2010, at 11:41 , Jerry Krinock wrote:
>> A Cocoa app I've written uses a kqueue to watch its document file. It works fine in Mac OS 10.6, but when run in Mac OS 10.5.8, when I close the kqueue like this:
>>
>> close (kqueueFileDescriptor) ;
>>
>> The function close() hangs forever. The argument kqueueFileDescriptor is the integer I got previously from:
>>
>> NSInteger kqueueFileDescriptor = kqueue() ;
>>
>> Does anyone know what might be going on here? Documentation says that read() can block until data is ready, but there is no mention of this in the documentation for close().
>>
>> I suppose I could just leave the kqueue open if running in 10.5, because a typical application run only creates one of them.
>>
>> Is close() open-source? I can't find it in macosforge.org or opensource.apple.com.
>
> As mentioned, this system call is in the open-source part of Mac OS X. The tricky part is that very low-level calls like this may not have an implementation as you might expect to see it. I believe that close() is implemented by table-manipulation tom-foolery, during the build of the kernel.
>
> The close call in libsystem probably invokes a generic "system call" trap directly, which gets "vectored" to a dispatch table in the kernel. I think the actual code is in
> bsd/kern/kern_descrip.c
> called closef_locked(). The logic here is kind of twisty because the system has to account for a variety of different file descriptor types and behaviors, and in particular, detect "last close" issues.
The implementation is in the struct fileops in the xnu file bsd/kern/kern_event.c.
<http://fxr.watson.org/fxr/source/bsd/kern/kern_event.c?v=xnu-1228>
Specifically, it's going to be blocking in kqueue_dealloc(), probably for one of three reasons: someone has a lock on the proc elsewhere (see the stackshot output to find out who you're blocked against), you're adding events faster than they can be discarded, or you have another thread camping out on the fd in kevent(). If events are being added faster than they can be removed as part of the close, disable the events you have outstanding before doing the close. If it's a thread, you should pthread_kill() it to interrupt the kevent() call (meaning you need to have its thread ID saved off when it starts, or as a result of calling pthread_self() and stashing the result in a volatile global). Otherwise, it's someone in the way of the lock, and you need to find out who has the lock and why.
You can also add the kmem boot argument, if the system is not old enough to have a /dev/kmem already, and get read-only gdb access to the currently running kernel without having to set up two-machine debugging. You're probably going to need this to dump the lock structure contents for whoever is currently holding the lock in your way. The lock will have a thread ID in the structure, which will tell you who's holding it.
Typically, if your application is multithreaded, you could be doing something on another thread which holds the lock. You could also be racing two attempts to close the same fd, which could (possibly) deadlock in a deadly embrace as it's trying to reacquire the proc_fdlock(). You might also have a long-standing sysctl going on. This is typically something people who want to add system calls via KEXT use, but you actually don't want to block in the sysctl, since it still holds the global funnel in older versions of the OS. The rule with sysctl is you get in, you do your work, and you get out, as quickly as possible.
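If two threads racing to close the same fd is a possibility, one way to rule it out on the application side is to funnel every close through a single function. The C sketch below is an illustration under that assumption, not something taken from the original poster's code; watched_fd, set_watched_fd and close_watched_fd_once are hypothetical names. It does nothing about kernel-side locking; it only guarantees the process issues at most one close() for the descriptor.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical holder for the descriptor; -1 means "nothing left to close". */
static pthread_mutex_t fd_guard = PTHREAD_MUTEX_INITIALIZER;
static int watched_fd = -1;

/* Records the descriptor to be closed later (for example, the result of kqueue()). */
static void set_watched_fd(int fd)
{
    pthread_mutex_lock(&fd_guard);
    watched_fd = fd;
    pthread_mutex_unlock(&fd_guard);
}

/* Closes the descriptor at most once, however many threads call this.
   Returns 1 if this caller closed it, 0 if there was nothing to close,
   and -1 if close() itself failed. */
static int close_watched_fd_once(void)
{
    int fd;

    /* Claim the descriptor under the mutex, so only one caller ever sees it. */
    pthread_mutex_lock(&fd_guard);
    fd = watched_fd;
    watched_fd = -1;
    pthread_mutex_unlock(&fd_guard);

    if (fd < 0)
        return 0;

    /* close() runs outside the mutex, so a slow close never blocks other callers. */
    return close(fd) == 0 ? 1 : -1;
}
```

A second caller gets 0 back immediately instead of issuing another close() on a descriptor number that may already be gone or reused.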
-- Terry