Re: Kernel panics even in OSMalloc_noblock() while holding a spinlock
- Subject: Re: Kernel panics even in OSMalloc_noblock() while holding a spinlock
- From: Eric Ogren <email@hidden>
- Date: Tue, 11 Aug 2009 14:12:21 -0700
Hi Terry,
Thanks for your response, although I'm not sure it really answered my question. More details are inline.
On Fri, Aug 7, 2009 at 7:54 PM, Terry Lambert
<email@hidden> wrote:
On Aug 7, 2009, at 3:50 PM, Eric Ogren wrote:
Hello there -
I am working on a kernel extension that sometimes attempts to allocate memory while holding a spinlock (lck_spin_t). I've read in several postings to this list that doing so with OSMalloc() can cause a kernel panic, and that developers should instead use OSMalloc_noblock() and be prepared to deal with a NULL result. However, I am seeing kernel panics even when calling the noblock variant. The panic occurs inside zalloc_canblock() at the lock_zone() call, which is indeed trying to block. The problem is easily reproducible with a simple kext that spawns 2 threads running the following thread routine:
void alloc_thread(void *arg) {
    lck_spin_t* mylock;
    // ... initialize lock
    lck_spin_lock(mylock);
    while (true) {
        // tag is a global variable initialized in the start function
        void* foo = OSMalloc_noblock(8, tag);
    }
}
Loading the kext panics the system almost immediately with the following stack, which is the same stack I see in the occasional panics:
#2 0x0012b4c6 in panic (str=0x1 <Address 0x1 out of bounds>) at /SourceCache/xnu/xnu-1228.12.14/osfmk/kern/debug.c:275
#3 0x001368fd in thread_invoke (self=0x43018b8, thread=0x39a7c80, reason=0) at /SourceCache/xnu/xnu-1228.12.14/osfmk/kern/sched_prim.c:1477
[ ... ]
Am I missing something here, or is it unsafe to even call OSMalloc_noblock() while holding a spinlock? If I look at the source code for zalloc, lock/unlock_zone() is always called regardless of the canblock parameter, and that zone lock is indeed a mutex. It seems almost as if the canblock parameter just means that the calling thread will not block for a long time (i.e., it will not try to refill or garbage collect the zone if it's full), not that it will never block at all.
Yes. You are missing several things...
(A) You're in a tight while/true loop allocating all of kernel memory a tiny bit at a time until you exhaust it, which is known to cause a panic.
(B) An infinite loop is too long a time to hold a spinlock; holding a spinlock too long is known to cause a panic.
Sure, I agree that this test program would ultimately panic the kernel for other reasons, but I thought it was clear from the backtrace that the panic was occurring from a thread_block() call. More on this below.
(C) You're missing your paste of at least the lines above the portion of the backtrace you quoted (which is cut off at frame #2), which would include the actual panic message and frame #1 so that we could see if the panic was related to holding a spinlock or related to you exhausting the zone of zones, or running the wrong version of Parallels or dereferencing a NULL pointer, or some other known cause of panics, or some other unknown cause of panics.
I was trying to trim parts of the stack trace just to make the message a little shorter - I didn't think the calls made after panic() were really relevant. I should have included the actual panic message instead of making people try to infer it via the callstack though.
The full trace, including the paniclog, looks like this (this is a second run; I'm not 100% sure why the stack is 1 frame shorter this time around):
panic(cpu 0 caller 0x001368FD): "thread_invoke: preemption_level 1\n"@/SourceCache/xnu/xnu-1228.15.4/osfmk/kern/sched_prim.c:1478
(gdb) where
#0 Debugger (message=0x8001003b <Address 0x8001003b out of bounds>) at /SourceCache/xnu/xnu-1228.15.4/osfmk/i386/AT386/model_dep.c:799
#1 0x0012b4c6 in panic (str=0x1 <Address 0x1 out of bounds>) at /SourceCache/xnu/xnu-1228.15.4/osfmk/kern/debug.c:275
#2 0x001368fd in thread_invoke (self=0x3db0e40, thread=0x54038b8, reason=0) at /SourceCache/xnu/xnu-1228.15.4/osfmk/kern/sched_prim.c:1477
#3 0x00136e92 in thread_block_reason (continuation=0x1, parameter=0x0, reason=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-1228.15.4/osfmk/kern/sched_prim.c:1837
#4 0x00136f20 in thread_block (continuation=0x1) at /SourceCache/xnu/xnu-1228.15.4/osfmk/kern/sched_prim.c:1854
#5 0x001318f1 in lck_mtx_lock_wait (lck=0x28ce084, holder=0x3db3410) at /SourceCache/xnu/xnu-1228.15.4/osfmk/kern/locks.c:601
#6 0x0019d8c1 in lck_mtx_lock () at pmap.h:176
#7 0x001433b0 in zalloc_canblock (zone=0x28ce07c, canblock=0) at /SourceCache/xnu/xnu-1228.15.4/osfmk/kern/zalloc.c:883
#8 0x0012fdc2 in kalloc_canblock (size=8, canblock=0) at /SourceCache/xnu/xnu-1228.15.4/osfmk/kern/kalloc.c:289
#9 0x001301c1 in OSMalloc_noblock (size=8, tag=0x723c900) at /SourceCache/xnu/xnu-1228.15.4/osfmk/kern/kalloc.c:303
#10 0x2051c05a in alloc_func (myarg=0x3962b80) at /Users/eogren/Documents/allocer/allocer.c:26
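For readers following the trace, frames #6 and #7 are the interesting part: zalloc_canblock() is entered with canblock=0 and still reaches lck_mtx_lock(). The following is only a simplified user-space model of the behavior I described in my original message, not the xnu source. The names zone_model and zalloc_model, the counters, and the refill size are all made up for illustration, and a pthread mutex stands in for the zone lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone; not the real xnu struct zone. */
struct zone_model {
    pthread_mutex_t lock;      /* stands in for the zone's mutex */
    int free_elems;            /* elements currently available */
    int lock_acquisitions;     /* how many times the mutex was taken */
    int refills;               /* how many times the slow path ran */
};

/* Returns true if an element was handed out, false otherwise. */
static bool zalloc_model(struct zone_model *z, bool canblock)
{
    /* Taken on every call, whatever canblock says. If another thread
     * holds the mutex, the caller sleeps here. */
    pthread_mutex_lock(&z->lock);
    z->lock_acquisitions++;

    if (z->free_elems == 0) {
        if (!canblock) {
            /* "noblock" only skips the slow refill path. */
            pthread_mutex_unlock(&z->lock);
            return false;
        }
        z->free_elems += 4;    /* pretend refill; the size is arbitrary */
        z->refills++;
    }

    z->free_elems--;
    pthread_mutex_unlock(&z->lock);
    return true;
}
```

In this model a caller passing canblock=false never refills the zone, but it still acquires the mutex on every call, which is the step that ends up in thread_block() in the trace.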
(D) You have already been told you can use Dijkstra's algorithm, in which you speculatively do an allocation before taking the spinlock. If you use it, fine: mark it as consumed at the point you would have done your allocation. If you don't, fine: free it after you have dropped your spinlock. No harm, no foul, and no allocation or free inside a spinlock.
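A minimal sketch of the speculative pre-allocation described in (D), written as portable user-space C so it can be compiled and tried outside the kernel. Everything in it is a hypothetical stand-in: spin_t, struct list, and list_insert_unique() are illustrative names, the C11 atomic_flag loop plays the role of lck_spin_lock()/lck_spin_unlock(), and malloc()/free() play the role of OSMalloc()/OSFree().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Test-and-set spinlock standing in for lck_spin_t (assumption). */
typedef struct { atomic_flag held; } spin_t;

static void spin_lock(spin_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void spin_unlock(spin_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical data structure protected by the spinlock. */
struct node { int value; struct node *next; };
struct list { spin_t lock; struct node *head; };

/* Inserts value unless it is already present. Returns true if inserted,
 * false if it was already there or the allocation failed. */
static bool list_insert_unique(struct list *ls, int value)
{
    /* 1. Allocate before taking the lock; blocking is harmless here. */
    struct node *spare = malloc(sizeof *spare);
    if (spare == NULL)
        return false;
    spare->value = value;

    /* 2. Under the lock, only decide whether the spare is consumed. */
    bool inserted = true;
    spin_lock(&ls->lock);
    for (struct node *n = ls->head; n != NULL; n = n->next) {
        if (n->value == value) {
            inserted = false;
            break;
        }
    }
    if (inserted) {
        spare->next = ls->head;
        ls->head = spare;
        spare = NULL;          /* consumed: ownership moved to the list */
    }
    spin_unlock(&ls->lock);

    /* 3. Free an unused spare only after the lock is dropped. */
    free(spare);               /* free(NULL) is a no-op */
    return inserted;
}
```

The cost of this pattern is an occasional wasted allocation when the spare turns out not to be needed; in exchange, nothing that can sleep runs while the spinlock is held.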
This was the first time I have ever posted to this mailing list, so I assume you are just referring to other posts in the archive when you say I have been told this.
Regardless, I was not trying to argue that calling OSMalloc_noblock() while holding a spinlock is the greatest design. I was just trying to confirm, for my sake and for others who may browse the archives later, that OSMalloc_noblock() may indeed block; therefore, since we cannot guarantee that no one else will try to allocate from the kalloc zones at the same time as us, it is never safe to call the OSMalloc family while holding a spinlock. As Brendan mentioned in his post, this is important to know when porting code from other platforms.
Thanks,
Eric