On Aug 7, 2009, at 3:50 PM, Eric Ogren wrote:

Hello there -

I am working on a kernel extension that sometimes attempts to allocate memory while holding a spinlock (lck_spin_t). I've read in several postings to this list that calling OSMalloc() while holding a spinlock can cause a kernel panic, and that developers should instead use OSMalloc_noblock() and be prepared to deal with a NULL result. However, I am seeing kernel panics even when calling the noblock variant. The panic occurs inside zalloc_canblock() at the lock_zone() call, which is indeed trying to block. The problem is easily reproducible with a simple kext that spawns two threads running the following thread routine:

void alloc_thread(void *arg) {
    lck_spin_t *mylock;
    // ... initialize lock

    lck_spin_lock(mylock);

    while (true) {
        void *foo = OSMalloc_noblock(8, tag); // tag is a global variable initialized in the start function
    }
}

Loading the kext panics the system almost immediately with the following stack, which is the same stack I see in the occasional panics:

#2 0x0012b4c6 in panic (str=0x1 <Address 0x1 out of bounds>) at /SourceCache/xnu/xnu-1228.12.14/osfmk/kern/debug.c:275
#3 0x001368fd in thread_invoke (self=0x43018b8, thread=0x39a7c80, reason=0) at /SourceCache/xnu/xnu-1228.12.14/osfmk/kern/sched_prim.c:1477
#4 0x00136e92 in thread_block_reason (continuation=0x1, parameter=0x0, reason=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-1228.12.14/osfmk/kern/sched_prim.c:1837
#5 0x00136f20 in thread_block (continuation=0x1) at /SourceCache/xnu/xnu-1228.12.14/osfmk/kern/sched_prim.c:1854
#6 0x001318f1 in lck_mtx_lock_wait (lck=0x1a2a084, holder=0x3e40410) at /SourceCache/xnu/xnu-1228.12.14/osfmk/kern/locks.c:601
#7 0x0019d8c1 in lck_mtx_lock () at pmap.h:176
#8 0x001433b0 in zalloc_canblock (zone=0x1a2a07c, canblock=0) at /SourceCache/xnu/xnu-1228.12.14/osfmk/kern/zalloc.c:883
#9 0x0012fdc2 in kalloc_canblock (size=8, canblock=0) at /SourceCache/xnu/xnu-1228.12.14/osfmk/kern/kalloc.c:289
#10 0x001301c1 in OSMalloc_noblock (size=8, tag=0x5ab8700) at /SourceCache/xnu/xnu-1228.12.14/osfmk/kern/kalloc.c:303

Am I missing something here, or is it unsafe to even call OSMalloc_noblock() while holding a spinlock? Looking at the source code for zalloc, lock_zone()/unlock_zone() are always called regardless of the canblock parameter, and that zone lock is indeed a mutex. It seems as though the canblock parameter only means that the calling thread will not block for a long time (i.e., it will not try to refill or garbage collect the zone if it's full), not that it will never block at all.
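
To make that reading concrete, here is a minimal userspace sketch of the behavior I think I am seeing. This is not xnu code: model_zone, model_zone_init() and model_zalloc_canblock() are names made up for illustration, and the "lock" is just a counter standing in for the zone mutex. It only models my assumption that the zone lock is taken on every call and that canblock merely gates the slow refill path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a zone; not the real xnu structure. */
struct model_zone {
    size_t free_elems;        /* elements currently available */
    size_t lock_acquisitions; /* times the (modeled) zone mutex was taken */
    size_t refills;           /* times the slow refill path ran */
};

static void model_zone_init(struct model_zone *z, size_t free_elems)
{
    assert(z != NULL);
    z->free_elems = free_elems;
    z->lock_acquisitions = 0;
    z->refills = 0;
}

/*
 * Returns true if an element was handed out, false otherwise.
 * Assumption being modeled: the zone lock is acquired on every call,
 * whatever the value of canblock.
 */
static bool model_zalloc_canblock(struct model_zone *z, bool canblock)
{
    assert(z != NULL);

    /* Stand-in for lock_zone(): in the kernel this is where a mutex
     * acquisition could put the caller to sleep. */
    z->lock_acquisitions++;

    if (z->free_elems == 0) {
        if (!canblock) {
            /* Stand-in for unlock_zone(); the caller gets NULL. */
            return false;
        }
        /* Stand-in for the long-blocking refill path. */
        z->free_elems += 4;
        z->refills++;
    }

    z->free_elems--;
    /* Stand-in for unlock_zone(). */
    return true;
}
```

In this model a call with canblock set to false never refills the zone, yet it still passes through the lock on every call, which is the point where my real kext ends up in thread_block().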

Any help is appreciated.

thanks,
Eric

--
Excellence in any department can be attained only by the labor of a lifetime; it is not to be purchased at a lesser price -- Samuel Johnson