Hello Everybody,

I apologize for the length of this post, but I hoped to provide enough info to give somebody an "aha!" moment.

Our product consists of user and kernel components implementing a VPN client. The kernel component is a KEXT that processes inbound and outbound IP packets intercepted by a DLIL filter. The filter is pushed onto every network interface that becomes active.

The crash we are seeing is a kernel panic that occurs only on multiple-CPU hosts, and is infrequent. Our KEXT tries to allocate some kernel memory by calling kmem_alloc(). The panic message is always:

  panic(cpu [0 or 1]): thread_invoke: preemption_level 1

The most reliable way to duplicate the problem is to engage the software (i.e. connect to a VPN switch via an encrypted VPN tunnel), mount a remote shared volume (AppleTalk over IP), and do heavy reads and writes to the volume.

Our customers have seen the problem on both Jaguar (10.2.x) and Puma (10.1.x). Our testing and debugging has been on Jaguar hosts, 10.2.3 (Darwin 6.3). Our application is built on Puma.

Various workarounds have been tried, none successful:

1) Don't call kmem_alloc() if the preemption level is 1. It turns out that the preemption level is often 1 during normal operation, and is usually not a problem. However, failing the allocation every time the preemption level is 1 does cause problems for our software.

2) Tried a thread_funnel_switch(). No funnel was being held by the thread at the time, so this caused its own panic.

3) Examined the size of the requested kernel block. The block requested when the panic occurs does not look unusual. The requested sizes seen were anywhere from 10 to over 1000 bytes during normal operation; the request that caused one panic was about 700 bytes.

The stack trace can vary a bit, but this one is typical (and from Jaguar):

  0x000856cc print_backtrace+176
  0x00085afc Debugger+108
  0x000287a8 panic+488
  0x00033eec thread_invoke+72
  0x000344d0 thread_block_reason+212
  0x0008d51c mlInUse+16
  0x00060b60 vm_fault_wire_fast+284
  0x00064f88 vm_map_wire_nested+2988
  0x000651a0 vm_map_wire+120
  0x00061a78 kernel_memory_allocate+600

Looking through the Darwin source, the call sequence appears to be:

  - vm_fault_wire_fast
  - mutex(&vm_page_queue_lock)
  - mlInUse
  - mutex_lock_wait
  - thread_sleep_interlock
  - assert_wait
  - interlock_unlock
  - thread_block
  - thread_block_reason (continuation, AST_NONE)
  - thread_invoke
  - panic("thread_invoke: preemption_level %d\n", cpu_data[cpu_number()].preemption_level);

I don't quite understand how the preemption level gets set in this case, but it looks like it is done in the interlock_unlock() function, and that thread_sleep_interlock() shouldn't be calling both interlock_unlock() and thread_block().

Whether or not I'm correct, is there a workaround for this panic that can be implemented in my code? Or, if it's an OS problem, will it be fixed in a future release (or is it already fixed in 10.2.4)?

Cheers,
Cory
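P.S. One candidate workaround I have in mind, in case it helps frame the question: pre-allocate a pool of fixed-size buffers at a time when blocking is safe (e.g. in the KEXT start routine or at tunnel setup), and have the DLIL filter draw from that pool instead of calling kmem_alloc() in the packet path. Popping a buffer off a free list under a spinlock never enters the VM system and never blocks, so it should not be able to reach thread_invoke with preemption disabled. What follows is only a rough sketch; the names (pkt_pool_init, pkt_buf_get, pkt_buf_put) and the pool sizes are placeholders rather than our real code, and the IOKit allocation/lock calls are just one way to express the idea.

#include <mach/kern_return.h>   /* kern_return_t, KERN_SUCCESS */
#include <IOKit/IOLib.h>        /* IOMalloc / IOFree */
#include <IOKit/IOLocks.h>      /* IOSimpleLock */

#define POOL_BUF_SIZE   2048    /* placeholder: large enough for any request  */
#define POOL_BUF_COUNT  64      /* placeholder: tune to observed traffic load */

typedef struct pool_buf {
    struct pool_buf *next;      /* free-list link while the buffer is idle */
} pool_buf_t;

static pool_buf_t   *pool_free_list = NULL;
static IOSimpleLock *pool_lock      = NULL;

/* Call from a context where blocking is allowed (e.g. the KEXT start routine). */
kern_return_t pkt_pool_init(void)
{
    int i;

    pool_lock = IOSimpleLockAlloc();
    if (pool_lock == NULL)
        return KERN_RESOURCE_SHORTAGE;

    for (i = 0; i < POOL_BUF_COUNT; i++) {
        pool_buf_t *buf = (pool_buf_t *)IOMalloc(POOL_BUF_SIZE);
        if (buf == NULL)
            return KERN_RESOURCE_SHORTAGE;  /* cleanup of partial pool omitted */
        buf->next = pool_free_list;
        pool_free_list = buf;
    }
    return KERN_SUCCESS;
}

/* Safe to call from the DLIL filter: never blocks, may return NULL. */
void *pkt_buf_get(void)
{
    pool_buf_t *buf;

    IOSimpleLockLock(pool_lock);
    buf = pool_free_list;
    if (buf != NULL)
        pool_free_list = buf->next;
    IOSimpleLockUnlock(pool_lock);

    return (void *)buf;         /* caller must handle NULL (pool exhausted) */
}

/* Return a buffer previously obtained from pkt_buf_get(). */
void pkt_buf_put(void *p)
{
    pool_buf_t *buf = (pool_buf_t *)p;

    IOSimpleLockLock(pool_lock);
    buf->next = pool_free_list;
    pool_free_list = buf;
    IOSimpleLockUnlock(pool_lock);
}

The obvious cost is that the packet path must tolerate pkt_buf_get() returning NULL when the pool is empty (e.g. by dropping the packet), and that the pool size has to be tuned to the traffic load. Does this seem like a reasonable direction, or is there a better-supported way to allocate safely from this context?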