Re: Debugging a Kernel Panic
- Subject: Re: Debugging a Kernel Panic
- From: Terry Lambert <email@hidden>
- Date: Tue, 27 Feb 2007 17:22:10 -0800
On Feb 27, 2007, at 3:01 AM, Jones Curtis wrote:
> I have a customer who is experiencing kernel panics that are likely
> caused (at least partially) by my kernel extension. Thankfully, this
> is a fairly technically proficient customer, with multiple machines,
> and a desire to learn about remote kernel debugging. Hard to beat
> that. I sent him debug builds of the kernel extension and
> instructions on how to get everything set up and what to do when a
> panic occurs.
>
> There is no obvious reason, circumstance or pattern to the crashes
> and I absolutely can not reproduce them (although I do not have a
> dual-proc G5, which is the machine of his that is crashing).
>
> The information gdb provides is rarely anything more than this:
>
> (gdb) bt
> #0 0x000ab7f8 in kernelStackUnaligned ()
> Cannot access memory at address 0x44097f00
> Cannot access memory at address 0x44097f00
>
> I'm not sure how to proceed. Can anyone suggest any methods for
> getting gdb to give me more helpful information? Also, there isn't
> exactly an abundance of information (on Google) regarding
> kernelStackUnaligned(); perhaps it is indicative of a certain type
> of problem that might help with debugging? Any help appreciated....
That backtrace is for the exception thread that led to the panic in
the trap handler.
The exception value is in R3, and the saved context is in R4. So the
output of "info registers" could be helpful.
The panic information would help too, particularly the DAR and PC at
the time of the panic; you can get it with the "paniclog" command
from "kgmacros".
There are a bunch of things that could be the source of your problem,
and I can't enumerate them all here.
A really common error of this type is to pass the address of a stack
variable to some other code that holds onto it and then writes to it
after you have already run up and down the stack, so your return
address and other saved information get messed up. So check
everywhere you pass something to a function that might squirrel it
away, or where it ends up being given to another thread to scribble on.
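To make that concrete, here is a minimal user-space sketch of the pattern and one way around it. Everything in it is hypothetical (struct request, register_request(), submit_broken(), submit_safe() are made up for illustration, not taken from any real kext), and malloc()/free() stand in for whatever kernel allocator you actually use.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record that gets handed to other code. */
struct request {
    int  id;
    char name[16];
};

/* Hypothetical consumer that keeps the pointer it is given. */
struct request *saved_request;

void register_request(struct request *req)
{
    saved_request = req;    /* the pointer outlives the caller's frame */
}

/* BROKEN: hands out the address of an auto variable.  Once this
 * function returns, saved_request points into dead stack, and any later
 * write through it lands on whatever frame is living there by then. */
void submit_broken(int id)
{
    struct request req;

    req.id = id;
    req.name[0] = '\0';
    register_request(&req);
}

/* SAFER: the record lives on the heap, so it stays valid until
 * whoever owns it frees it. */
struct request *submit_safe(int id)
{
    struct request *req = malloc(sizeof(*req));

    if (req == NULL)
        return NULL;
    req->id = id;
    strncpy(req->name, "safe", sizeof(req->name) - 1);
    req->name[sizeof(req->name) - 1] = '\0';
    register_request(req);
    return req;
}
```

The heap version only moves the problem if nobody knows who frees the record, so decide up front which side owns it.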
If you are writing assembly code functions, then it's entirely
possible that the stack is in fact not aligned. If so, it can be
helpful to write a small amount of C code that has the same parameters
and return values as your assembly function, and use cc -S to compile
it to assembly source code, and compare that vs. your function. Make
sure that if you reserve stack space, you reserve a multiple of 16
bytes, e.g.:
...
subi r1,r1,0x20 ; reserve some stack space
lwz r3,0x0(%0) ; load first parameter to function into r3
bl _my_function ; call a C function
addi r1,r1,0x20 ; put the stack back
...
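If you compute the reservation rather than hard-coding it, the rounding is easy to get wrong. The following is only a sketch of the arithmetic; round_up_frame() and is_stack_aligned() are hypothetical helper names, and the 16-byte figure is the one from the example above.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_ALIGN 16u   /* alignment used in the example above */

/* Round a byte count up to the next multiple of STACK_ALIGN. */
size_t round_up_frame(size_t bytes)
{
    return (bytes + (STACK_ALIGN - 1)) & ~(size_t)(STACK_ALIGN - 1);
}

/* Return nonzero if a stack pointer value sits on a STACK_ALIGN boundary. */
int is_stack_aligned(uintptr_t sp)
{
    return (sp & (STACK_ALIGN - 1)) == 0;
}
```

For instance, a function that needs 0x14 bytes of locals should reserve round_up_frame(0x14), which is 0x20, as in the assembly above.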
Another common scenario is declaring huge auto variables and expecting
the stack to be effectively unlimited; kernel stacks are very small
compared to user stacks:
void
foofunc(void)
{
	char honking_big_buffer[32 * 1024]; /* too big! */
	...
}
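The usual way out is to take the buffer from the heap instead. This is a user-space sketch under that assumption: foofunc_heap() and BIG_BUFFER_SIZE are hypothetical names, and malloc()/free() stand in for the kernel allocator you would really call (in an IOKit driver, for example, IOMalloc() and IOFree()).

```c
#include <stdlib.h>
#include <string.h>

enum { BIG_BUFFER_SIZE = 32 * 1024 };

/* Same job as foofunc(), but the large buffer comes from the heap,
 * so the stack frame stays small.  Returns 0 on success, -1 if the
 * allocation failed. */
int foofunc_heap(void)
{
    char *honking_big_buffer = malloc(BIG_BUFFER_SIZE);

    if (honking_big_buffer == NULL)
        return -1;

    memset(honking_big_buffer, 0, BIG_BUFFER_SIZE);
    /* ... work with the buffer here ... */

    free(honking_big_buffer);
    return 0;
}
```

Unlike the auto variable, the allocation can fail, so the caller has to be prepared for an error return.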
If you are using the "kgmacros" that ship with the "kernel debug kits"
on <http://developer.apple.com/sdk/>, and are set up for two-machine
debugging, as described there, then you can do a "showallstacks".
This will get you a list of all the stacks that there are, and for
each thread not currently in a continuation, you'll see values for
reserved_stack, kernel_stack, stacktop, and stackbottom, which will,
with a little math, let you know what at least part of the callstack
looks like for the problem process.
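The "little math" could look something like the sketch below. It rests on assumptions you should check against your own output: that the stack grows downward, that stacktop is the current stack pointer, that stackbottom is the oldest frame the macro walked, and that kernel_stack is the lowest address of the stack allocation. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes covered by the frames that were walked, assuming stacktop is
 * the current stack pointer and stackbottom is the oldest frame.
 * Returns 0 if the two values are inconsistent with that. */
size_t frames_span(uintptr_t stacktop, uintptr_t stackbottom)
{
    return (stackbottom >= stacktop) ? (size_t)(stackbottom - stacktop) : 0;
}

/* Bytes left below the current stack pointer, assuming kernel_stack
 * is the lowest address of the stack allocation.  A result of 0 means
 * the stack pointer is at or below the base, i.e. likely overflowed. */
size_t stack_headroom(uintptr_t kernel_stack, uintptr_t stacktop)
{
    return (stacktop >= kernel_stack) ? (size_t)(stacktop - kernel_stack) : 0;
}
```

A thread whose headroom is close to zero, or whose stacktop is not inside its own stack at all, is a good place to start looking.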
The above should get you a lot more information, although it may not
get you to your problem.
-- Terry