
Re: Questions about debugging kernel panics


Subject: Re: Questions about debugging kernel panics
From: Mike Smith <email@hidden>
Date: Wed, 18 Jan 2006 11:18:05 -0800


On Jan 17, 2006, at 1:37 PM, James Reynolds wrote:

So it has been a few months since I read that. Rereading it after having read a lot of other stuff helps a little more. But in reality, it raises more questions and answers only the one I had about what PC means.

I'm sorry that this isn't easy. Sadly, understanding the backtraces isn't the only complicated part; they're just one of the barriers to entry for this particular game. The good news is that practice is all it takes, along with a willingness to keep learning.

Let's see if going over this backtrace here helps at all.

But only slightly, as I'm still not sure exactly what it means, but I think I have found another bug. In examining 4 different machine panic logs within the last month, they are very similar. Anyway,

Two machines have nearly identical logs. Only the DAR value, R1 value, and exception states (sv) differ; the PC and backtraces are the same:
Thu Dec 15 20:03:23 2005
Unresolved kernel trap(cpu 0): 0x600 - Alignment DAR=0x0000000001BCBEAE PC=0x00000000000A4F20

We took an alignment exception attempting to access the address 0x0000000001BCBEAE. It's only 16-bit aligned, so it was probably a 32- or 64-bit access. The instruction that tried to do this was at 0x00000000000A4F20. In addition, we took this while we were in the kernel.
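The alignment reasoning above can be checked mechanically. This is a minimal sketch in Python; the helper names are made up for illustration and are not part of any kernel tool. The lowest set bit of an address gives the largest power-of-two boundary it sits on.

```python
def natural_alignment(addr: int) -> int:
    """Largest power of two (in bytes) that evenly divides addr."""
    if addr == 0:
        raise ValueError("address 0 is aligned to every boundary")
    return addr & -addr  # isolates the lowest set bit


def is_aligned(addr: int, size: int) -> bool:
    """True if addr is a multiple of size (size must be a power of two)."""
    return addr & (size - 1) == 0


DAR = 0x01BCBEAE  # faulting data address from the panic log

if __name__ == "__main__":
    print(f"DAR sits on a {natural_alignment(DAR)}-byte boundary")
    for size in (2, 4, 8):
        print(f"{size * 8}-bit access aligned: {is_aligned(DAR, size)}")
```

For this DAR, only the 16-bit check passes, which is why a 32- or 64-bit access is the likely candidate.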

Latest crash info for cpu 0:
   Exception state (sv=0x23E24500)
      PC=0x000A4F20; MSR=0x00009030; DAR=0x01BCBEAE; DSISR=0x0A000000; LR=0x000750C0; R1=0x0CF13D00; XCP=0x00000018 (0x600 - Alignment)
      Backtrace:
         0x0002A138 0x00037960 0x00265D14 0x00265F50 0x00265E30 0x002A8494
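If you are comparing several panic logs, it can help to pull the registers and backtrace out of each record so they can be diffed. The following is a rough sketch, assuming the record looks like the one quoted above (NAME=0xVALUE pairs followed by a "Backtrace:" list); `parse_crash_info` is a hypothetical helper, not an existing tool.

```python
import re

# The crash-info record quoted above, joined into one string.
CRASH_INFO = (
    "Exception state (sv=0x23E24500) PC=0x000A4F20; MSR=0x00009030; "
    "DAR=0x01BCBEAE; DSISR=0x0A000000; LR=0x000750C0; R1=0x0CF13D00; "
    "XCP=0x00000018 (0x600 - Alignment) "
    "Backtrace: 0x0002A138 0x00037960 0x00265D14 0x00265F50 "
    "0x00265E30 0x002A8494"
)


def parse_crash_info(text: str):
    """Return (registers, backtrace) parsed from one crash-info record."""
    head, _, tail = text.partition("Backtrace:")
    # Every NAME=0xVALUE pair before "Backtrace:" becomes a dict entry.
    registers = {
        name: int(value, 16)
        for name, value in re.findall(r"\b(\w+)=0x([0-9A-Fa-f]+)", head)
    }
    # Everything after "Backtrace:" is a list of return addresses.
    backtrace = [int(addr, 16) for addr in re.findall(r"0x[0-9A-Fa-f]+", tail)]
    return registers, backtrace


if __name__ == "__main__":
    regs, bt = parse_crash_info(CRASH_INFO)
    for name, value in regs.items():
        print(f"{name:>5} = {value:#010x}")
    print("backtrace:", " ".join(f"{a:#x}" for a in bt))
```

Running the same function over each machine's log makes it easy to confirm which fields differ (here, DAR, R1 and sv) and which are identical.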

You can convert these back into function names using gdb. Here I am doing it against the wrong kernel version:

msmith% gdb /mach_kernel
(gdb) x/i 0x0002A138
0x2a138 <ipc_task_enable+56>:   lwz     r0,88(r1)
(gdb) x/i 0x00037960
0x37960 <task_set_64bit+68>:   lwz     r2,20(r31)
(gdb)
0x37964 <task_set_64bit+72>:    lwz     r3,44(r2)
(gdb) x/i 0x00265D14
0x265d14 <waitid+260>:  addi    r3,r1,64

etc. Again, this was against the wrong kernel version, so don't take the above as gospel. If you grab the debug kernel and the Darwin sources, you can get line number information and perhaps track it down to a small fragment of code. Once you've got a likely suspect, you're getting somewhere.
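Typing each address by hand gets tedious. One way to automate the lookups shown above is to generate a gdb command file from the panic log. This is only a sketch: the script and file names are assumptions, and `info line` will only report something useful when the kernel you load has debug information.

```python
# Addresses taken from the panic log quoted earlier in this message.
PC = 0x000A4F20
LR = 0x000750C0
BACKTRACE = [0x0002A138, 0x00037960, 0x00265D14,
             0x00265F50, 0x00265E30, 0x002A8494]


def gdb_commands(addresses):
    """Build gdb commands that disassemble and locate each address."""
    lines = []
    for addr in addresses:
        lines.append(f"x/i {addr:#x}")         # symbol+offset and instruction
        lines.append(f"info line *{addr:#x}")  # source line, if debug info exists
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # PC and LR first, then the backtrace entries in order.
    print(gdb_commands([PC, LR] + BACKTRACE), end="")
```

Assuming the script is saved as resolve.py (a hypothetical name), something like `python3 resolve.py > panic.gdb` followed by `gdb -batch -x panic.gdb /mach_kernel` should print all the lookups in one pass. As above, the results are only meaningful against the kernel build that actually panicked.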

Once you know what's failing, you need to look for causes. In your case here, where you see the same specific failure repeating over and over, the cause is likely to be something with a deterministic consequence. You can rule out things like random memory corruption; instead you're looking for something with a small but nonzero chance of happening. Often that means a race condition, or degenerate behaviour in the face of an unexpected resource shortage.

For the specific example here, you almost certainly have a bad data or function pointer. Once you've found the code in question, you need to look at which pointer(s) it's attempting to dereference, and then at who might have changed them recently. One of those is likely to be your culprit.
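To make "which pointer is it dereferencing" concrete: a PowerPC load such as `lwz r3,44(r2)` reads from the address in r2 plus 44, so once you know the faulting instruction you can work backward from the DAR to the value the base register must have held. The sketch below is purely illustrative. The instruction is borrowed from the disassembly above only for its operand shape; the real faulting instruction is whatever lives at PC=0x000A4F20 in the correct kernel, and it is not known here. The helper names are hypothetical, and 32-bit addressing is assumed.

```python
import re

# Matches loads/stores of the form "mnemonic rT,disp(rA)", e.g. "lwz r3,44(r2)".
D_FORM = re.compile(r"^\s*(\w+)\s+r(\d+),\s*(-?\d+)\(r(\d+)\)\s*$")


def parse_d_form(insn: str):
    """Split an instruction into (mnemonic, target, displacement, base)."""
    match = D_FORM.match(insn)
    if match is None:
        raise ValueError(f"not a disp(base) load/store: {insn!r}")
    mnemonic, target, disp, base = match.groups()
    return mnemonic, int(target), int(disp), int(base)


def effective_address(insn: str, regs: dict) -> int:
    """Address the instruction would access, given register values."""
    _, _, disp, base = parse_d_form(insn)
    # For these instructions a base field of r0 means the constant 0.
    base_value = 0 if base == 0 else regs[base]
    return (base_value + disp) & 0xFFFFFFFF  # assumes 32-bit addressing


def base_needed_for(insn: str, dar: int) -> int:
    """Base register value that would make the instruction hit dar."""
    _, _, disp, _ = parse_d_form(insn)
    return (dar - disp) & 0xFFFFFFFF


if __name__ == "__main__":
    DAR = 0x01BCBEAE             # from the panic log
    insn = "lwz     r3,44(r2)"   # HYPOTHETICAL: not the real faulting instruction
    base = base_needed_for(insn, DAR)
    print(f"r2 would have been {base:#010x}")
    print(f"check: {effective_address(insn, {2: base}):#010x}")
```

With the real instruction in hand, the recovered base value tells you which pointer was bad, and that is the one to trace back through the code that last wrote it.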

 = Mike

