Re: Questions about debugging kernel panics
- Subject: Re: Questions about debugging kernel panics
- From: James Reynolds <email@hidden>
- Date: Tue, 17 Jan 2006 15:38:55 -0700
> What, precisely, are you trying to figure out?
I want to figure out why the panic occurs and, ideally, how to stop
it. I manage a lab of about 300 Macs and I see this panic often
("this" meaning panics where the PC values, and sometimes the
backtraces, are the same). It happens maybe 10 times a month, though
I haven't counted yet. I would like to stop it.
I also see other groups of similar-looking panics that recur often
(2-5 times a month), so I'm hoping to learn how to debug them as well,
without having to ask this list for step-by-step help. ;)
And my boss wants me to make a webpage to help others do this too
because some people at other universities have expressed an interest.
I guess being able to debug a kernel is serious bragging rights for a
system administrator...
Because the same panics recur on different machines, I'm assuming they
are caused by software bugs rather than bad hardware.
But I'm not sure whether the bug is in a third-party kernel extension
or in the kernel itself. I guess that is what I'm trying to figure out.
I'm also curious how much the panic log can actually tell me.
I'm now thinking of setting up a permanent panic dump server to store
kernel core dumps, so I can get much more information and not have to
rely on reading the hieroglyphic panic logs.
I aim to squash all kernel bugs and have panic-free computers! :)
> This particular panic was, as it states, due to an alignment error.
> The instruction in question is likely trying to do a word-aligned
> operation, and the address it's trying to read from, 0x1bcbeae,
> certainly isn't word-aligned.
> For example, the MPC7450 user's manual gives these possible causes:
> ........
> Similar causes would apply to other PPC CPUs.
> The question then becomes WHY the code in question is trying to
> operate on an unaligned address, and it's usually either because of
> a software error or because something has stomped on the memory
> location or register that contained the address.
So to figure out what code is doing this, I should probably get a
core dump, correct? I can't figure that out from the log alone, right?
Oh, I remember why I asked about the PC value. I remember reading
somewhere that you can map some addresses in the log to code; that
webpage didn't mention PC values, but the PC value seemed like the
closest thing (I think it is, but I'll have to re-read that page).
Has anyone noticed that there are no Darwin kernel panic webpages
targeted at people between developer and end user? Webpages are
either in the camp that says swap RAM, or in the camp that says
launch gdb.
James
_______________________________________________
Darwin-kernel mailing list (email@hidden)