Re: Questions about debugging kernel panics
- Subject: Re: Questions about debugging kernel panics
- From: Terry Lambert <email@hidden>
- Date: Tue, 17 Jan 2006 13:35:56 -0800
I think Garth pointing you at the technote was the right thing to do;
you still have a couple of questions here that it probably doesn't
answer...
On Jan 17, 2006, at 12:21 PM, James Reynolds wrote:
[ ... ]
So if I have a bunch of panic logs from different computers and they
all have the same PC numbers, but different DAR numbers, what does
that mean?
It means you are crashing at the same location in your code as a
result of dereferencing unallocated memory, and that the unallocated
memory address is changing. This problem is going to be a bug in your
software; either use of an uninitialized stack variable, or reuse of
memory after it has been freed.
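One way to check for this pattern across a pile of logs is to pull the PC and DAR out of each one and group by PC. The following is a minimal sketch, not a tool that ships with anything. It assumes each log contains text of the form "DAR=0x... PC=0x..." on the trap line; the sample lines and addresses are made up for illustration, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: the trap line carries fields shaped like "DAR=0x... PC=0x...".
FIELD_RE = re.compile(r"\b(DAR|PC)=(0x[0-9A-Fa-f]+)")


def extract_pc_dar(log_text):
    """Return (pc, dar) as integers, or None if either field is missing."""
    fields = {}
    for name, value in FIELD_RE.findall(log_text):
        # Keep only the first occurrence of each field.
        fields.setdefault(name, int(value, 16))
    if "PC" in fields and "DAR" in fields:
        return fields["PC"], fields["DAR"]
    return None


def group_by_pc(logs):
    """Map each faulting PC to the set of DAR values seen with it."""
    groups = defaultdict(set)
    for text in logs:
        parsed = extract_pc_dar(text)
        if parsed is not None:
            pc, dar = parsed
            groups[pc].add(dar)
    return dict(groups)


# Made-up sample lines, one per machine; the values are illustrative only.
SAMPLE_LOGS = [
    "Unresolved kernel trap(cpu 0): 0x300 - Data access DAR=0x00000010 PC=0x2E4F1A08",
    "Unresolved kernel trap(cpu 1): 0x300 - Data access DAR=0xDEADBEEF PC=0x2E4F1A08",
    "Unresolved kernel trap(cpu 0): 0x300 - Data access DAR=0x6B6B6B6B PC=0x2E4F1A08",
]

if __name__ == "__main__":
    for pc, dars in group_by_pc(SAMPLE_LOGS).items():
        listing = ", ".join(hex(d) for d in sorted(dars))
        print(f"PC {pc:#010x}: {len(dars)} distinct DAR value(s): {listing}")
```

A single PC with many distinct DAR values is the signature described above: one faulting instruction, and a bad pointer whose value differs from run to run.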
How is moving to Intel going to change CPU panics? Will trap
numbers stay the same?
No. Trap numbers are assigned to exceptional conditions by the
processor manufacturer; they are magic numbers that are meaningful to
the hardware, and can change between chip revisions (though they
usually don't). The most common Intel panic trap will probably be 14,
the equivalent of 0x300 on PPC: basically, a page-not-present error
for which there is no backing store, so the kernel cannot fix up the
error by loading the page back in and restarting your code at the
faulting instruction. Fixing up such faults is how demand paging
normally works, so exceptions are a normal part of a running system;
panics occur when an exception can't be fixed up and the system state
is unrecoverable. Most of these will be driver bugs, or bugs in
locally installed non-driver KEXTs.
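As a rough aid when reading logs from both architectures, the numbers can be kept in a small lookup table. This is a sketch only: the entries are a handful of commonly seen vectors taken from the processor manuals rather than a complete list, and the table and function names are hypothetical.

```python
# A few PowerPC exception vectors (hex offsets, as printed in PPC panic logs).
PPC_VECTORS = {
    0x200: "machine check",
    0x300: "data storage interrupt (DSI): data access fault",
    0x400: "instruction storage interrupt (ISI): instruction fetch fault",
    0x600: "alignment",
    0x700: "program (illegal instruction or trap)",
}

# A few x86 exception vectors (decimal, as numbered in the Intel manuals).
X86_VECTORS = {
    0: "divide error (#DE)",
    6: "invalid opcode (#UD)",
    8: "double fault (#DF)",
    13: "general protection fault (#GP)",
    14: "page fault (#PF)",
}

TABLES = {"ppc": PPC_VECTORS, "x86": X86_VECTORS}


def describe_trap(arch, number):
    """Return a short description of a trap number, or a placeholder if unknown."""
    table = TABLES.get(arch.lower())
    if table is None:
        raise ValueError(f"unknown architecture: {arch!r}")
    return table.get(number, "not in this table")


if __name__ == "__main__":
    print("PPC 0x300:", describe_trap("ppc", 0x300))
    print("x86 14:   ", describe_trap("x86", 14))
```

The two entries that matter most here are PPC 0x300 and x86 14: both report a data access to an address the kernel could not make valid.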
There is a bit of documentation about debugging the FreeBSD kernel
on the web and I'm wondering how similar it is to Darwin. (for
example:
http://www.onlamp.com/pub/a/bsd/2002/04/04/Big_Scary_Daemons.html
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-trouble.html)
There will be some similarities. On Intel, the trap numbers will be
the same as on FreeBSD, and will generally have the same root causes.
The control registers will be the same; what we call the PC on PPC
will be the EIP on Intel, and so on. The FreeBSD information on register decoding is mostly
useful, since what you're talking about is machine state at the time
of the problem, and where in memory you need to look to find the bogus
instruction stream and/or data that resulted in the crash.
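To make the "and so on" a little more concrete, here is a small sketch of how the PPC names in a panic log line up with 32-bit Intel registers. Only the PC to EIP pairing comes from the text above; the other rows are filled in from the processor manuals as an illustration, and the table and function names are hypothetical.

```python
# PPC name -> (x86 counterpart or None, role in a panic log).
# Only the PC -> EIP row is stated in the post; the rest are illustrative.
PPC_TO_X86 = {
    "PC": ("EIP", "address of the faulting instruction"),
    "DAR": ("CR2", "data address whose access caused the fault"),
    "r1": ("ESP", "stack pointer"),
    "LR": (None, "return address (on x86 it lives on the stack instead)"),
}


def x86_counterpart(ppc_name):
    """Return the x86 register playing the same role, or None if there is none."""
    return PPC_TO_X86[ppc_name][0]


if __name__ == "__main__":
    for ppc_name, (x86_name, role) in PPC_TO_X86.items():
        print(f"{ppc_name:>3} -> {x86_name or '(none)':<6} {role}")
```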
In particular, I read that a core dump is the size of your physical
RAM. Does Darwin do the same?
FreeBSD does system dumps to the swap partition. These dumps survive
reboot because of their location on persistent storage. They do so at
the cost of a dedicated swap partition. The drivers go out of their
way in a panic situation to keep the corrupt kernel data, which is
what caused the panic in the first place, from scribbling anywhere
else on the disk.
Darwin supports dumping to a remote dump server (how to set one up,
etc., is all documented, both in the archives of this mailing list
and at developer.apple.com). But since swap occurs to files in the
ordinary file system, which are not confined to a dedicated
partition, it does not support dumping to the swap area. That would be
too dangerous, since you are, after all, talking about a kernel with
sufficiently corrupt data that you can't safely recover from the
failure, or you wouldn't be panicking in the first place. This
corruption may or may not include the location of the swap area.
It is rare, but not statistically insignificant, for a FreeBSD kernel
to dump over the top of important data because the memory holding the
swap boundaries used by the dump drivers has itself been corrupted.
Practically, all you will get from a FreeBSD or Darwin dump is kernel
information, not including swappable data, which means you will lose
user process information and you will lose any swappable kernel pages
that have been swapped out as well (allocated pageable kernel memory).
I also read that core dumps may contain passwords that were stored
in RAM. Is that true of Darwin?
It will contain whatever was in kernel memory at the time. This could
include buffer cache information, which may include the contents of
sensitive files, such as your password file. It could also include
various swap file contents, which could include unencrypted data from
your keychain, etc. This is one of several reasons system dumps are
disabled by default, and require setup work by a privileged user to
make them operate. It's also one of the reasons we support encrypted
swap.
[ ... ]
Does Darwin support saving core dumps to the local hard disk at
all (not that I want it, just curious)?
No, but if you needed to be able to do it for research or other
purposes, the necessary support code would not be difficult for you to
hack into the current remote dump code.
-- Terry