Re: Questions about debugging kernel panics

Subject: Re: Questions about debugging kernel panics
From: Terry Lambert <email@hidden>
Date: Tue, 17 Jan 2006 13:35:56 -0800

I think Garth pointing you at the technote was the right thing to do; you still have a couple of questions here that it probably doesn't answer...

On Jan 17, 2006, at 12:21 PM, James Reynolds wrote:

[ ... ] So if I have a bunch of panic logs from different computers and they all have the same PC numbers, but different DAR numbers, what does that mean?

It means you are crashing at the same instruction in your code (the PC, or program counter, is the address of the faulting instruction) as a result of dereferencing unallocated memory, and that the bad address being dereferenced (reported in the DAR, the Data Address Register) changes from crash to crash. This is going to be a bug in your software: either use of an uninitialized stack variable, or reuse of memory after it has been freed.

How is moving to Intel going to change CPU panics? Will trap numbers stay the same?

No. Trap numbers are assigned to exceptional conditions by the processor manufacturer; they are magic values that are meaningful to the hardware, and they can change between chip revisions (though they usually don't). The most common Intel panic trap will probably be 14, which corresponds to a 0x300 on PPC: basically, a page-not-present error for which there is no backing store, so the kernel cannot fix up the error by loading the page back in from backing store and restarting your code at the faulting instruction for you. That fix-up is how demand paging normally works, so exceptions are an ordinary part of a running system; panics occur only when an exception can't be fixed up and the system state is unrecoverable. Most of these will be driver bugs, or even bugs in locally installed non-driver KEXTs.

There is a bit of documentation about debugging the FreeBSD kernel on the web, and I'm wondering how similar it is to Darwin (for example: http://www.onlamp.com/pub/a/bsd/2002/04/04/Big_Scary_Daemons.html http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-trouble.html).

There will be some similarities. The trap numbers will be the same, and will generally have the same root causes. The registers will be the same: what we call the PC on PPC will be the EIP on Intel, the faulting data address that PPC reports in the DAR will be in CR2 on Intel, and so on. The FreeBSD information on register decoding is mostly useful, since what you're talking about is machine state at the time of the problem, and where in memory you need to look to find the bogus instruction stream and/or data that resulted in the crash.

In particular, I read that a core dump is the size of your physical RAM. Does Darwin do the same?

FreeBSD writes system dumps to the swap partition. Because the swap partition is persistent storage, the dumps survive a reboot; the cost is that you need a dedicated swap partition. In a panic, the drivers go out of their way to keep the corrupt kernel data that caused the panic in the first place from scribbling anywhere else on the disk.

Darwin supports dumping to a remote dump server (how to set one up is documented both in the archives for this mailing list and at developer.apple.com). It does not support dumping to the swap area, because swap goes to files in the ordinary file system, and writing a dump into them would mean trusting the on-disk boundaries of those files. That would be too dangerous: you are, after all, talking about a kernel whose data is corrupt enough that it can't safely recover from the failure, or it wouldn't be panicking in the first place. That corruption may or may not include the location of the swap area.

It's rare, but not statistically insignificant, for a FreeBSD kernel to dump over the top of important data because the memory holding the swap boundaries used by the dump drivers has been corrupted.

In practice, all you will get from a FreeBSD or Darwin dump is kernel memory that was resident at the time. Swappable data that has been swapped out is not included, so you will lose user process information, and you will also lose any pageable kernel memory that had been swapped out.

I also read that core dumps may contain passwords that were stored in RAM. Is that true of Darwin?

It will contain whatever was in kernel memory at the time. This could include buffer cache data, which may include the contents of sensitive files, such as your password file. It could also include the contents of swap files, which could include unencrypted data from your keychain, etc. This is one of several reasons system dumps are disabled by default and require setup by a privileged user before they will work. It's also one of the reasons we support encrypted swap.

[ ... ] Does Darwin support saving core dumps to the local hard disk at all (not that I want it, just curious)?

No, but if you needed to be able to do it for research or other purposes, the necessary support code would not be difficult for you to hack into the current remote dump code.

-- Terry
_______________________________________________
Darwin-kernel mailing list      (email@hidden)


