
Re: Understanding cores...


  • Subject: Re: Understanding cores...
  • From: Andreas Fink <email@hidden>
  • Date: Thu, 11 Jan 2007 08:05:20 +0100


On 09.01.2007, at 19:28, Derek Kumar wrote:

On Jan 8, 2007, at 4:05 PM, Michael Tuexen wrote:

Well, on that system the NKE is always loaded, because it is required by the application running on that
Mac Pro. In the meantime another system was set up (on different hardware) and it also crashes a lot.
BTW: A lot means a couple of times per day. And yes, the cores we get (using a core dump server)
are all like this (some pointed to bugs in the NKE in the past, but these could be fixed; in those cases
SCTP.kext was also explicitly mentioned in the paniclog).


Any idea how to narrow down the problem?
If the EIP values in the "paniclog" register dump are identical/similar across all the crashes you've observed (note that "unresolved kernel trap" is just a generic label), and the crashes only occur when your driver is loaded, it's likely to be memory corruption, as I noted previously. Is it always an EBP-based access (typically a local or parameter) in idle_thread() that's causing the fault? The loop in idle_thread() briefly enables interrupts and then disables them again, so if you have an interrupt filter routine (which executes at interrupt context), that could be another point where corruption could occur, in addition to the saved context at the base of the thread's kernel stack I mentioned previously (corruption of the register context below the interrupt stack frame that contains the saved value of the EBP register, for instance).
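The comparison of faulting addresses across crashes can be automated once several paniclogs have been saved as text. The following is only a minimal sketch in Python: it assumes the "PC=0x..." field layout of the PPC paniclog shown later in this message, and the function names are hypothetical. An x86 paniclog prints EIP in a different layout and would need its own pattern.

```python
import re
from collections import Counter

# Matches the "PC=0x..." field as printed in the PPC paniclog in this thread.
# Assumption: an x86 paniclog (EIP) uses a different layout and needs another pattern.
PC_FIELD = re.compile(r"\bPC=0x([0-9A-Fa-f]+)")

def first_pc(paniclog_text):
    """Return the first PC value found in the text as an int, or None."""
    match = PC_FIELD.search(paniclog_text)
    return int(match.group(1), 16) if match else None

def tally_pcs(paniclogs):
    """Count how often each faulting PC occurs across several paniclogs."""
    return Counter(first_pc(text) for text in paniclogs)
```

If one PC value dominates the tally, the crashes are probably the same fault; a wide spread of values would point in a different direction.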
Unfortunately, there's no single magic bullet when it comes to identifying sources of memory corruption of this type. After carefully walking through your code to look for erroneous stores to memory, bad DMA bounds, stack overflows etc., one approach is to determine the patterns and location of the corruption and binary search via logging/tracing. I don't think page protection/debug register type schemes to trap the bad store (assuming it's not a physical mode store) would be useful here, since the register context would be very frequently accessed. Logic analyzers (very expensive) would be a last resort. The kernel trace facility (/usr/local/bin/trace -h) can tell you what events (such as interrupts and context switches) occurred on that processor, but given that it panics, you'd probably have to examine the trace buffer in memory (see xnu/bsd/kern/kdebug.c in the kernel sources for the internals of the trace facility) to extract the last few trace events.
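Pulling the last few events out of a trace buffer after a panic amounts to reading a ring buffer backwards from its write position. The sketch below shows only that indexing step in Python. It assumes a fixed-size ring with a "next slot to write" index and a wrapped flag, which is a simplification and not the actual layout used in kdebug.c; all names are hypothetical.

```python
def last_events(buffer, next_index, wrapped, count):
    """Return up to `count` most recent entries of a ring buffer, oldest first.

    buffer     -- list of slots forming the ring (hypothetical layout)
    next_index -- slot the next event would have been written to
    wrapped    -- True if the writer has wrapped around at least once
    """
    size = len(buffer)
    # Before the first wrap, only the slots below next_index hold valid data.
    valid = size if wrapped else next_index
    count = max(0, min(count, valid))
    if count == 0:
        return []
    start = (next_index - count) % size
    return [buffer[(start + i) % size] for i in range(count)]
```

For example, with a four-slot ring holding e4, e5, e2, e3 and next_index 2 after wrapping, asking for the last three events yields e3, e4, e5.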


Derek


Hello Derek,

We have now had the same crash on a dual-G5 Xserve; it looks like the one below.
We think we have spotted the real culprit in the meantime, but it's like fishing in the dark: wild guessing and hoping to have it fixed. But as we see several crash dumps a day from different machines, we will know pretty soon whether we found it or not. But we want to be sure ;-). So is there anything usable in this gdb output that we can follow up on and that could give us some hints?


Now that the iPhone is here, having SCTP under Mac OS X is even more of a necessity, and this NKE is the one closest to production quality, with the exception of one single very nasty bug we are tracking down. Then we are off to do the same under Leopard. (Oh, by the way, where can we check out the sources of the current Leopard beta so we can verify our design?)



-----SNIP----
xg5:/Users/afink root# gdb -c core-xnu-792-85.195.192.42-b7c6afb3-GOOD
GNU gdb 6.3.50-20050815 (Apple version gdb-573) (Fri Oct 20 15:54:33 GMT 2006)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "powerpc-apple-darwin".
unable to read unknown load command 0x7a000000
#0 0x00000000 in ?? ()
(gdb) source KernelDebugKit/kgmacros
KernelDebugKit/kgmacros:26: Error in sourced command file:
No symbol table is loaded. Use the "file" command.
(gdb) symbol-file KernelDebugKit/mach_kernel
Reading symbols from /Users/afink/KernelDebugKit/mach_kernel...done.
(gdb) source KernelDebugKit/kgmacros
Cannot access memory at address 0x0
Cannot access memory at address 0x0
Loading Kernel GDB Macros package. Type "help kgm" for more info.
(gdb) add-symbol-file org.sctp.nke.sctp.sym-GOOD
add symbol table from file "org.sctp.nke.sctp.sym-GOOD"? (y or n) y
Reading symbols from /Users/afink/org.sctp.nke.sctp.sym-GOOD...done.
(gdb) paniclog
Cannot access memory at address 0x0
Cannot access memory at address 0x0



Unresolved kernel trap(cpu 1): 0x400 - Inst access DAR=0x0000000003D67000 PC=0x0000000000000000
Latest crash info for cpu 1:
Exception state (sv=0x44BC6A00)
PC=0x00000000; MSR=0x40001030; DAR=0x03D67000; DSISR=0x40000000; LR=0x00000000; R1=0x2CA73DE0; XCP=0x00000010 (0x400 - Inst access)
Backtrace:
0x0379AA8C 0x00023900 0x000ABAAC 0x00000000
backtrace terminated - frame not mapped or invalid: 0xBFFF70A0


Proceeding back via exception chain:
Exception state (sv=0x44BC6A00)
previously dumped as "Latest" state. skipping...
Exception state (sv=0x44AB0280)
PC=0x9004A3B8; MSR=0x0000F030; DAR=0x03D67000; DSISR=0x42000000; LR=0x9004A2FC; R1=0xBFFF70A0; XCP=0x00000030 (0xC00 - System call)


Kernel version:
Darwin Kernel Version 8.8.0: Fri Sep 8 17:18:57 PDT 2006; root:xnu-792.12.6.obj~1/RELEASE_PPC
panic(cpu 0 caller 0xFFFF0004): wait queue deadlock - wq=0x2e4e900, cpu=0


Latest stack backtrace for cpu 0:
Backtrace:
0x00095138 0x00095650 0x00026898 0x0003EF98 0x00022CC0 0x000ABAAC 0x00000000
Proceeding back via exception chain:
Exception state (sv=0x44C11780)
PC=0x9000AB48; MSR=0x0000D030; DAR=0x03D63000; DSISR=0x42000000; LR=0x9000AA9C; R1=0xBFFFEC00; XCP=0x00000030 (0xC00 - System call)


Kernel version:
Darwin Kernel Version 8.8.0: Fri Sep 8 17:18:57 PDT 2006; root:xnu-792.12.6.obj~1/RELEASE_PPC
(gdb)



for CPU 1

(gdb) list *0x000ABAAC
No source file for address 0xabaac.
(gdb) list *0x0379AA8C
No source file for address 0x379aa8c.
(gdb) list *0x00023900
0x23900 is in mach_msg_overwrite_trap (/SourceCache/xnu/xnu-792.12.6/osfmk/ipc/mach_msg.c:1862).
1857 in /SourceCache/xnu/xnu-792.12.6/osfmk/ipc/mach_msg.c
(gdb) list *0x000ABAAC
No source file for address 0xabaac.
(gdb)





for CPU 0

(gdb) list *0x00095138
0x95138 is in print_backtrace (/SourceCache/xnu/xnu-792.12.6/osfmk/ppc/model_dep.c:403).
398 /SourceCache/xnu/xnu-792.12.6/osfmk/ppc/model_dep.c: No such file or directory.
in /SourceCache/xnu/xnu-792.12.6/osfmk/ppc/model_dep.c
(gdb) list *0x00095650
0x95650 is in Debugger (/SourceCache/xnu/xnu-792.12.6/osfmk/ppc/model_dep.c:557).
552 in /SourceCache/xnu/xnu-792.12.6/osfmk/ppc/model_dep.c
(gdb) list *0x00026898
0x26898 is in panic (/SourceCache/xnu/xnu-792.12.6/osfmk/kern/debug.c:206).
201 /SourceCache/xnu/xnu-792.12.6/osfmk/kern/debug.c: No such file or directory.
in /SourceCache/xnu/xnu-792.12.6/osfmk/kern/debug.c
(gdb) list *0x0003EF98
0x3ef98 is in wait_queue_peek64_locked (/SourceCache/xnu/xnu-792.12.6/osfmk/kern/wait_queue.c:1284).
1279 /SourceCache/xnu/xnu-792.12.6/osfmk/kern/wait_queue.c: No such file or directory.
in /SourceCache/xnu/xnu-792.12.6/osfmk/kern/wait_queue.c
(gdb) list *0x00022CC0
0x22cc0 is in mach_msg_overwrite_trap (/SourceCache/xnu/xnu-792.12.6/osfmk/ipc/mach_msg.c:1110).
1105 /SourceCache/xnu/xnu-792.12.6/osfmk/ipc/mach_msg.c: No such file or directory.
in /SourceCache/xnu/xnu-792.12.6/osfmk/ipc/mach_msg.c
(gdb) list *0x000ABAAC
No source file for address 0xabaac.
(gdb)
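Typing "list *ADDRESS" for every backtrace entry by hand gets tedious when several cores arrive per day. A rough sketch in Python of generating those commands from saved paniclog text follows. It assumes the backtrace addresses sit on the line directly after a "Backtrace:" line, as in the output above, and the function names are hypothetical.

```python
import re

# Eight hex digits after 0x, as printed in the 32-bit backtraces above.
ADDR = re.compile(r"0x[0-9A-Fa-f]{8}\b")

def backtrace_addresses(paniclog_text):
    """Collect the addresses on each line that directly follows 'Backtrace:'."""
    addresses = []
    lines = paniclog_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.strip() == "Backtrace:":
            addresses.extend(ADDR.findall(lines[i + 1]))
    return addresses

def gdb_list_commands(addresses):
    """Build one 'list *ADDR' command per distinct non-null address."""
    seen = set()
    commands = []
    for addr in addresses:
        if int(addr, 16) == 0 or addr in seen:
            continue  # skip the null terminator and repeated frames
        seen.add(addr)
        commands.append(f"list *{addr}")
    return commands
```

The resulting lines can be written to a file and loaded with gdb's "source" command after the symbol files have been added.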


