Andrew Gallatin writes:
Godfrey van der Linden writes:
I'd like to see the assembly that this kernel is running at 0x2CAE8. I'd be willing to bet an offset from a NULL pointer is being taken, and that is why you are panicking.
I think the 'r1' panic is a red herring. The first exception state,

  PC=0x0002CAE8; MSR=0x00001030; DAR=0x000000D4; DSISR=0x40000000; LR=0x0002CAD8; R1=0x0CC33DB0; XCP=0x0000000C (0x300 - Data access)

indicates that r1 was valid at the time the panic was taken. Do you have a symbolled kernel for the version that is taking the panic? If you can find out which routine was passed a NULL pointer, you may have a suspect.
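The "offset from a NULL pointer" heuristic can be made mechanical: if the faulting address (DAR) is small, subtract nothing and look up which field of the suspect structure covers that offset. Below is a minimal sketch, assuming a hypothetical `struct mock_thread` whose padding is invented so that a field sits at 0xd4 (the DAR in the panic log); the field name `sched_stamp` is borrowed from the debugging session in this thread, and the 4 KB cutoff is an assumed triage threshold, not a kernel rule. This is not the real xnu `struct thread` layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel structure. The padding only exists so
 * that sched_stamp lands at offset 0xd4; NOT the real xnu layout. */
struct mock_thread {
    char     earlier_fields[0xd4];
    uint32_t sched_stamp;
    uint32_t sched_usage;
};

_Static_assert(offsetof(struct mock_thread, sched_stamp) == 0xd4,
               "mock layout should put sched_stamp at 0xd4");

struct field_info {
    const char *name;
    size_t      offset;
    size_t      size;
};

/* Field table built with offsetof, so it tracks the struct definition. */
static const struct field_info mock_thread_fields[] = {
    { "sched_stamp", offsetof(struct mock_thread, sched_stamp),
      sizeof(((struct mock_thread *)0)->sched_stamp) },
    { "sched_usage", offsetof(struct mock_thread, sched_usage),
      sizeof(((struct mock_thread *)0)->sched_usage) },
};

#define MOCK_THREAD_NFIELDS \
    (sizeof mock_thread_fields / sizeof mock_thread_fields[0])

/* Assumed cutoff: treat faulting addresses below 4 KB as
 * "NULL base plus field offset". */
#define NULL_OFFSET_LIMIT 0x1000u

/* Returns the name of the field a NULL base pointer would have touched at
 * address `dar`, or NULL if `dar` does not fit that pattern. */
static const char *field_at_null_offset(uintptr_t dar,
                                        const struct field_info *fields,
                                        size_t nfields)
{
    if (dar >= NULL_OFFSET_LIMIT)
        return NULL;                 /* too large for a simple NULL + offset */
    for (size_t i = 0; i < nfields; i++) {
        if (dar >= fields[i].offset &&
            dar < fields[i].offset + fields[i].size)
            return fields[i].name;
    }
    return NULL;
}
```

With the real kernel's headers, the same lookup is what gdb does implicitly when you evaluate a field access through a zero pointer.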
I've just found this crash waiting for me again this morning, and I managed to connect to it via remote gdb. This is with 10.3.7, though the PC is the same as it was for the much earlier version of Mac OS X. The code seems to be in a totally different spot, which means I may have totally botched converting the address to symbols last time.

(gdb) paniclog
Unresolved kernel trap(cpu 1): 0x300 - Data access DAR=0x00000000000000D4 PC=0x000000000002CAE8
Latest crash info for cpu 1:
   Exception state (sv=0x22D43000)
      PC=0x0002CAE8; MSR=0x00001030; DAR=0x000000D4; DSISR=0x40000000; LR=0x0002CAD8; R1=0x0CC5BDB0; XCP=0x0000000C (0x300 - Data access)
      Backtrace:
         0x0002CAD8 0x0002C8A8 0x0002C870
   Proceeding back via exception chain:
      Exception state (sv=0x22D43000)
         previously dumped as "Latest" state. skipping...
      Exception state (sv=0x00A01500)
         PC=0x00000000; MSR=0x0000D030; DAR=0x00000000; DSISR=0x00000000; LR=0x00000000; R1=0x00000000; XCP=0x00000000 (Unknown)

Kernel version:
Darwin Kernel Version 7.7.0:
Sun Nov 7 16:06:51 PST 2004; root:xnu/xnu-517.9.5.obj~1/RELEASE_PPC

[ CPU 0 then panics because a simple lock acquisition times out ]

The symbolic backtrace is:

(gdb) bt
#0  0x0002cae8 in do_thread_scan () at /SourceCache/xnu/xnu-517.9.5/osfmk/kern/sched_prim.c:2790
#1  0x0002c8a8 in sched_tick_thread_continue () at /SourceCache/xnu/xnu-517.9.5/osfmk/kern/sched_prim.c:2671

Frame 0 works out to be:

        thread = processor->idle_thread;
        if (thread->sched_stamp != sched_tick) {
            if (stuck_count == MAX_STUCK_THREADS) {
                restart_needed = TRUE;
                break;
            }

And 0xd4 is the offset of sched_stamp:

(gdb) p ((thread_t) 0)->sched_stamp
Cannot access memory at address 0xd4

So it looks like the idle thread got zeroed out somehow. This seems to be confirmed by other sources:

(gdb) info locals
restart_needed = 9012778
thread = 0x0
pset = 0x331800
processor = 0x1239948
s = 1

The processor struct looks like this:
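To see why a zeroed processor structure turns into exactly this fault, here is a heavily simplified model of the loop quoted for frame 0. All types and names (`sim_thread`, `sim_processor`, `scan_idle_threads`) are hypothetical; the real xnu code holds locks, collects stuck threads and does further work that is not modeled here. Unlike the kernel fragment, this sketch checks for a NULL `idle_thread` and reports it instead of dereferencing it.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified types for illustration only. */
struct sim_thread {
    unsigned int sched_stamp;
};

struct sim_processor {
    struct sim_processor *next;        /* next processor in the set */
    struct sim_thread    *idle_thread; /* NULL here reproduces the panic */
};

enum scan_result { SCAN_OK, SCAN_STALE_IDLE, SCAN_NULL_IDLE };

/* Walks the processor list and compares each idle thread's sched_stamp
 * with the current tick, as the quoted fragment does. */
static enum scan_result scan_idle_threads(const struct sim_processor *p,
                                          unsigned int sched_tick)
{
    enum scan_result result = SCAN_OK;

    for (; p != NULL; p = p->next) {
        const struct sim_thread *thread = p->idle_thread;

        if (thread == NULL)
            return SCAN_NULL_IDLE;    /* kernel would fault reading sched_stamp */
        if (thread->sched_stamp != sched_tick)
            result = SCAN_STALE_IDLE; /* real code does more work here */
    }
    return result;
}
```

A processor structure that has been zeroed has `idle_thread == NULL`, so the first read through it is at the offset of `sched_stamp`, which matches a small DAR like the one in the panic log.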
(gdb) p/x *processor
$9 = {processor_queue = {next = 0x337478, prev = 0x331814}, state = 0x33705c,
  active_thread = 0x0, next_thread = 0x0, idle_thread = 0x0,
  processor_set = 0x1239948, current_pri = 0x80000000,
  quantum_timer = {q_link = {next = 0x0, prev = 0x0}, func = 0x0, param0 = 0x0,
    param1 = 0x0, deadline = 0x0, state = 0x266450}, quantum_end = 0x0,
  last_dispatch = 0xcb18000, timeslice = 0x104, deadline = 0x2400000080,
  runq = {highq = 0x80, bitmap = {0x5f, <...>

This looks corrupt, and looking at the pset, the processor address seems funny. E.g.:

(gdb) p *pset
$13 = {idle_queue = {next = 0x331800, prev = 0x331800}, idle_count = 0,
  active_queue = {next = 0x337000, prev = 0x33748c},
  processors = {next = 0x1239948, prev = 0x33748c}, processor_count = 2,
  sched_lock = {lock_data = 19103745},
  runq = {highq = 0, bitmap = {0, 0, 0, 1}, count = 0, urgency = 0,
    queues = {{next = 0x331840, prev = 0x331840}, {next = 0x331848, prev = 0x331848}, { <...>

All the other addresses here are in the 0x33xxxx range, so processor = 0x1239948 seems rather suspicious. The other one looks more reasonable:

$15 = (struct processor *) 0x33748c
(gdb) p *(processor_t)0x33748c
$16 = {processor_queue = {next = 0x33180c, prev = 0x337000}, state = 1,
  active_thread = 0x1238000, next_thread = 0x0, idle_thread = 0x1238948,
  processor_set = 0x331800, current_pri = 95,
  quantum_timer = {q_link = {next = 0x135ad6c, prev = 0x320ed0},
    func = 0x28738 <thread_quantum_expire>, param0 = 0x33748c, param1 = 0x1238000,
    deadline = 37553127975012, state = DELAYED},
  quantum_end = 37553127975012, last_dispatch = 37553127641687, timeslice = 1,
  deadline = 18446744073709551615, runq = {highq = 0, bitmap = {0, 0, <...>

But an address close to that one, 0x1238948, shows up here too (as the idle thread). I have no idea what is supposed to be there.
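The "this address looks wrong" judgment above can be turned into two quick checks: whether a pointer falls inside a window established from neighboring, known-good pointers, and how many bits separate two suspicious values. The 0x330000-0x340000 window below is an assumption picked by hand from the pset dump, and the helper names are hypothetical. Note that 0x1239948 (the processor pointer) and 0x1238948 (the idle thread of the healthy processor) differ only in bit 12 (0x1000); whether that points to a single-bit memory error or simply to a neighboring allocation cannot be told from these dumps alone.

```c
#include <stdint.h>

/* Count the bits that differ between two 32-bit addresses. */
static int differing_bits(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;
    int n = 0;

    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Returns 1 if p lies in [lo, hi). The window is an assumption that must be
 * chosen by hand from neighboring, known-good pointers. */
static int in_window(uint32_t p, uint32_t lo, uint32_t hi)
{
    return p >= lo && p < hi;
}
```

For example, `differing_bits(0x1239948, 0x1238948)` is 1, and `in_window(0x1239948, 0x330000, 0x340000)` is 0 while `in_window(0x33748c, 0x330000, 0x340000)` is 1.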
The dumped contents don't mean much to me:

(gdb) x/32 0x1238948
0x1238948 <mhp.0+15366720>: 0x00000000 0x00000000 0x00000000 0x00000000
0x1238958 <mhp.0+15366736>: 0x00000000 0x00000000 0x01238948 0x80000000
0x1238968 <mhp.0+15366752>: 0x00000000 0x00000000 0x00000000 0x00000000
0x1238978 <mhp.0+15366768>: 0x00000008 0xffffffff 0x00000000 0x0002c490
0x1238988 <mhp.0+15366784>: 0x00000000 0x00000000 0x00000000 0x0cd30000
0x1238998 <mhp.0+15366800>: 0x00000184 0x00000004 0x00000000 0x00000000
0x12389a8 <mhp.0+15366816>: 0x0000005f 0x00000051 0x00000000 0x00000000
0x12389b8 <mhp.0+15366832>: 0x00000000 0x00000000 0x0000000e 0x00000000

If anybody has any ideas, I'll leave this in gdb for a while.

Thanks,

Drew
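One mechanical first pass over a raw dump like this is to look for words that point back at the start of the dumped region, since self-referential pointers often mark list heads or back-links. A minimal sketch follows; the helper name is hypothetical, and the word array is copied from the `x/32 0x1238948` output above. In this dump, the word at 0x1238960 (offset 0x18) holds 0x01238948, the block's own address; which field of the thread structure that corresponds to in this kernel is not something the dump itself tells us.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the byte offset of the first word in `words` equal to `base`
 * (a pointer back to the start of the dumped region), or -1 if none. */
static long find_self_reference(const uint32_t *words, size_t nwords,
                                uint32_t base)
{
    for (size_t i = 0; i < nwords; i++) {
        if (words[i] == base)
            return (long)(i * sizeof(uint32_t));
    }
    return -1;
}

/* The 32 words printed by `x/32 0x1238948`, copied from the dump above. */
static const uint32_t dump_0x1238948[32] = {
    0x00000000, 0x00000000, 0x00000000, 0x00000000,
    0x00000000, 0x00000000, 0x01238948, 0x80000000,
    0x00000000, 0x00000000, 0x00000000, 0x00000000,
    0x00000008, 0xffffffff, 0x00000000, 0x0002c490,
    0x00000000, 0x00000000, 0x00000000, 0x0cd30000,
    0x00000184, 0x00000004, 0x00000000, 0x00000000,
    0x0000005f, 0x00000051, 0x00000000, 0x00000000,
    0x00000000, 0x00000000, 0x0000000e, 0x00000000,
};
```

Calling `find_self_reference(dump_0x1238948, 32, 0x1238948)` returns 0x18, the offset of the self-pointer.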