Re: kernel lockup

19 Apr 2013


      On 19.04.2013, at 22:31, Pratima Kudale <Pratima.Kudale@harmonicinc.com> wrote:
...
Andreas,
Are you seeing this issue only on 10.8.3? Or is it reproducible on any 10.8 version?
So far only on 10.8.3. I have not tried older versions.
...
I am sharing my experience here. We are also running into a network stack hang; I have bug 13138492 open for it.
When it panics, I get the backtrace below; most of the time, however, it simply freezes and locks up.

(gdb) backtrace
#0  Debugger (message=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/osfmk/i386/AT386/model_dep.c:916
#1  0xffffff800901d626 in panic (str=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/osfmk/kern/debug.c:336
#2  0xffffff800914e542 in sa_copy () at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:465
#3  0xffffff800915196a in rt_lookup (lookup_only=<value temporarily unavailable, due to optimizations>, dst=0xffffff80c949bcdc, netmask=<value temporarily unavailable, due to optimizations>, rnh=<value temporarily unavailable, due to optimizations>, ifscope=0) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:2681
#4  0xffffff800914e87c in rtalloc1_common_locked (dst=0xffffff80c949bcdc, report=1, ignflags=0, ifscope=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:876
#5  0xffffff800914e7c2 in rtalloc_ign_common_locked (ro=0xffffff80c949bcd0, ignore=0, ifscope=0) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:817
#6  0xffffff800914e635 in rtalloc_ign (ro=<value temporarily unavailable, due to optimizations>, ignore=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:829
#7  0xffffff7f8967cc10 in ?? ()
#8  0xffffff7f896804bb in ?? ()
#9  0xffffff7f8964ad4e in ?? ()
#10 0xffffff7f8963eed3 in ?? ()
#11 0xffffff7f8963b0e9 in ?? ()
#12 0xffffff7f8963927a in ?? ()
#13 0xffffff7f8963e147 in ?? ()
#14 0xffffff7f8963e1b0 in ?? ()
#15 0xffffff80091ca28f in ip_proto_dispatch_in (m=0xffffff8a88c81c00, hlen=<value temporarily unavailable, due to optimizations>, proto=<value temporarily unavailable, due to optimizations>, inject_ipfref=0xffffff800914eaaf) at /SourceCache/xnu/xnu-2050.22.13/bsd/netinet/ip_input.c:663
#16 0xffffff80091ca510 in ip_input (m=0xffffff8a88c81c00) at /SourceCache/xnu/xnu-2050.22.13/bsd/netinet/ip_input.c:777
#17 0xffffff80091ca0ed in ip_proto_input (protocol=<value temporarily unavailable, due to optimizations>, packet_list=0x0) at /SourceCache/xnu/xnu-2050.22.13/bsd/netinet/ip_input.c:553
#18 0xffffff800915e26c in proto_input (protocol=<value temporarily unavailable, due to optimizations>, packet_list=0xffffff8a88c81c00) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/kpi_protocol.c:290
#19 0xffffff80091405fb in ether_inet_input (ifp=<value temporarily unavailable, due to optimizations>, protocol_family=<value temporarily unavailable, due to optimizations>, m_list=0xffffff8a88c81c00) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/ether_inet_pr_module.c:220
#20 0xffffff800913e7a1 in dlil_ifproto_input (ifproto=0xffffff80b9fd2ce0, m=0xffffff8a88c81c00) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/dlil.c:2717
#21 0xffffff80091389cc in dlil_input_packet_list_common (ifp_param=0x0, m=0xffffff8a88c81c00, cnt=<value temporarily unavailable, due to optimizations>, mode=<value temporarily unavailable, due to optimizations>, ext=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/dlil.c:2955
#22 0xffffff800913f4b9 in dlil_input_thread_func (v=0xffffff80ba99e200, w=-2000151552) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/dlil.c:2823
(gdb)
...
I also posted a kernel stack trace for the hang on this mailing list in January:
http://prod.lists.apple.com/archives/darwin-kernel/2013/Jan/msg00007.html
In case you are running into a similar issue: setting net.link.generic.system.flow_advisory to zero got rid of the hang for us.
However, it reduces network performance significantly, so we are still waiting for a real fix from Apple.
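A minimal sketch of applying the workaround above and undoing it later, assuming root access and the stock macOS sysctl(8) tool. Only the key name comes from this thread; the function names and the SAVED scratch file are hypothetical. A value set this way does not persist across a reboot.

```sh
#!/bin/sh
# Sketch: disable and later restore the flow-advisory sysctl named in the
# thread. Run as root on the affected Mac. KEY comes from the message above;
# SAVED is a hypothetical scratch file holding the previous value.
KEY=net.link.generic.system.flow_advisory
SAVED=${SAVED:-/var/tmp/flow_advisory.saved}

# Record the current value, then set the key to 0 (the workaround).
flow_advisory_off() {
    sysctl -n "$KEY" > "$SAVED" || return 1
    sysctl -w "$KEY=0"
}

# Put back whatever value was in effect before flow_advisory_off ran,
# e.g. once a real fix is installed and throughput matters again.
flow_advisory_restore() {
    if [ ! -f "$SAVED" ]; then
        echo "no saved value in $SAVED" >&2
        return 1
    fi
    sysctl -w "$KEY=$(cat "$SAVED")"
}
```

For example, after saving the functions to a file (here the hypothetical flow_advisory.sh): `sudo sh -c '. ./flow_advisory.sh && flow_advisory_off'`. Saving the old value instead of assuming a default means the restore step puts back exactly what the machine had.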
See if this helps. Please post an update once you receive any solution or workaround for the issue you are seeing.
Thanks,
Pratima
-----Original Message-----
From: darwin-kernel-bounces+pratima.kudale=harmonicinc.com@lists.apple.com [mailto:darwin-kernel-bounces+pratima.kudale=harmonicinc.com@lists.apple.com] On Behalf Of Steven Bytnar
Sent: Friday, April 19, 2013 1:22 PM
To: Andreas Fink
Cc: darwin-kernel@lists.apple.com
Subject: Re: kernel lockup
Hi,
Instead of a full core dump, how about a summary of the core dump?
This requires the Kernel Debug Kit, but it used to give a pretty good summary of what the machine was doing at the time of a panic. I used it with 10.5 to troubleshoot some third-party software; it might need to be updated for 10.8.
$ cat pd.sh
#!/bin/sh
echo Start:
date
echo "Working on $1"
gdb -c "$1" -x pd.gdb > "$1.txt"
echo End:
date
$ cat pd.gdb
add-symbol-file /Volumes/KernelDebugKit/mach_kernel
source /Volumes/KernelDebugKit/kgmacros
showallstacks
showallthreads
showalltasks
showcurrentthreads
showcurrentstacks
showallvm
zprint
quit
$ ./pd.sh {core-file-name}
--Steve
On Fri, Apr 19, 2013 at 10:02:39PM +0200, Andreas Fink wrote:
...
Did that: [1]radar://13696346
  Unfortunately the kernel core dump is too big to upload (several
  gigabytes).
  And now it sometimes dumps even after the reboot.
  On 18.04.2013, at 17:51, Shantonu Sen <[2]ssen@apple.com> wrote:
You can use FireWire KDP if the Ethernet interfaces stop working (see
    fwkdp(1) or the tech note on this) to attach to the kernel debugger and
    take a core dump. Depending on the exact issue, Ethernet may work for
    KDP even if the OS IP stack gets sad. The core dump should indicate the
    culprit, especially if you start with a proximal symptom such as a
    hanging process and trace the dependency chain of resources or locks.
    Please file a Radar with the core dump.
    Shantonu
    On Apr 18, 2013, at 7:20 AM, Andreas Fink <[3]afink@list.fink.org>
    wrote:
Hi Folks,
I'm running into some kernel-related deadlocks here under 10.8.3 and I
      can't really figure out where to look further.
      We have the following setup:
An Xserve with two Ethernet interfaces:
      en0: private IPs
      en1: public IPs
On en1 we have several hundred open TCP sessions at times, and that's
      where all traffic comes in and gets processed (it's the SMPP protocol).
      Our application answers and processes the traffic and puts it
      into a MySQL database (which is reached over en0).
      After a couple of hours, the system "locks up". What actually
      happens is the following:
a) You can no longer ping en1, and none of its sockets work anymore.
      b) You can still ping en0.
      c) On en0, established sessions still work; however, opening a new
      session (ssh, for example) doesn't.
      d) Typing commands in a still-working session locks up the system most
      of the time. For example, "killall myapp" does nothing and just
      stalls.
      e) syslog doesn't show anything unusual.
      f) My app is still in memory and runs fine.
      g) "top" showed little CPU load and plenty of free memory. Everything
      looked normal.
      h) netstat -m didn't show any sign of buffers running out.
      i) An established Remote Desktop session gets killed.
      j) The application doesn't crash.
      k) The kernel doesn't panic.
I was able to run tcpdump on the interface while this was happening,
      and what I see towards the end is that TCP retransmissions suddenly
      start to pile up: lots and lots of them, out of the blue.
      In other words, the kernel seems to stop processing the packets
      somehow and no longer acknowledges them to the remote side. Incoming
      acknowledgments don't get processed either.
      A few seconds later you can't do anything with the machine anymore and
      have to force a reboot over LOM (I praise Apple for building LOM into
      the Xserve, even though it has its issues too).
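Since the retransmission burst is the clearest symptom in the capture, a small helper can pin down when it starts. This is a sketch under assumptions: en1 and tcpdump come from the message above, while the ring-buffer sizes, the /var/tmp path, the function names, and the use of Wireshark's tshark for the analysis are illustrative choices, not something from the thread.

```sh
#!/bin/sh
# Sketch: capture en1 continuously, then find when retransmissions start.
# The interface name comes from the thread; paths and sizes are assumptions.

# Ring buffer: -C is the size of each file in millions of bytes and -W the
# number of files kept, so a multi-hour run cannot fill the disk. Needs root.
capture_en1() {
    tcpdump -i en1 -s 0 -w /var/tmp/en1.pcap -C 100 -W 20
}

# Print "<unix-second> <count>" for each second that saw TCP retransmissions,
# using Wireshark's tcp.analysis.retransmission flag. Assumes a tshark with
# -Y (display filter) support; older releases used -R instead.
retransmissions_per_second() {
    tshark -r "$1" -Y tcp.analysis.retransmission -T fields -e frame.time_epoch |
        cut -d. -f1 | uniq -c | awk '{ print $2, $1 }'
}
```

Run retransmissions_per_second on one of the numbered files tcpdump writes. A long stretch of seconds with few retransmissions followed by a sudden jump would roughly timestamp the moment the stack stopped acknowledging, which can then be lined up against syslog or a core dump.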
It is obvious that the application or its traffic somehow manages to saturate
      some kernel resource, which locks up that specific Ethernet interface
      with side effects on the whole kernel (such as not being able to load
      any binary that isn't already in memory).
I'm a bit lost as to where to look next to analyze this issue.
      Does anyone on this list have a hint as to what could be happening here?
_______________________________________________
      Do not post admin requests to the list. They will be ignored.
      Darwin-kernel mailing list      ([4]Darwin-kernel@lists.apple.com)
      Help/Unsubscribe/Update your Subscription:
[5]https://lists.apple.com/mailman/options/darwin-kernel/ssen%40apple.com
This email sent to [6]ssen@apple.com
Links:
1. file:///var/folders/Jw/JwJJw00g2Ra53k+1Ynt6pU+++TM/-Tmp-//radar://
2. mailto:ssen@apple.com
3. mailto:afink@list.fink.org
4. mailto:Darwin-kernel@lists.apple.com
5. https://lists.apple.com/mailman/options/darwin-kernel/ssen%40apple.com
6. mailto:ssen@apple.com
...

Andreas Fink
