On 19.04.2013, at 22:31, Pratima Kudale <Pratima.Kudale@harmonicinc.com> wrote:
Andreas,
Are you seeing this issue only on 10.8.3? Or is it reproducible on any 10.8 version?
So far only on 10.8.3. I have not tried an older version.
I am sharing my experience here. We are also running into a network stack hang issue; I have bug 13138492 open for it.
I'm running into this when it panics; however, most of the time it simply freezes and locks up.
(gdb) backtrace
#0 Debugger (message=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/osfmk/i386/AT386/model_dep.c:916
#1 0xffffff800901d626 in panic (str=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/osfmk/kern/debug.c:336
#2 0xffffff800914e542 in sa_copy () at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:465
#3 0xffffff800915196a in rt_lookup (lookup_only=<value temporarily unavailable, due to optimizations>, dst=0xffffff80c949bcdc, netmask=<value temporarily unavailable, due to optimizations>, rnh=<value temporarily unavailable, due to optimizations>, ifscope=0) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:2681
#4 0xffffff800914e87c in rtalloc1_common_locked (dst=0xffffff80c949bcdc, report=1, ignflags=0, ifscope=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:876
#5 0xffffff800914e7c2 in rtalloc_ign_common_locked (ro=0xffffff80c949bcd0, ignore=0, ifscope=0) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:817
#6 0xffffff800914e635 in rtalloc_ign (ro=<value temporarily unavailable, due to optimizations>, ignore=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:829
#7 0xffffff7f8967cc10 in ?? ()
#8 0xffffff7f896804bb in ?? ()
#9 0xffffff7f8964ad4e in ?? ()
#10 0xffffff7f8963eed3 in ?? ()
#11 0xffffff7f8963b0e9 in ?? ()
#12 0xffffff7f8963927a in ?? ()
#13 0xffffff7f8963e147 in ?? ()
#14 0xffffff7f8963e1b0 in ?? ()
#15 0xffffff80091ca28f in ip_proto_dispatch_in (m=0xffffff8a88c81c00, hlen=<value temporarily unavailable, due to optimizations>, proto=<value temporarily unavailable, due to optimizations>, inject_ipfref=0xffffff800914eaaf) at /SourceCache/xnu/xnu-2050.22.13/bsd/netinet/ip_input.c:663
#16 0xffffff80091ca510 in ip_input (m=0xffffff8a88c81c00) at /SourceCache/xnu/xnu-2050.22.13/bsd/netinet/ip_input.c:777
#17 0xffffff80091ca0ed in ip_proto_input (protocol=<value temporarily unavailable, due to optimizations>, packet_list=0x0) at /SourceCache/xnu/xnu-2050.22.13/bsd/netinet/ip_input.c:553
#18 0xffffff800915e26c in proto_input (protocol=<value temporarily unavailable, due to optimizations>, packet_list=0xffffff8a88c81c00) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/kpi_protocol.c:290
#19 0xffffff80091405fb in ether_inet_input (ifp=<value temporarily unavailable, due to optimizations>, protocol_family=<value temporarily unavailable, due to optimizations>, m_list=0xffffff8a88c81c00) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/ether_inet_pr_module.c:220
#20 0xffffff800913e7a1 in dlil_ifproto_input (ifproto=0xffffff80b9fd2ce0, m=0xffffff8a88c81c00) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/dlil.c:2717
#21 0xffffff80091389cc in dlil_input_packet_list_common (ifp_param=0x0, m=0xffffff8a88c81c00, cnt=<value temporarily unavailable, due to optimizations>, mode=<value temporarily unavailable, due to optimizations>, ext=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/dlil.c:2955
#22 0xffffff800913f4b9 in dlil_input_thread_func (v=0xffffff80ba99e200, w=-2000151552) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/dlil.c:2823
(gdb)
And I also posted kernel stack trace for hang on this mailing list in January: http://prod.lists.apple.com/archives/darwin-kernel/2013/Jan/msg00007.html
If you are running into a similar issue: setting net.link.generic.system.flow_advisory to zero got rid of the hang for us, but it affects network performance significantly. Hence, we are still waiting for an actual solution from Apple on this.
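For anyone who wants to try the workaround, here is a minimal sketch of how the setting could be applied. Only the sysctl name comes from this thread; the helper name `set_flow_advisory` and the `DRY_RUN` switch are made up for illustration, and by default the script only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: the sysctl name is from this thread; the helper name and
# the DRY_RUN switch are hypothetical.
KEY=net.link.generic.system.flow_advisory

set_flow_advisory() {
    # $1 is the value to set (0 is the value used in the workaround).
    cmd="sysctl -w $KEY=$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: print the command instead of running it.
        echo "$cmd"
    else
        # Needs root; a value set with sysctl -w is lost on reboot.
        $cmd
    fi
}

set_flow_advisory 0
```

Running it with `DRY_RUN=0` as root applies the change. It is worth noting the current value with `sysctl net.link.generic.system.flow_advisory` beforehand so it can be restored, given the performance cost mentioned above.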
See if this helps you. Please post an update once you have a solution or workaround for the issue you are seeing.
Thanks, Pratima
-----Original Message-----
From: darwin-kernel-bounces+pratima.kudale=harmonicinc.com@lists.apple.com [mailto:darwin-kernel-bounces+pratima.kudale=harmonicinc.com@lists.apple.com] On Behalf Of Steven Bytnar
Sent: Friday, April 19, 2013 1:22 PM
To: Andreas Fink
Cc: darwin-kernel@lists.apple.com
Subject: Re: kernel lockup
Hi,
Instead of a full core dump, how about a summary of the core dump? This requires the Kernel Debug Kit, but it used to be a pretty good summary of what the machine was doing at the time of a panic. I used this with 10.5 to troubleshoot some third-party software. It might need to be updated for 10.8.
$ cat pd.sh
echo Start: $(date)
echo Working on "$1"
gdb -c "$1" -x pd.gdb > "$1.txt"
echo End: $(date)

$ cat pd.gdb
add-symbol-file /Volumes/KernelDebugKit/mach_kernel
source /Volumes/KernelDebugKit/kgmacros
showallstacks
showallthreads
showalltasks
showcurrentthreads
showcurrentstacks
showallvm
zprint
quit

$ ./pd.sh {core-file-name}
--Steve
On Fri, Apr 19, 2013 at 10:02:39PM +0200, Andreas Fink wrote:
Did that: [1]radar://13696346
Unfortunately the kernel core dump is too big to upload (several gigabytes). And now it sometimes dumps even after the reboot.

On 18.04.2013, at 17:51, Shantonu Sen <[2]ssen@apple.com> wrote:
You can use FireWire KDP if the Ethernet interfaces stop working (see fwkdp(1) or the tech note on this) to attach to the kernel debugger and take a core dump. Depending on the exact issue, Ethernet may work for KDP even if the OS IP stack gets sad. The core dump should indicate the culprit, especially if you start with a proximal symptom such as a hanging process and trace the dependency chain of resources or locks.
Please file a Radar with the core dump.
Shantonu

On Apr 18, 2013, at 7:20 AM, Andreas Fink <[3]afink@list.fink.org> wrote:
Hi Folks,
I'm running into some kernel-related deadlocks here under 10.8.3, and I cannot figure out where to look further. We have the following setup:
Xserve with two Ethernet interfaces: en0 has private IPs, en1 has public IPs.
On en1 we have several hundred open TCP sessions at times, and that's where all traffic comes in (it's the SMPP protocol). The traffic is processed and answered inside our application and put into a MySQL database (which is connected over en0). A couple of hours later, the system "locks up". What really happens is the following:
a) You can no longer ping en1, nor do any sockets on it still work.
b) You can still ping en0.
c) On en0, established sessions still work; however, opening a new ssh session, for example, doesn't work.
d) Typing commands in a still-working session locks up the system most of the time. For example, "killall myapp" does nothing and just stalls.
e) syslog doesn't show anything suspicious.
f) My app is still in memory and runs fine.
g) "top" showed little CPU load and plenty of free memory. All looks normal.
h) "netstat -m" did not show any dangerous buffer overflow.
i) An established remote desktop session gets killed.
j) The application doesn't crash.
k) The kernel doesn't panic.
I was able to run a tcpdump on the interface while this was happening, and what I see towards the end is that all of a sudden TCP retransmissions start to pile up. We see lots and lots of them out of the blue. In other words, the kernel seems to stop processing the packets and no longer acknowledges them to the remote side. Incoming acknowledgments don't get processed either. A few seconds later you can't do anything with the machine anymore and you have to force a reboot over LOM (I praise Apple for implementing LOM in their Xserves, even though it has its issues too).
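One way to catch the moment retransmissions start to pile up, without keeping a tcpdump running, is to log the kernel's TCP retransmit counter at short intervals. The following is a minimal sketch under stated assumptions: it assumes the `netstat -s -p tcp` output contains a BSD-style line of the form "N data packets (M bytes) retransmitted" (check the wording on your system), and the function names and log path are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names and the default log path are illustrative.

retrans_count() {
    # Read `netstat -s -p tcp` style text on stdin and print the number
    # from the first "... data packets (... bytes) retransmitted" line.
    awk '/data packets? \(.*\) retransmitted/ { print $1; exit }'
}

watch_retrans() {
    # Append "epoch-seconds retransmit-count" to a log every $1 seconds
    # (default 5), so the last lines before a lockup show the ramp-up.
    interval=${1:-5}
    log=${2:-/var/tmp/retrans.log}
    while :; do
        echo "$(date +%s) $(netstat -s -p tcp | retrans_count)" >> "$log"
        sleep "$interval"
    done
}
```

Starting `watch_retrans 5 &` before the traffic ramps up leaves a log on disk that can be read after the forced reboot; a sharp jump in the counter in the final entries would line up with the retransmission burst seen in the tcpdump.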
It is obvious that the application/traffic somehow manages to saturate some kernel resource, which locks up that specific Ethernet interface, with a side effect on the whole kernel (such as not being able to load any binaries that are not already in memory).
I'm a bit lost as to where to look further to analyze this issue. Does anyone on this list have a hint about what could be happening here?
_______________________________________________ Do not post admin requests to the list. They will be ignored. Darwin-kernel mailing list ([4]Darwin-kernel@lists.apple.com) Help/Unsubscribe/Update your Subscription:
[5]https://lists.apple.com/mailman/options/darwin-kernel/ssen%40apple.com
This email sent to [6]ssen@apple.com
Links:
1. file:///var/folders/Jw/JwJJw00g2Ra53k+1Ynt6pU+++TM/-Tmp-//radar://
2. mailto:ssen@apple.com/
3. mailto:afink@list.fink.org/
4. mailto:Darwin-kernel@lists.apple.com/
5. https://lists.apple.com/mailman/options/darwin-kernel/ssen%40apple.com
6. mailto:ssen@apple.com/