Re: kernel lockup

Subject: Re: kernel lockup
From: Andreas Fink <email@hidden>
Date: Fri, 19 Apr 2013 22:34:09 +0200
On 19.04.2013, at 22:31, Pratima Kudale <email@hidden> wrote:

> Andreas,
>
> Are you seeing this issue only on 10.8.3? Or is it reproducible on any 10.8 version?

So far only on 10.8.3. I have not tried older versions.

>
> I am sharing my experience here. We are also running into a n/w stack hang issue; I have bug 13138492 open for it.

I'm running into this when it panics (however, most of the time it simply
freezes and locks up):

(gdb) backtrace
#0  Debugger (message=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/osfmk/i386/AT386/model_dep.c:916
#1  0xffffff800901d626 in panic (str=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/osfmk/kern/debug.c:336
#2  0xffffff800914e542 in sa_copy () at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:465
#3  0xffffff800915196a in rt_lookup (lookup_only=<value temporarily unavailable, due to optimizations>, dst=0xffffff80c949bcdc, netmask=<value temporarily unavailable, due to optimizations>, rnh=<value temporarily unavailable, due to optimizations>, ifscope=0) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:2681
#4  0xffffff800914e87c in rtalloc1_common_locked (dst=0xffffff80c949bcdc, report=1, ignflags=0, ifscope=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:876
#5  0xffffff800914e7c2 in rtalloc_ign_common_locked (ro=0xffffff80c949bcd0, ignore=0, ifscope=0) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:817
#6  0xffffff800914e635 in rtalloc_ign (ro=<value temporarily unavailable, due to optimizations>, ignore=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/route.c:829
#7  0xffffff7f8967cc10 in ?? ()
#8  0xffffff7f896804bb in ?? ()
#9  0xffffff7f8964ad4e in ?? ()
#10 0xffffff7f8963eed3 in ?? ()
#11 0xffffff7f8963b0e9 in ?? ()
#12 0xffffff7f8963927a in ?? ()
#13 0xffffff7f8963e147 in ?? ()
#14 0xffffff7f8963e1b0 in ?? ()
#15 0xffffff80091ca28f in ip_proto_dispatch_in (m=0xffffff8a88c81c00, hlen=<value temporarily unavailable, due to optimizations>, proto=<value temporarily unavailable, due to optimizations>, inject_ipfref=0xffffff800914eaaf) at /SourceCache/xnu/xnu-2050.22.13/bsd/netinet/ip_input.c:663
#16 0xffffff80091ca510 in ip_input (m=0xffffff8a88c81c00) at /SourceCache/xnu/xnu-2050.22.13/bsd/netinet/ip_input.c:777
#17 0xffffff80091ca0ed in ip_proto_input (protocol=<value temporarily unavailable, due to optimizations>, packet_list=0x0) at /SourceCache/xnu/xnu-2050.22.13/bsd/netinet/ip_input.c:553
#18 0xffffff800915e26c in proto_input (protocol=<value temporarily unavailable, due to optimizations>, packet_list=0xffffff8a88c81c00) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/kpi_protocol.c:290
#19 0xffffff80091405fb in ether_inet_input (ifp=<value temporarily unavailable, due to optimizations>, protocol_family=<value temporarily unavailable, due to optimizations>, m_list=0xffffff8a88c81c00) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/ether_inet_pr_module.c:220
#20 0xffffff800913e7a1 in dlil_ifproto_input (ifproto=0xffffff80b9fd2ce0, m=0xffffff8a88c81c00) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/dlil.c:2717
#21 0xffffff80091389cc in dlil_input_packet_list_common (ifp_param=0x0, m=0xffffff8a88c81c00, cnt=<value temporarily unavailable, due to optimizations>, mode=<value temporarily unavailable, due to optimizations>, ext=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/dlil.c:2955
#22 0xffffff800913f4b9 in dlil_input_thread_func (v=0xffffff80ba99e200, w=-2000151552) at /SourceCache/xnu/xnu-2050.22.13/bsd/net/dlil.c:2823
(gdb)

>
> And I also posted kernel stack trace for hang on this mailing list in January:
> http://prod.lists.apple.com/archives/darwin-kernel/2013/Jan/msg00007.html
>
> In case you are running into a similar issue: setting net.link.generic.system.flow_advisory to zero helped to get rid of the hang.
> But it affects n/w performance significantly, so we are still waiting for an actual solution from Apple on this.
>
> See if this helps you. Please post an update if you receive a solution or workaround for the issue you are experiencing.
>
> Thanks,
> Pratima
>
> -----Original Message-----
> From: darwin-kernel-bounces+pratima.kudale=email@hidden [mailto:darwin-kernel-bounces+pratima.kudale=email@hidden] On Behalf Of Steven Bytnar
> Sent: Friday, April 19, 2013 1:22 PM
> To: Andreas Fink
> Cc: email@hidden
> Subject: Re: kernel lockup
>
> Hi,
>
> Instead of a full core dump, how about a summary of the core dump?
> This requires the Kernel Debug Kit, but this used to be a pretty good summary of what the machine was doing at the time of a panic. I used this with 10.5 to troubleshoot some third-party software. It might need to be updated for 10.8.
>
> $ cat pd.sh
> #!/bin/sh
> echo Start:
> date
> echo "Working on $1"
> gdb -c "$1" -x pd.gdb > "$1.txt"
> echo End:
> date
> $ cat pd.gdb
> add-symbol-file /Volumes/KernelDebugKit/mach_kernel
> source /Volumes/KernelDebugKit/kgmacros
> showallstacks
> showallthreads
> showalltasks
> showcurrentthreads
> showcurrentstacks
> showallvm
> zprint
> quit
> $ ./pd.sh {core-file-name}
>
> --Steve
>
>
> On Fri, Apr 19, 2013 at 10:02:39PM +0200, Andreas Fink wrote:
>>   did that. [1]radar://13696346
>>   Unfortunately the kernel coredump is too big to upload (several
>>   gigabytes).
>>   And now it dumps even after the reboot sometimes.
>>   On 18.04.2013, at 17:51, Shantonu Sen <[2]email@hidden> wrote:
>>
>>     You can use FireWire KDP if the Ethernet interfaces stop working (see
>>     fwkdp(1) or the tech note on this) to attach to the kernel debugger and
>>     take a core dump. Depending on the exact issue, Ethernet may work for
>>     KDP even if the OS IP stack gets sad. The core dump should indicate the
>>     culprit, especially if you start with a proximal symptom such as a
>>     hanging process and trace the dependency chain of resources or locks.
>>     Please file a Radar with the coredump.
>>     Shantonu
>>     On Apr 18, 2013, at 7:20 AM, Andreas Fink <[3]email@hidden>
>>     wrote:
>>
>>       Hi Folks,
>>
>>       I'm running into some kernel-related deadlocks under 10.8.3 and
>>       can't really figure out where to look further.
>>       We have the following setup:
>>
>>       Xserve with two Ethernet interfaces:
>>       en0: private IPs
>>       en1: public IPs
>>
>>       On en1 we have several hundred open TCP sessions at times, and that's
>>       where all traffic comes in and gets processed (it's the SMPP protocol).
>>       The traffic is handled inside our application, processed, and put
>>       into a MySQL database (which is connected over en0).
>>       A couple of hours later, the system "locks up". Here is what actually
>>       happens:
>>
>>       a) You can no longer ping en1, and none of its sockets work anymore.
>>       b) You can still ping en0.
>>       c) On en0, established sessions still work; however, opening a new ssh
>>       session, for example, doesn't work.
>>       d) Typing commands in a still-working session locks up the system most
>>       of the time. For example, "killall myapp" does nothing and just
>>       stalls.
>>       e) syslog doesn't show anything suspicious.
>>       f) My app is still in memory and runs fine.
>>       g) "top" was showing little CPU load and plenty of free memory. All
>>       looks normal.
>>       h) netstat -m was not showing any sign of mbuf exhaustion.
>>       i) An established remote desktop session gets killed.
>>       j) The application doesn't crash.
>>       k) The kernel doesn't panic.
>>
>>       I was able to run a tcpdump on the interface while this was happening,
>>       and what I see towards the end is that all of a sudden TCP
>>       retransmissions start to pile up. We see lots and lots of them out of
>>       the blue.
>>       In other words, the kernel seems to stop processing the packets
>>       somehow and no longer acknowledges them to the remote side. Incoming
>>       acknowledgments don't get processed either.
>>       A few seconds later you can't do anything with the machine anymore and
>>       you have to force a reboot over LOM (I praise Apple for implementing
>>       LOM in their Xserves, even though it has its issues too).
>>
>>       It looks as if the application/traffic somehow manages to saturate
>>       some kernel resource, which locks up that specific Ethernet interface
>>       with side effects on the whole kernel (such as not being able to load
>>       any binaries that are not already in memory).
>>
>>       I'm a bit lost as to where to look next to analyze this issue.
>>       Does anyone on this list have a hint as to what could be happening here?
>>
>>       _______________________________________________
>>       Do not post admin requests to the list. They will be ignored.
>>       Darwin-kernel mailing list      ([4]email@hidden)
>>       Help/Unsubscribe/Update your Subscription:
>>
>>
>>       This email sent to [6]email@hidden
>>
>> Links:
>> 1. radar://13696346
>> 2. mailto:email@hidden
>> 3. mailto:email@hidden
>> 4. mailto:email@hidden
>> 5.
>> 6. mailto:email@hidden
>

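For reference, the workaround Pratima describes (setting net.link.generic.system.flow_advisory to zero) can be applied at runtime with sysctl(8). This is only a sketch, assuming OS X 10.8.x and root privileges for the write; the function names are made up for illustration, and the OID name is copied verbatim from her message. As she points out, it is a stopgap that costs network throughput, not a fix.

```shell
# Hypothetical helpers for the flow-advisory workaround mentioned in the thread.
# Assumes OS X 10.8.x; reading works as any user, writing requires root.
OID=net.link.generic.system.flow_advisory

# Print the current value; fail if this kernel does not expose the OID.
show_flow_advisory() {
    value=$(sysctl -n "$OID" 2>/dev/null) && [ -n "$value" ] && echo "$value"
}

# Set the OID to 0 at runtime. The change does not persist across a reboot.
disable_flow_advisory() {
    if ! show_flow_advisory >/dev/null; then
        echo "$OID is not available on this system" >&2
        return 1
    fi
    sysctl -w "$OID=0"
}
```

Saved to a file (flow_advisory.sh is just an example name), it could be run as: sudo sh -c '. ./flow_advisory.sh && disable_flow_advisory'. Checking that the OID exists first avoids silently "succeeding" on a kernel that does not have this knob.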

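Steve's pd.sh/pd.gdb pair could also be folded into a single script that writes the gdb command file itself, so the Kernel Debug Kit mount point becomes a parameter instead of being hard-coded. This is only a sketch: the function names are hypothetical, the macro list is copied unchanged from his pd.gdb, and it assumes the 10.8.3 KDK still provides mach_kernel and kgmacros at its top level, which (as Steve says) may need updating for 10.8.

```shell
# Print the gdb command file from pd.gdb, with the KDK mount point as $1
# (defaults to /Volumes/KernelDebugKit, as in the original pd.gdb).
write_pd_gdb() {
    kdk=${1:-/Volumes/KernelDebugKit}
    cat <<EOF
add-symbol-file $kdk/mach_kernel
source $kdk/kgmacros
showallstacks
showallthreads
showalltasks
showcurrentthreads
showcurrentstacks
showallvm
zprint
quit
EOF
}

# Summarize one core file ($1) into $1.txt using the KDK at $2,
# printing start/end timestamps the way pd.sh does.
summarize_core() {
    core=$1
    kdk=$2
    cmds=$(mktemp /tmp/pd.XXXXXX) || return 1
    write_pd_gdb "$kdk" > "$cmds"
    echo "Start: $(date)"
    echo "Working on $core"
    gdb -c "$core" -x "$cmds" > "$core.txt"
    status=$?
    echo "End: $(date)"
    rm -f "$cmds"
    return $status
}
```

For example, summarize_core {core-file-name} /Volumes/KernelDebugKit would write the summary next to the core as {core-file-name}.txt. Generating the command file per run keeps the macro list and the symbol paths in one place.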
