Re: kernel lockup
- Subject: Re: kernel lockup
- From: Andreas Fink <email@hidden>
- Date: Tue, 30 Apr 2013 01:16:23 +0200
On 19.04.2013, at 22:31, Pratima Kudale <email@hidden> wrote:
> Andreas,
>
> Are you seeing this issue only on 10.8.3? Or is it reproducible on any 10.8 version?
We see this on 10.8.3. We have not tried any other versions. It could very well have started only with 10.8.3: we upgraded from 10.8.2 to 10.8.3 not too long ago, and we didn't experience such issues before.
> I am sharing my experience here. We are also running into a network stack hang; I have bug 13138492 open for it.
>
> I also posted a kernel stack trace for the hang to this mailing list in January:
> http://prod.lists.apple.com/archives/darwin-kernel/2013/Jan/msg00007.html
>
> In case you are running into a similar issue: setting net.link.generic.system.flow_advisory to zero helped us get rid of the hang.
> However, it affects network performance significantly, so we are still waiting for an actual fix from Apple.
>
> See if this helps you. Please post an update once you receive any solution or workaround for the issue you are experiencing.
>
> Thanks,
> Pratima
I will try that. We get a freeze every couple of days because of this issue. We use SCTP extensively, which of course uses sockets. We usually have a couple of thousand TCP sockets and dozens of SCTP sockets open in normal operation. The crash we were seeing a couple of days back was just due to a wrongly built SCTP driver after an update (it crashed because a symbol checked by an #ifdef was not defined in the build, although it should have been). That crash was quickly fixed, but the freeze remained. Debugging is hard because the machine is remote, so pressing the power button/NMI to get into the debugger means going there.
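A minimal sketch of trying Pratima's workaround, assuming the sysctl name from her message exists on this 10.8.3 build (`sysctl -a | grep flow_advisory` should list it). The helper function names are hypothetical, and a value set with `sysctl -w` only lasts until the next reboot.

```shell
#!/bin/sh
# Sketch: read and disable the flow-advisory sysctl mentioned in the thread.
# Assumption: the OID below exists on this OS X build; helper names are hypothetical.
OID="net.link.generic.system.flow_advisory"

is_darwin() {
  [ "$(uname -s)" = "Darwin" ]
}

# Print the current value; per the thread, 0 disables the feature.
show_flow_advisory() {
  if ! is_darwin; then
    echo "$OID only exists on OS X" >&2
    return 1
  fi
  sysctl -n "$OID"
}

# Set the value to 0 (runtime change only, lost at reboot).
disable_flow_advisory() {
  if ! is_darwin; then
    echo "refusing to change $OID on a non-Darwin system" >&2
    return 1
  fi
  sudo sysctl -w "$OID=0"
}
```

Running `show_flow_advisory` before and after `disable_flow_advisory` confirms the change took effect; per Pratima's report, expect lower network throughput while the value is 0.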
>
> -----Original Message-----
> From: darwin-kernel-bounces+pratima.kudale=email@hidden [mailto:darwin-kernel-bounces+pratima.kudale=email@hidden] On Behalf Of Steven Bytnar
> Sent: Friday, April 19, 2013 1:22 PM
> To: Andreas Fink
> Cc: email@hidden
> Subject: Re: kernel lockup
>
> Hi,
>
> Instead of a full core dump, how about a summary of the core dump?
> This requires the Kernel Debug Kit, but it used to be a pretty good summary of what the machine was doing at the time of a panic. I used this with 10.5 to troubleshoot some third-party software. It might need to be updated for 10.8.
>
> $ cat pd.sh
> #!/bin/sh
> # Usage: ./pd.sh core-file
> echo Start:
> date
> echo "Working on $1"
> gdb -c "$1" -x pd.gdb > "$1.txt"
> echo End:
> date
> $ cat pd.gdb
> add-symbol-file /Volumes/KernelDebugKit/mach_kernel
> source /Volumes/KernelDebugKit/kgmacros
> showallstacks
> showallthreads
> showalltasks
> showcurrentthreads
> showcurrentstacks
> showallvm
> zprint
> quit
> $ ./pd.sh {core-file-name}
>
> --Steve
>
>
> On Fri, Apr 19, 2013 at 10:02:39PM +0200, Andreas Fink wrote:
>> Did that: [1]radar://13696346
>> Unfortunately the kernel core dump is too big to upload (several
>> gigabytes).
>> And now it sometimes dumps even after the reboot.
>> On 18.04.2013, at 17:51, Shantonu Sen <[2]email@hidden> wrote:
>>
>> You can use FireWire KDP if the Ethernet interfaces stop working (see
>> fwkdp(1) or the tech note on this) to attach to the kernel debugger and
>> take a core dump. Depending on the exact issue, Ethernet may work for
>> KDP even if the OS IP stack gets sad. The core dump should indicate the
>> culprit, especially if you start with a proximal symptom such as a
>> hanging process and trace the dependency chain of resources or locks.
>> Please file a Radar with the core dump.
>> Shantonu
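A minimal sketch of preparing the Xserve for the FireWire KDP route Shantonu describes. It assumes the `kdp_match_name=firewire` boot-arg from fwkdp(1) and the commonly used `debug=0x144` mask (NMI, ARP, panic log on screen); both values are assumptions to confirm against fwkdp(1) and the 10.8 Kernel Debug Kit notes, and the helper names are hypothetical. Setting boot-args replaces any existing ones, so record the output of `nvram boot-args` first.

```shell
#!/bin/sh
# Sketch: enable KDP over FireWire on the machine that locks up.
# Assumptions: boot-arg values as described above; helper names are hypothetical.

# debug=0x144 = DB_NMI (0x4) | DB_ARP (0x40) | DB_LOG_PI_SCRN (0x100).
kdp_boot_args() {
  printf '%s\n' "debug=0x144 kdp_match_name=firewire"
}

enable_firewire_kdp() {
  if [ "$(uname -s)" != "Darwin" ]; then
    echo "nvram boot-args only applies to OS X" >&2
    return 1
  fi
  # Overwrites existing boot-args; takes effect on the next boot.
  sudo nvram boot-args="$(kdp_boot_args)"
}

# On the debugging machine, connected by FireWire, roughly
# (see fwkdp(1) for the exact invocation):
#   fwkdp            # relays KDP between the FireWire link and localhost
#   gdb /Volumes/KernelDebugKit/mach_kernel
#   (gdb) source /Volumes/KernelDebugKit/kgmacros
#   (gdb) target remote-kdp
#   (gdb) attach localhost
```

After the next boot, an NMI should leave the machine waiting for the debugger rather than just freezing, so the stacks can be inspected or a core dump taken.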
>> On Apr 18, 2013, at 7:20 AM, Andreas Fink <[3]email@hidden>
>> wrote:
>>
>> Hi Folks,
>>
>> I'm running into some kernel-related deadlocks under 10.8.3, and I can't
>> really figure out where to look further.
>> We have the following setup:
>>
>> Xserve with two Ethernet interfaces:
>> en0: private IPs
>> en1: public IPs
>>
>> On en1 we have several hundred open TCP sessions at times, and that's
>> where all the traffic comes in and gets processed (it's the SMPP protocol).
>> The traffic is answered and processed inside our application and put
>> into a MySQL database (which is connected over en0).
>> A couple of hours later, the system "locks up". What really
>> happens is the following:
>>
>> a) You can no longer ping en1, and none of the sockets on it work anymore.
>> b) You can still ping en0.
>> c) On en0, established sessions still work; however, opening a new ssh
>> session, for example, doesn't work.
>> d) Typing commands in a still-working session locks up the system most
>> of the time. For example, "killall myapp" does nothing and just
>> stalls.
>> e) syslog doesn't show anything suspicious.
>> f) My app is still in memory and runs fine.
>> g) "top" was showing little CPU load and plenty of free memory. All looks
>> normal.
>> h) "netstat -m" was not showing any sign of buffers running out.
>> i) An established remote desktop session gets killed.
>> j) The application doesn't crash.
>> k) The kernel doesn't panic.
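Because the freeze only shows up after hours, a periodic snapshot on local disk might show which counter climbs beforehand. A minimal sketch, assuming OS X `netstat` output in which the TCP state is the last column; the script name `netsnap.sh`, the log path and the 60-second interval are hypothetical choices, and entries written just before a hard freeze may never reach the disk.

```shell
#!/bin/sh
# Sketch: log mbuf usage and TCP connection states once a minute.
# Assumptions: OS X netstat output format; script name, log path and interval are illustrative.

# Count TCP sockets per state from `netstat -an` output on stdin.
count_tcp_states() {
  awk '$1 ~ /^tcp/ { n[$NF]++ } END { for (s in n) print s, n[s] }' | sort
}

snapshot() {
  date
  netstat -m
  netstat -an -p tcp | count_tcp_states
  echo "----"
}

# Only loop when started as `./netsnap.sh --loop`, so the functions can be sourced.
if [ "${1:-}" = "--loop" ]; then
  while true; do
    snapshot >> /var/tmp/netsnap.log 2>&1
    sleep 60
  done
fi
```

Comparing the last few snapshots before a freeze with ones from normal operation would show whether ESTABLISHED or CLOSE_WAIT counts, or mbuf usage, grow steadily.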
>>
>> I was able to run tcpdump on the interface while this was happening,
>> and what I see towards the end is that all of a sudden TCP
>> retransmissions start to pile up. We see lots and lots of them out of
>> the blue.
>> In other words, the kernel seems to stop processing the packets
>> somehow and no longer acknowledges them to the remote side. Incoming
>> acknowledgments don't get processed either.
>> A few seconds later you can't do anything with the machine anymore and
>> have to force a reboot over LOM (I praise Apple for implementing
>> LOM in their Xserves, even though it has its issues too).
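The retransmission build-up can also be watched from the kernel's cumulative TCP counters without a full capture. A minimal sketch, assuming the BSD-style `netstat -s -p tcp` line `N data packets (M bytes) retransmitted`; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: sample the cumulative TCP retransmit counter.
# Assumption: BSD netstat -s prints "N data packets (M bytes) retransmitted".

# Extract N from `netstat -s -p tcp` output on stdin.
retransmitted_packets() {
  awk '/data packets/ && /retransmitted/ { print $1; exit }'
}

# Print the retransmit delta per interval, e.g. `watch_retransmits 10`.
watch_retransmits() {
  interval=${1:-10}
  prev=$(netstat -s -p tcp | retransmitted_packets)
  while sleep "$interval"; do
    cur=$(netstat -s -p tcp | retransmitted_packets)
    echo "$(date '+%H:%M:%S') retransmitted +$(( ${cur:-0} - ${prev:-0} ))"
    prev=$cur
  done
}
```

A sudden jump in the per-interval delta would mark the moment the retransmissions start, which can then be lined up with the tcpdump capture.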
>>
>> It is obvious that the application/traffic somehow manages to saturate
>> some kernel resource, which locks up that specific Ethernet interface,
>> with side effects on the whole kernel (such as not being able to load
>> any binaries that are not already in memory).
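To test the idea that some kernel resource is being saturated, one starting point is to compare open files and sockets with their limits. A minimal sketch, assuming the OS X sysctl names `kern.num_files`, `kern.maxfiles`, `kern.maxfilesperproc`, `kern.ipc.nmbclusters` and `kern.ipc.somaxconn`; the helper names are hypothetical, and the `lsof` count is approximate because it also lists entries such as the working directory and mapped files.

```shell
#!/bin/sh
# Sketch: compare file/socket usage with kernel limits.
# Assumptions: OS X sysctl names listed above; helper names are hypothetical.

# Integer percentage of $1 relative to $2.
usage_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Usage: report_limits PID
report_limits() {
  pid=$1
  used=$(sysctl -n kern.num_files)
  max=$(sysctl -n kern.maxfiles)
  echo "system-wide open files: $used / $max ($(usage_pct "$used" "$max")%)"
  # Approximate: lsof also lists cwd, txt and memory-mapped entries.
  nfd=$(lsof -n -p "$pid" | tail -n +2 | wc -l | tr -d ' ')
  per=$(sysctl -n kern.maxfilesperproc)
  echo "pid $pid open files: $nfd / $per ($(usage_pct "$nfd" "$per")%)"
  sysctl kern.ipc.nmbclusters kern.ipc.somaxconn
}
```

For example, `report_limits 1234`, where 1234 stands for the application's PID; running it periodically alongside the other snapshots would show whether usage approaches a limit before the lockup.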
>>
>> I'm a bit lost as to where to look further to analyze this issue.
>> Does anyone on this list have a hint as to what could be happening here?
>>
>> _______________________________________________
>> Do not post admin requests to the list. They will be ignored.
>> Darwin-kernel mailing list ([4]email@hidden)
>> Help/Unsubscribe/Update your Subscription:
>>
>> com
>>
>> This email sent to [6]email@hidden
>>
>> Links:
>> 1. radar://
>> 2. mailto:email@hidden
>> 3. mailto:email@hidden
>> 4. mailto:email@hidden
>> 5.
>> 6. mailto:email@hidden