Re: Darwin-dev Digest, Vol 3, Issue 277
On Oct 24, 2006, at 12:04 PM, Al Kirkus <al@kirkus.org> wrote:

> I run about 10 OSX file servers (AFP) under 10.4.7. Occasionally I see one of the 2 AppleFileServer processes go into an uninterruptible wait state and stop servicing all requests, functionally disconnecting all of its clients. My attempts to kill the process result in it going into an exiting state, but the process never actually exits (the longest I waited was 2 days). I have to pull the plug to get the servers back to normal. This happens intermittently (anywhere between 1 and 30+ days apart) on all of my servers that service a fairly heavy load.
>
> I am wondering what I could do at a Darwin level to try to trace the process and see why it is hung and won't exit. Maybe then I can formulate some type of solution/workaround.

Firstly, you should ensure that you have filed a bug against this issue with Apple, either via DTS or bugreporter.apple.com depending on your developer status.

However, to the meat of your question.

First, the userspace approach:

When the afpserver process is hung, run 'sample' against it and look at the call graphs. There will almost certainly be at least one thread in a system call; this may give you a clue.

Secondly, the kernel approach:

Read the technote on two-machine debugging, and obtain the debug kit for the build you are running on your servers. When you experience a process hang such as you describe, NMI the machine in question and attach to it with your debug system. Source the kgmacros file in gdb once you're attached and issue the 'showallstacks' command.

In the (copious) output, search for threads belonging to the afpserver process, and see if you can work out what subsystem they're stuck in. You may have to generate symbols for kernel extensions; there should be instructions on doing this in the I/O Kit debugging documentation, or as a last resort bug the folks on this list. In the case of the thread stuck in a system call, this will let you track the problem closer to its source.

Once you have more incriminating evidence, add it to your bug (or file a new one referencing the old one).

= Mike
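As a rough illustration of the userspace approach, the sketch below picks out file server processes that look stuck and builds a 'sample' invocation for each. It is not from the original post. It assumes Python 3 is available on the server, that 'ps -axo pid,stat,command' prints a state column whose first letter is U or D for an uninterruptible wait, and that the process shows up as AppleFileServer. The helper names find_stuck, sample_command and main are made up for this example.

```python
import subprocess


def find_stuck(ps_output, name="AppleFileServer"):
    """Return (pid, state) pairs for processes whose command contains
    `name` and whose state starts with U or D (assumed here to mean an
    uninterruptible wait)."""
    stuck = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, state, rest of the command line
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skips the header row and blank lines
        pid, state, command = parts
        if name in command and state[0] in ("U", "D"):
            stuck.append((int(pid), state))
    return stuck


def sample_command(pid, seconds=10):
    """Build the argument list for `sample <pid> <seconds>`."""
    return ["sample", str(pid), str(seconds)]


def main():
    # Hypothetical driver: list processes, then sample each stuck one.
    ps = subprocess.run(
        ["ps", "-axo", "pid,stat,command"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, state in find_stuck(ps):
        print("sampling pid %d (state %s)" % (pid, state))
        subprocess.run(sample_command(pid), check=False)
```

Calling main() on an affected server would run 'sample' once per matching process. If the server reports the process under a different name, pass that name to find_stuck instead.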
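For the kernel approach, searching the copious 'showallstacks' output can be done with a small filter once the gdb session output has been saved to a text file by whatever means you prefer. The sketch below is not from the original post and assumes nothing about the output layout. It only keeps lines near any line that mentions the process name, so the amount of context is a guess you will likely need to adjust. The function name lines_near and its parameters are hypothetical.

```python
def lines_near(log_text, needle, before=0, after=20):
    """Return the lines of `log_text` that contain `needle`, plus
    `before` lines of leading and `after` lines of trailing context,
    in their original order and without duplicates."""
    lines = log_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            keep.update(range(start, stop))
    return [lines[i] for i in sorted(keep)]
```

For example, lines_near(open("stacks.txt").read(), "AppleFileServer", after=40) would return each matching line followed by up to 40 lines of whatever comes after it in the log. The file name stacks.txt is only a placeholder.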
Participants (1): Michael Smith