Re: Darwin-dev Digest, Vol 3, Issue 277
- Subject: Re: Darwin-dev Digest, Vol 3, Issue 277
- From: Michael Smith <email@hidden>
- Date: Tue, 24 Oct 2006 12:39:12 -0700
On Oct 24, 2006, at 12:04 PM, Al Kirkus <email@hidden> wrote:
> I run about 10 OSX file servers (AFP) under 10.4.7. Occasionally I
> see one of the 2 AppleFileServer processes go into an uninterruptible
> wait state and stop servicing all requests - functionally
> disconnecting all of its clients. My attempts to kill the process
> result in it going into an exiting state but the process never
> actually exits (the longest I waited was 2 days). I have to pull the
> plug to get the servers back to normal. This happens intermittently
> (anywhere between 1 and 30+ days apart) on all of my servers that
> service a fairly heavy load.
>
> I am wondering what I could do at a darwin level to try to trace the
> process and see why it is hung and won't exit. Maybe then I can
> formulate some type of solution/workaround.
Firstly, you should ensure that you have filed a bug against this
issue with Apple, either via DTS or bugreporter.apple.com depending
on your developer status.
Now, to the meat of your question. First, the userspace approach:
When the AppleFileServer process is hung, run 'sample' against it and
look at the call graphs. There will almost certainly be at least one
thread in a system call; this may give you a clue.
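
As a rough sketch of the userspace step, the Python below just builds
and (optionally) runs a 'sample' invocation of the form
"sample <pid-or-name> <seconds>". The helper names are made up for
illustration, and the 10-second default is only an assumption. Since you
have two AppleFileServer processes, passing the PID of the hung one is
probably safer than passing the name.

```python
import shutil
import subprocess


def build_sample_command(process, seconds=10):
    """Return the argv for: sample <pid-or-name> <seconds>."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return ["sample", str(process), str(seconds)]


def run_sample(process, seconds=10):
    """Run 'sample' and return its text output, or None if the tool is missing."""
    if shutil.which("sample") is None:
        return None  # not on a Mac OS X box, or 'sample' is not on PATH
    result = subprocess.run(
        build_sample_command(process, seconds),
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Only print the command here; call run_sample() on the server itself.
    print(" ".join(build_sample_command("AppleFileServer", 10)))
```

On the server you would call run_sample() with the PID of the hung
process and read through the call graphs in whatever it returns.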
Secondly, the kernel approach:
Read the technote on two-machine debugging, and obtain the Kernel
Debug Kit for the build you are running on your servers. When you
experience a process hang such as you describe, NMI the machine in
question and attach to it with your debug system.
Source the kgmacros file in gdb once you're attached and issue the
'showallstacks' command. In the (copious) output, search for threads
belonging to the AppleFileServer process, and see if you can work out
what subsystem they're stuck in. You may have to generate symbols for
kernel extensions; there should be instructions on doing this in the
I/O Kit debugging documentation, or as a last resort bug the folks on
this list.
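
If you save the 'showallstacks' output to a text file, something like
the following sketch can help with the searching. It is nothing more
than a context grep: it prints every line that mentions the process
name plus a fixed number of lines after it. The function name, the
20-line window, and the log file argument are all assumptions of mine.
The layout of the output depends on your kgmacros version, so expect to
adjust the window.

```python
import sys


def lines_with_context(lines, needle, after=20):
    """Yield (line_number, text) for lines containing needle and the
    'after' lines that follow each match."""
    remaining = 0
    for number, text in enumerate(lines, start=1):
        if needle in text:
            remaining = after + 1  # the matching line plus its context
        if remaining > 0:
            yield number, text.rstrip("\n")
            remaining -= 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_stacks.py <saved-gdb-log> [process-name]
    needle = sys.argv[2] if len(sys.argv) > 2 else "AppleFileServer"
    with open(sys.argv[1], errors="replace") as log:
        for number, text in lines_with_context(log, needle):
            print(f"{number}: {text}")
```

This only narrows down where to look; you still have to read the
frames yourself to work out which subsystem the threads are waiting in.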
In the case of the thread stuck in a system call, this will let you
track the problem closer to its source. Once you have more
incriminating evidence, add it to your bug (or file a new one
referencing the old one).
= Mike