I've got an Xserve G5 DP file server running Panther Server (10.3.9)
whose authentication and AFP services seem to hang, and I'm having
a hell of a time figuring out why. Of course, the hangs are getting
more and more frequent. I've poked around in a bunch of log files,
including system.log, AppleFileServiceAccess.log and
AppleFileServiceError.log, but nothing telling jumps out at me. There
are plenty of after-the-fact symptoms in the logs, like these (from
system.log):
Sep 1 10:29:17 file /usr/share/servermgrd/cgi-bin/servermgr_smb: [11564-a000b2a4] SIGALRM: process timed out
Sep 1 10:34:13 file /usr/share/servermgrd/cgi-bin/servermgr_info: [11620-a000b2a4] SIGALRM: process timed out
Sep 1 10:35:46 file /usr/share/servermgrd/cgi-bin/servermgr_print: [11633-a000b2a4] SIGALRM: process timed out
Sep 1 10:37:16 file /usr/share/servermgrd/cgi-bin/servermgr_smb: [11647-a000b2a4] SIGALRM: process timed out
Sep 1 10:39:13 file /usr/share/servermgrd/cgi-bin/servermgr_info: [11672-a000b2a4] SIGALRM: process timed out
Sep 1 10:40:46 file /usr/share/servermgrd/cgi-bin/servermgr_print: [11687-a000b2a4] SIGALRM: process timed out
and these (from AppleFileServiceAccess.log; I substituted S, X, Y, and
Z into the IPs):
IP 128.175.S.X - - [01/Sep/2005:10:34:39 -0500] "Client no response timeout: hsiao" 0 0 0
IP 128.175.S.X - - [01/Sep/2005:10:34:39 -0500] "Saved for Reconnect User: hsiao" 1125500866 46 0
IP 128.175.S.X - - [01/Sep/2005:10:34:39 -0500] "Client no response timeout: <Guest>" 0 0 0
IP 128.175.S.X - - [01/Sep/2005:10:34:39 -0500] "Saved for Reconnect User: <Guest>" 1125500866 49 0
IP 128.175.S.Y - - [01/Sep/2005:10:34:39 -0500] "Client no response timeout: irwin" 0 0 0
IP 128.175.S.Y - - [01/Sep/2005:10:34:39 -0500] "Saved for Reconnect User: irwin" 1125500866 52 0
IP 128.175.S.Y - - [01/Sep/2005:10:34:39 -0500] "Client no response timeout: <Guest>" 0 0 0
IP 128.175.S.Y - - [01/Sep/2005:10:34:39 -0500] "Saved for Reconnect User: <Guest>" 1125500866 55 0
IP 128.175.S.Z - - [01/Sep/2005:10:34:39 -0500] "Client no response timeout: bacuta" 0 0 0
IP 128.175.S.Z - - [01/Sep/2005:10:34:39 -0500] "Saved for Reconnect User: bacuta" 1125500866 94 0
There's nothing in the CrashReporter logs that comes anywhere near
coinciding with the times of the hangs.
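One way to line the two sets of symptoms up would be to bucket them by minute and see where they overlap. The following is only a minimal sketch in Python: the regular expressions are modeled solely on the excerpts quoted above, it assumes each entry sits on a single line in the real log files, and the function names (minute_buckets, overlapping_minutes) are made up for illustration.

```python
import re
from collections import Counter

# Patterns derived only from the quoted excerpts; other entry types are ignored.
SYSLOG_TIMEOUT = re.compile(
    r"^(\w{3})\s+(\d{1,2}) (\d{2}:\d{2}):\d{2} \S+ \S+: .*SIGALRM: process timed out"
)
AFP_TIMEOUT = re.compile(
    r'\[(\d{2})/(\w{3})/\d{4}:(\d{2}:\d{2}):\d{2} [+-]\d{4}\] '
    r'"Client no response timeout: '
)


def minute_buckets(syslog_lines, afp_lines):
    """Count timeout events per (month, day, 'HH:MM', source)."""
    buckets = Counter()
    for line in syslog_lines:
        m = SYSLOG_TIMEOUT.match(line)
        if m:
            buckets[(m.group(1), int(m.group(2)), m.group(3), "servermgr")] += 1
    for line in afp_lines:
        m = AFP_TIMEOUT.search(line)
        if m:
            # The AFP log puts the day before the month; normalize the order.
            buckets[(m.group(2), int(m.group(1)), m.group(3), "afp")] += 1
    return buckets


def overlapping_minutes(buckets):
    """Return the sorted minutes in which both logs recorded a timeout."""
    servermgr = {key[:3] for key in buckets if key[3] == "servermgr"}
    afp = {key[:3] for key in buckets if key[3] == "afp"}
    return sorted(servermgr & afp)
```

Passing the two open log files (any iterables of lines will do) to minute_buckets and then calling overlapping_minutes on the result gives a short list of minutes worth comparing against when users reported a hang.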
Here are some more details about the server. It's an OD master. It
serves files over AFP, NFS, and SMB. My NFS clients mount shares
statically, with no automounting. The AFP clients use automounting, and
the mount records are in the OD. The home directory share
(/Volumes/Homes) is shared all three ways so that all the types of
clients I have can get to their home directories. It's an SMB PDC. DNS
is handled by the campus, and both forward and reverse lookups work.
The server has one static IP address, and a name is associated with it.
I'm using SSL with the LDAP directory for my Linux clients with a
purchased certificate. All the users are type OD. So far, the hangs
have usually occurred in the middle of the day, when usage is probably
heaviest, but the logs show no sign of the server failing to keep up,
and the load average is always low (well under 0.5).
I've called Apple support, and, at their suggestion, I've already
reindexed the LDAP directory using some combinations of 'slapconfig'
and 'slapindex' that I don't remember. I've also reinitialized my
PasswordServer database using some NeST commands, which made me very
popular with my users. :-(
Today, Apple support pointed me toward knowledge base article number
107899, which talks about a situation where authentication can hang
due to lookupd maxing out its threads. I've been running 'top -l0 |
grep lookupd' for hours now, and the thread count for lookupd hasn't
gone past 6, so I'm reluctant to believe that this is my issue.
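Rather than watching the terminal for hours, the peak could be recorded automatically. Here is a rough sketch in Python that reads top's output line by line and writes a timestamped line whenever the lookupd thread count reaches a new high. The big assumption is the column layout: it takes the fifth whitespace-separated field (PID, COMMAND, %CPU, TIME, #TH, ...) to be the thread count, which should be checked against the header that top actually prints. The names thread_count and log_peaks are invented for this example.

```python
import sys
import time

# Assumption: field 1 is the command name and field 4 (0-based) is #TH.
COMMAND_COLUMN = 1
THREAD_COLUMN = 4


def thread_count(line, command="lookupd"):
    """Return the thread count from one top line, or None if it is not a match."""
    fields = line.split()
    if len(fields) <= THREAD_COLUMN or fields[COMMAND_COLUMN] != command:
        return None
    if not fields[THREAD_COLUMN].isdigit():
        return None
    return int(fields[THREAD_COLUMN])


def log_peaks(lines, out=sys.stdout, clock=time.strftime):
    """Write a timestamped line for each new peak; return the highest count seen."""
    peak = 0
    for line in lines:
        count = thread_count(line)
        if count is not None and count > peak:
            peak = count
            stamp = clock("%Y-%m-%d %H:%M:%S")
            out.write("%s lookupd threads: %d\n" % (stamp, count))
            out.flush()
    return peak
```

Calling log_peaks(sys.stdin) at the bottom of a script and piping 'top -l0' straight into it (the filtering for lookupd happens in the script, so grep is not needed) leaves a record of when each new peak occurred, which can then be compared against the times of the hangs.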
If anyone has been in a similar boat and has suggestions for clues to
look for, they would be most appreciated. Thanks a bunch in advance
for your help.
--
- Peter Schwenk
- CITA-3, Systems Administrator
- Mathematical Sciences
- University of Delaware
- 437 Ewing Hall, Newark, Delaware 19716-2553 USA
- (302) 831-0437 (v); (302) 831-4511 (f)
- schwenk _at_ math _dot_ udel _dot_ edu