We've been having persistent uptime problems with one of our
servers. The last uptime was 123 days.
I think an automount crash caused the latest outage:
Jan 28 10:27:05 academic crashdump[9768]: automount crashed
Jan 28 10:27:05 academic crashdump[9768]: crash report written to: /Library/Logs/CrashReporter/automount.crash.log
Jan 28 10:27:38 academic kernel[0]: nfs server automount -nsl [186]: not responding
Jan 28 10:27:38 academic KernelEventAgent[35]: tid 00000000 received VQ_NOTRESP event (1)
Jan 28 10:27:38 academic KernelEventAgent[35]: tid 00000000 type 'nfs', mounted on '/Network', from 'automount -nsl [186]', not responding
Jan 28 10:27:38 academic KernelEventAgent[35]: tid 00000000 found 1 filesystem(s) with problem(s)
Here's the info on the server:
It's a dual 2.3GHz G5 Xserve with 2GB of RAM. It boots off of an 80GB
software RAID 1, and storage is a 1.5GB RAID 5 on an Xserve RAID. It
generally has two FireWire drives mounted for boot backups, as well as
an NFS* mount for storage backup.
When it crashes, the first thing that we normally hear is that
students are no longer able to log in because AFP is dead. Today,
the first thing reported was that user web shares were broken.
Apache reported that it did not have search access along its path
(which it did).
It was running 10.4.6. I took this opportunity to update to 10.4.8.
There are 1157 users with home directories on the box. Normal use
is about 50 users. Peak usage tends to be around 150. sendmail runs
on this machine, but only to send error messages; user mail is
handled on a different box.
Stock Apache also runs, but 12 requests/s is about the heaviest load
that it ever sees.
Automount crashed with the header of:
**********
Host Name: academic
Date/Time: 2007-01-28 10:27:04.824 -0600
OS Version: 10.4.6 (Build 8I127)
Report Version: 4
Command: automount
Path: /usr/sbin/automount
Parent: launchd [1]
Version: ??? (???)
PID: 186
Thread: 5
Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000000
***********
To my untrained eye, this looks like a function that should have
returned an address instead returned an error value that was never
checked, and the caller then tried to use address 0x00000000.
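To make that guess concrete, here is a minimal sketch in C of the kind
of bug I have in mind. It is not automount's code: the struct, the
table contents, and the function names (find_entry, target_unchecked,
target_checked) are all made up for illustration. The crash report
does not say which function failed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type; automount's real data structures
 * are not visible in the crash report. */
struct mount_entry {
    const char *name;
    const char *target;
};

/* Made-up table contents, for illustration only. */
static const struct mount_entry table[] = {
    { "example", "/example/target" },
};

/* Returns a pointer to the matching entry, or NULL if none matches. */
const struct mount_entry *find_entry(const char *name)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL;
}

/* Unchecked caller: if find_entry() returns NULL, the member access
 * dereferences a null pointer. That is undefined behavior in C and
 * typically shows up as a fault at or near address zero. */
const char *target_unchecked(const char *name)
{
    return find_entry(name)->target;
}

/* Checked caller: the failure is passed up instead of dereferenced. */
const char *target_checked(const char *name)
{
    const struct mount_entry *e = find_entry(name);
    if (e == NULL)
        return NULL;
    return e->target;
}
```

Calling target_checked("missing") returns NULL, which the caller can
handle. Calling target_unchecked("missing") would most likely die with
a bad access like the one in the report above.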
The console log had some errors that I should probably look into, but
they are likely a side effect:
2007-01-28 14:25:13.745 SyndicationAgent[4924] WARNING: BestCalendarDateFromString - can't interpret: 'Sun 28 Jan 2007 11:42:46 -800
The AFP log shows no errors since September.
Anyone have any ideas as to how we can make this box more stable? I
do not like that core daemons will just crash without warning. For
one thing, they make me come to work on Sunday.
*Please, no flames. It is on a private VLAN. I may be stupid, but
I'm not that stupid. Take your holy wars elsewhere.
_______________________________________________
Macos-x-server mailing list (email@hidden)