My remote Mac mini, which runs as a Web, NS and Mail server (running
10.4.11 Server), has been freezing regularly for a few weeks now.
Hardware defects have been ruled out; it appears to be a software
problem.
The symptoms of the freeze are as follows:
- any new process invocation fails.
- network stack still works (responds to pings, and keeps transmitting
data for previously opened ssh sessions and other existing TCP
sessions).
- mouse still works, the UI as well (i.e. I can click on UI items and
they react).
- the system.log file stops recording anything.
- nothing indicates the imminent freeze.
- it started suddenly one day and has recurred every 2-3 days (mostly
around 8-11pm) since.
- no apparent hard disk problems (no SMART problems ever reported, no
funny sounds, etc.).
Hence, when it freezes, I can move the mouse and still type in a
shell, but once I try to launch any tool that should give me more
information (e.g. terminal commands such as "ps", "tail system.log" or
"lsof"), it won't execute, and even ctrl-c won't make the shell
responsive again.
I suspect some internal deadlock. The problem is that even the syslog
file won't show anything that helps explain it. This all makes it very
hard for me to find the cause of the freezes.
But maybe some part of the system actually detects the problem and
wants to report it using syslog or ktrace, but can't get it through
because some lower-level part of the system that this logging depends
on is already dead.
So here's my question to this list:
Could I re-route syslog (and maybe ktrace, in case that may help)
output to a running terminal so that no file system is used? Or could
I route it to the network via UDP or via an _open_ TCP connection so
that I can see the possible output?
Because even the file system seems to become frozen (system.log shows
nothing after the freeze-up), I am looking for ways that do not use
the file system to route the syslog kernel calls, but output right
away to a memory-based stream (socket?) or whatever the kernel offers.
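
To make the UDP idea concrete, here is a minimal sketch of the
receiving side, to be run on a second machine. It assumes that syslogd
on 10.4 honours the classic BSD forwarding action in /etc/syslog.conf
(a line such as "*.* @loghost", where "loghost" is a placeholder name)
and sends plain datagrams to UDP port 514. I have not verified either
assumption on Tiger. The function names are my own, and binding to
port 514 normally requires root.

```python
import socket
import sys
import time

# Severity names indexed by the low three bits of the syslog priority.
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]


def parse_pri(datagram):
    """Split a BSD-syslog datagram into (facility, severity, message).

    The priority is the number between '<' and '>' at the start:
    facility = pri // 8, severity = pri % 8. If no priority prefix is
    present, facility and severity are returned as None.
    """
    text = datagram.decode("utf-8", "replace")
    if text.startswith("<"):
        end = text.find(">")
        if end > 1 and text[1:end].isdigit():
            pri = int(text[1:end])
            return pri // 8, SEVERITIES[pri % 8], text[end + 1:].rstrip()
    return None, None, text.rstrip()


def listen(host="0.0.0.0", port=514):
    """Print every received syslog datagram with a local timestamp.

    Blocks forever. The port number is the conventional syslog port,
    an assumption about what the sender uses.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, addr = sock.recvfrom(4096)
        facility, severity, message = parse_pri(data)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        sys.stdout.write("%s %s facility=%s severity=%s %s\n"
                         % (stamp, addr[0], facility, severity, message))
        sys.stdout.flush()
```

Calling listen() on the receiving machine before the next freeze would
show whatever syslogd still manages to send. This only helps if
syslogd itself keeps running after the freeze, which is not certain.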
Or any other ideas of how to debug this? Part of the problem is that
the server is in a remote place where I can't sit around waiting for
it to freeze. All I can do is power-cycle it remotely.
Also, would there be a tool that I could keep running that monitors
the hard disk state, so that I could verify that the HD is still
working and responsive after the freeze, to rule out a disk lockup? It
must be a tool that doesn't need disk access itself to issue this
test, of course (hence I need to start it before the freeze and keep
it running until I ask it to report the current state).
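
What I have in mind could be sketched roughly as follows: a process
started before the freeze, in an ssh session that stays open, with a
background thread that keeps writing and fsync-ing a small file while
the main thread reports how long ago the last write succeeded. This is
only a sketch under assumptions: it needs a reasonably recent Python on
the machine, all names (DiskProbe, status_line, the probe file name)
are made up for illustration, and fsync may be satisfied by the
drive's own cache rather than the platters.

```python
import os
import tempfile
import threading
import time


class DiskProbe(object):
    """Repeatedly write and fsync a small file, recording each success."""

    def __init__(self, path=None, interval=5.0):
        # Hypothetical default location; it should live on the disk to test.
        self.path = path or os.path.join(tempfile.gettempdir(),
                                         "disk_probe.tmp")
        self.interval = interval
        self.last_ok = None       # time of the last completed write+fsync
        self.last_latency = None  # seconds the last probe took
        self._stop = threading.Event()

    def probe_once(self):
        """Write a timestamp, force it out with fsync, return the latency."""
        start = time.time()
        fd = os.open(self.path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, ("%f\n" % start).encode("ascii"))
            os.fsync(fd)
        finally:
            os.close(fd)
        now = time.time()
        self.last_latency = now - start
        self.last_ok = now
        return self.last_latency

    def run(self):
        """Probe in a loop; if the disk hangs, this thread simply blocks."""
        while not self._stop.is_set():
            self.probe_once()
            self._stop.wait(self.interval)

    def start(self):
        thread = threading.Thread(target=self.run)
        thread.daemon = True
        thread.start()
        return thread

    def stop(self):
        self._stop.set()

    def seconds_since_ok(self):
        if self.last_ok is None:
            return None
        return time.time() - self.last_ok


def status_line(probe, stale_after=30.0):
    """Describe the probe state without touching the disk."""
    age = probe.seconds_since_ok()
    if age is None:
        return "disk probe: no successful write yet"
    state = "STALE" if age > stale_after else "ok"
    return ("disk probe: %s, last success %.1fs ago, latency %.3fs"
            % (state, age, probe.last_latency))
```

The idea would be to call start() once and then print status_line()
every few seconds to the already-open terminal. If the lines keep
arriving but turn STALE after a freeze, the write path is stuck while
the process itself is still scheduled.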
This is one of the times where I wish Macs still had RS-232 I/O for
easy debugging of this kind of thing...
--
Thomas Tempelmann, http://www.tempel.org/