We have 5 Xserves here, and every so often one of them (not always the same
one) gets into a bizarre state. It just sits there: all running services (web,
Java applications, etc.) continue to run normally, and they continue to respond
to and process requests. BUT no new connection can be made via ftp, ssh, or
anything else. No new process can be created (cron jobs, etc.). And if I'm
connected through ssh, I get this for every command I type:
Cannot allocate memory
At first I thought it was due to a security update that had not yet been
applied to all servers (2006-004, which contains a fix for an OpenSSH issue
that leads to this kind of symptom). Other clues led me to conclude that it was
ssh-related (other Linux machines were under dictionary attack at the same
time). But it's not that. Last Monday, I blocked ssh completely at the
firewall and only opened it for specific IPs (ours), and one of the servers
experienced this problem a few minutes ago.
It's not a process eating up all the memory. I can say this for sure, since
the last server that crashed like this was a development one doing almost
nothing.
All the Xserves are G5 cluster nodes, dual 2 GHz (except one that is 2.3 GHz).
All have between 1.5 and 6 GB of RAM. All of them are connected to a shared
volume via Xsan.
In the minutes before the crash, there is nothing noticeable in the log
files.