For those who are following the offline agent soap opera, here's the
latest update...
At Ernie P.'s suggestion, I tried deleting both the controller's and the
agent's files under /var/xgrid. Here's what I did:
1. Stop the Xgrid service in the Server Admin app
2. Delete /var/xgrid/agent/* and /var/xgrid/controller/* (deleting
/var/xgrid or /var/xgrid/* caused errors when restarting the Xgrid
service)
3. Restart the Xgrid service in the Server Admin app
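For anyone who would rather script step 2, here is a minimal sketch, assuming Python 3 is available on the server and the Xgrid service has already been stopped. The function names are hypothetical helpers, not part of Xgrid. The point it illustrates is that only the contents of the agent and controller folders are removed, while the folders themselves (and /var/xgrid) stay in place.

```python
import shutil
from pathlib import Path


def clear_dir_contents(directory):
    """Delete everything inside `directory` but keep the directory itself.

    Returns the sorted names of the entries that were removed.
    """
    directory = Path(directory)
    removed = []
    for child in sorted(directory.iterdir()):
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)   # real sub-folder: remove recursively
        else:
            child.unlink()         # regular file or symlink
        removed.append(child.name)
    return removed


def clear_xgrid_state(root="/var/xgrid"):
    """Empty <root>/agent and <root>/controller, leaving both folders in place.

    Hypothetical helper: the default path is the one from the steps above.
    """
    root = Path(root)
    return {sub: clear_dir_contents(root / sub) for sub in ("agent", "controller")}


# Usage (as root, with the Xgrid service stopped):
#     clear_xgrid_state()
```

This is destructive, so it is worth trying it against a copy of the folder first by passing a different root.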
(I didn't delete /var/xgrid/agent/* on any of the other agents.) As
expected, the controller's database and the server agent's krb5.tab and
tasks folder were re-created in their usual locations. The Xgrid service
started without error, according to system.log. The server's own agent
and our other agents connected successfully and (again, according to
the log) were registered on the default grid, but with state=Offline.
So, still no luck: the agents connect, but all of them remain Offline.
Other tidbits that have come to mind and that the detectives out there
might find relevant:
- After the power outage that started this issue, the server came back
online on its own (it's set to restart automatically after a power
failure). At that point, the agents re-registered correctly and were
Online. Jobs submitted to the controller, however, failed with the
error "task: unexpected reply". Having had a similar problem before, I
stopped the Xgrid service, deleted the controller database, and then
restarted the service. From then on, only Offline agents.
- All of the agents experienced the same power outage that started the
problems on the controller.
Short of a full restore from a backup made before our problems
started, does anyone have other suggestions?
- Barry "desperately seeking suggestions" Wark