Mailing Lists: Apple Mailing Lists


Re: [Xgrid] Agents appearing Offline (follow-up #2)



Other tidbits that have come to mind that the detectives out there
might find relevant:
- After the power outage that started this issue, the server came back
online automatically (it's set to auto restart after a power failure).
At that point, the agents re-registered correctly, and were Online.
Jobs submitted to the controller, however, failed with an error =
"task: unexpected reply". Having had a similar problem before, I
stopped the Xgrid service, deleted the controller database and then
restarted the service. From then on, only Offline agents.
- All of the agents experienced the same power outage that started the
problems on the controller.

Short of a full restore from backup prior to the start of our
problems, does anyone have other suggestions?

- Barry "desperately seeking suggestions" Wark

I investigated the issue a little more, as I have a similar problem right now. My controller, Xgrid@Stanford, gave indications that something fishy was going on, specific to a naive system (naive = never ran xgridcontrollerd) upgraded to X.4.4. I tested this hypothesis on my test machine running Tiger Client.


Here is a description of what I did and the problem I think I identified:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/problem.html


The test described there was actually performed twice, and I got the same issue both times. The small problem mentioned there, with getting the agent back when returning to normal, only happened the second time, and I did not have the energy to redo everything just for that.

Also, this is consistent with what I just saw with my Xgrid@Stanford, where I reinstalled the system from scratch, erasing the hard drive. It seemed to work on X.4.0 on a fresh naive system, and then all agents were offline on X.4.4 after cleaning up the xgrid database. I need to retry more of that.

charles



FYI, I will just paste the "Conclusions" here. Ernie and Barry, I am sending you a full version of the test in plain text, which might not make it through lists.apple because of its size.


My interpretation is that when xgridcontrollerd runs for the first time, only X.4.1 (and maybe later versions too) is able to create a valid database that can register agents and recognize them as "Available", whereas X.4.4 (and maybe earlier versions too) creates a database that is somehow corrupted or invalid and considers all connected agents "Offline". This is consistent with the issue I had with my real controller running Xgrid@Stanford.


To fully test that, one should test a fresh naive installation of X.4.0, and then a fresh naive installation of X.4.4. I will actually do that soon.


--
Xgrid-at-Stanford
Help science move fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford

Charles Parnot
email@hidden




References: 
 >[Xgrid] Agents appearing Offline (follow-up #2) (From: Barry Wark <email@hidden>)


