Mailing Lists: Apple Mailing Lists


Re: [Xgrid] Agents appearing Offline (follow-up #2)



Charles,

This matches my experience. I (very stupidly) deleted the original
controller database, so I can't re-create your scenario exactly.
Unfortunately, stopping the Xgrid service, restoring the entire
/var/xgrid/* tree from the daily backup taken the day before our
power outage, and then restarting the service does not seem to fix
the Offline agent problem. I have, however, confirmed your test
results on a non-server OS X box.
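
For anyone repeating the restore step, here is a rough sketch of how the
tree swap could be scripted so that the current (possibly bad) database
is moved aside rather than deleted. It is only an illustration, not what
I actually ran: the function name restore_tree, the ".pre-restore"
suffix and any paths passed in are made up for the example, and it
assumes the Xgrid service has already been stopped by whatever means you
normally use. It does not touch the service itself.

```python
import shutil
from pathlib import Path


def restore_tree(backup_dir, live_dir, keep_suffix=".pre-restore"):
    """Replace live_dir with a copy of backup_dir.

    The existing live_dir (if any) is renamed with keep_suffix instead of
    being deleted, so the pre-restore state can still be inspected later.
    Returns the path of the saved copy, or None if there was nothing to save.
    All names here are hypothetical; stop the service before calling this.
    """
    backup_dir, live_dir = Path(backup_dir), Path(live_dir)
    if not backup_dir.is_dir():
        raise FileNotFoundError(f"backup not found: {backup_dir}")

    saved = live_dir.with_name(live_dir.name + keep_suffix)
    if saved.exists():
        # Never clobber an earlier saved copy.
        raise FileExistsError(f"refusing to overwrite earlier copy: {saved}")

    kept = None
    if live_dir.exists():
        # Move the current tree aside rather than removing it.
        shutil.move(str(live_dir), str(saved))
        kept = saved

    # Copy the backup into place, preserving symlinks as symlinks.
    shutil.copytree(str(backup_dir), str(live_dir), symlinks=True)
    return kept
```

Keeping the old tree around is deliberate: having deleted my original
controller database, I would now rather have a copy to compare against.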

Thanks for clarifying the situation so well in your description.

Do you think it's possible that there was an upgrade of the database
libraries that Xgrid uses (Berkeley DB, right?) that's incompatible
with the Xgrid binaries in OS X 10.4.4?

Barry



On 2/15/06, Charles Parnot <email@hidden> wrote:
> > Other tidbits that have come to mind that the detectives out there
> > might find relevant:
> > -After the power outage that started this issue, the server came back
> > online automatically (it's set to auto restart after a power failure).
> > At that point, the agents re-registered correctly, and were Online.
> > Jobs submitted to the controller, however, failed with an error =
> > "task: unexpected reply". Having had a similar problem before, I
> > stopped the Xgrid service, deleted the controller database and then
> > restarted the service. From then on, only Offline agents.
> > -All of the agents experienced the same power-outage that started the
> > problems on the controller.
> >
> > Short of a full restore from backup prior to the start of our
> > problems, does anyone have other suggestions?
> >
> > - Barry "desperately seeking suggestions" Wark
>
> I investigated the issue a little more, as I have a similar problem
> right now. I had indications from my controller, Xgrid@Stanford, that
> something fishy was going on that was specific to a naive system
> (naive = never ran xgridcontrollerd) upgraded to X.4.4. I tested this
> hypothesis on my test machine running Tiger Client.
>
> Here is the description of what I was able to do and the problem I
> think I identified:
> http://cmgm.stanford.edu/~cparnot/xgrid-stanford/problem.html
>
> The test described there was actually performed twice, and I got the
> same issue both times. The small problem with getting the agent back
> when returning to normal only happened the second time, and I did not
> have the energy to redo everything again just for that.
>
> Also, this is consistent with what I just saw with my Xgrid@Stanford,
> where I reinstalled the system from scratch, erasing the hard drive.
> It seemed to work on X.4.0 on a fresh naive system, and then all
> agents were offline on X.4.4 after cleaning up the xgrid database. I
> need to retry more of that.
>
> charles
>
>
>
> FYI, I will just paste here the "Conclusions". Ernie and Barry, I
> send you a full version of the test in plain text, which might not
> make it on lists.apple because of the size.
>
>
> My interpretation is that when xgridcontrollerd runs the first time,
> only X.4.1 (and maybe later versions too) is able to create a valid
> database that can register agents and recognize them as "Available",
> but X.4.4 (and maybe earlier versions too) creates a database somehow
> corrupted or invalid that considers all connected agents "Offline".
> This is consistent with the issue I had with my real controller
> running Xgrid@Stanford.
>
> To fully test that, one should test a fresh naive installation of
> X.4.0, and then a fresh naive installation of X.4.4. I will actually
> do that soon.
>
>
> --
> Xgrid-at-Stanford
> Help science move fast forward:
> http://cmgm.stanford.edu/~cparnot/xgrid-stanford
>
> Charles Parnot
> email@hidden
>
>
>
>
>  _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Xgrid-users mailing list      (email@hidden)
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/xgrid-users/email@hidden
>
> This email sent to email@hidden
>

References: 
 >[Xgrid] Agents appearing Offline (follow-up #2) (From: Barry Wark <email@hidden>)
 >Re: [Xgrid] Agents appearing Offline (follow-up #2) (From: Charles Parnot <email@hidden>)


