Re: JM does not see wotaskd-after some time
- Subject: Re: JM does not see wotaskd-after some time
- From: Chuck Hill <email@hidden>
- Date: Fri, 5 Mar 2010 10:22:02 -0800
Hi Ondra,
On Mar 5, 2010, at 7:07 AM, Ondřej Čada wrote:
Hello there,
I've got a weird problem. We have a pretty plain WO installation in
a 10.6 Server: installed through WOInstaller.jar, replaced Apache
adaptor by the 64-bit one from Wonder, added the launchd plist for
wotaskd, updated the Apache config, yadda yadda. Installed four-odd
applications, most one instance, one of them two instances.
Thing is: for about a day all works perfectly.
Then, JavaMonitor stops seeing wotaskd ("Failed to contact
localhost-1085").
Simply put: that message lies. What it really means is "wotaskd timed
out trying to communicate with one of the instances".
At about the same time (and quite probably for the same reason) the
two-instance application starts behaving a bit weird; sometimes, I
can't log in at all, sometimes, one instance never gets a request,
all of them are directed to the other -- even if I try the "server/
cgi-bin/WO/app/1" URL, I get re-directed to ".../2". Indeed, as seen
in the instance logs, instance 2 does all the work (instance 1
does run though -- there are a couple of WOTimer-launched internal
actions there, and they tick all right all the time).
That sounds like one (or sometimes both) of the instances is stalled
or deadlocked. That is exactly what produces the "Failed to contact
localhost-1085" message. Do you have concurrent request dispatch
enabled? Does anything take a long time to process? Try using jstack
to get a thread dump from them when this happens.
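If attaching jstack to a live instance is awkward, a rough in-JVM check is also possible. The sketch below is not what jstack does; it uses the standard java.lang.management API to ask the JVM whether any threads are deadlocked on monitors or locks. The class name DeadlockCheck is made up for illustration, and wiring it into an application (for example from a timer or a direct action) is left as an assumption. It will not catch a request that is merely slow or stuck on I/O; a full thread dump is still the better tool for that.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {

    /**
     * Returns a description of all deadlocked threads in this JVM,
     * or an empty string when the JVM reports no deadlock.
     */
    public static String describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // findDeadlockedThreads() returns null when nothing is deadlocked.
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }

        StringBuilder report = new StringBuilder();
        // Include locked monitors and synchronizers so the lock cycle is visible.
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info != null) {
                report.append(info);
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        String report = describeDeadlocks();
        System.out.println(report.isEmpty() ? "No deadlocked threads found" : report);
    }
}
```

An empty result only rules out a lock cycle; it says nothing about threads that are busy or blocked on a slow resource.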
Now for the really weird thing: wotaskd does run and is accessible.
If I switch JM to the Hosts page (where one host, "localhost", is
configured), _IT REPORTS AVAILABLE: YES_! (And clicking YES I do get
the configuration all right in a new window.) Yet, switching back to
Applications and clicking "Detail View", I get again "Failed to
contact localhost-1085". Can be repeated. I'll be damned :-O
It is not trying to talk to the instances on the Hosts page, so you
don't get that error.
All the logs look OK; about the only thing which seems related is
that the wotaskd log contains a few entries of this kind:
[2010-03-05 15:02:58 CET] <WorkerThread10> <WOWorkerThread id=10
socket=Socket[addr=/127.0.0.1,port=50809,localport=1085]> Exception
while sending response: java.net.SocketException: Broken pipe
Nevertheless, there are not many of them -- such a report definitely
does NOT occur every time JavaMonitor tries to connect to wotaskd and
fails; the log entry occurs only occasionally.
You can ignore that.
The one cure I've found so far is ugly: to reboot the server. In
that case, all runs perfectly -- for about a day again, when the
problems are back. Note: they seem to pop up at a fixed time rather
than after a fixed uptime; this weakly hints that the problem might be
related to some timed task on the server which fouls something,
rather than to a buffer/cache/whatever overload -- although the
latter is definitely possible, too.
Grep the logs for OutOfMemory; it could be that.
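Plain grep is the quickest way to do this. For anyone who would rather run the same check from Java, here is a minimal sketch that prints every line containing "OutOfMemory" from the files given on the command line. The class name OomScan is hypothetical, and the log file locations are deliberately not assumed: pass whatever paths your instances and wotaskd actually log to.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class OomScan {

    /** Returns the lines that mention OutOfMemory, in their original order. */
    public static List<String> findOutOfMemoryLines(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("OutOfMemory"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            // ISO-8859-1 maps every byte to a character, so odd bytes in a log never abort the scan.
            List<String> lines = Files.readAllLines(Paths.get(name), StandardCharsets.ISO_8859_1);
            for (String hit : findOutOfMemoryLines(lines)) {
                System.out.println(name + ": " + hit);
            }
        }
    }
}
```

If a hit's timestamp lines up with the time JavaMonitor started reporting "Failed to contact", that points at memory exhaustion in the instance rather than at wotaskd.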
I'd be pretty glad for any hint; at this moment I do not really know
what to do :(
I think you have it. :-) Your application has a problem. That
problem is annoying wotaskd.
Chuck
--
Chuck Hill Senior Consultant / VP Development
Practical WebObjects - for developers who want to increase their
overall knowledge of WebObjects or who are trying to solve specific
problems.
http://www.global-village.net/products/practical_webobjects