which sort of application bugs hang wotaskd?
- Subject: which sort of application bugs hang wotaskd?
- From: OC <email@hidden>
- Date: Mon, 24 Oct 2016 17:41:29 +0200
Hello there,
there seems to be one pretty rare, ugly and hard-to-find lock in my application (I shall get back to it at the end, in the hope it might ring a bell), but what's weirdest: when it happens, it seems to be _wotaskd_ that primarily goes down?!?
Alas, the information is sparse: this is the deployment site, to which the programming team has no access (and so far we have not been able to reproduce the problem at the test site, whatever we try), but according to the site admin and the logs, it looks like this:
(a) first, one of the worker threads hangs somehow, so far inexplicably (EC locking problem possible but improbable, explained below)
(b) for some time, other threads run without a glitch, new requests are served, new R/R loop worker threads are spawned and logged (I log all R/R loops)
(c) shortly afterwards (within minutes), though, the adaptor begins to redirect requests to the “Redirection URL”
(d) now, the site admin is alerted; he runs JavaMonitor **which reports “Failed to contact 127.0.0.1-1085”**!
(e) he finds which process belongs to *the application instance* (*not* the wotaskd!), and kills it from Terminal
(f) which magically cures wotaskd: JavaMonitor starts working again, stops showing the 1085 failure and allows him to re-launch the instance; all is well and swell.
Does this perhaps ring a bell? To me this behaviour does not make any sense :/
As for the hang itself, it's rather weird too. There is a loop which goes through a list of EOs; each of them is logged out. Something like this:
===
for (DBTimeChunk tch in session().currentMarket.orderedTimeChunks()) {
log.info(""+tch)
if (tch.someTimestamp>fixedTimestamp) continue // happens to be true in our case
... therefore some irrelevant code here (it would log if it happened, does not) ...
}
===
The problem is that
- this goes through some of the TimeChunks, and _then_ it hangs -- not at the start of R/R loop, where EC locking problems could be expected
- in the same session, with the same EC, even in the same thread (for the method which contains the loop happens to be used twice in the page template), the loop had already run through all the TimeChunks, tested their someTimestamp and ended without a glitch (so no fault is fired when it hangs)
So far it has happened about three times; each time on a different TimeChunk.
About the only thing I guess _might_ cause the hang of the thread is the "log tch". TimeChunk's toString() is comparatively complex, it might call, among more mundane things, also
- this.changesFromCommittedSnapshot()
- this.attributeKeys()
- this.primaryKey() (of ERXGenericRecord which it inherits)
Might one of them hang the thread if another thread does the same, or something else, at the wrong moment? (Presumably all of them had already been called for the same EO in the same thread without problems shortly before.)
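If the hang is a true deadlock between threads, the JVM itself can sometimes name the culprits. The following is only a minimal sketch using the standard `java.lang.management` API; the class and method names (`DeadlockProbe`, `describeDeadlocks`) are made up for illustration. It reports only deadlocks on Java monitors and `java.util.concurrent` ownable synchronizers; whether an EC lock would show up this way is an assumption I have not verified.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of WebObjects or Wonder.
public class DeadlockProbe {
    /** Returns a description of the deadlocked threads, or "" if the JVM detects none. */
    public static String describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() returns null when no deadlock is detected.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        // Ask for locked monitors and synchronizers so the report shows who holds what.
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            if (info != null) { // a thread may have died in the meantime
                sb.append(info);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String report = describeDeadlocks();
        System.out.println(report.isEmpty() ? "no deadlock detected" : report);
    }
}
```

Such a check could be run periodically from a background thread and its result written to the application log; a thread that is merely blocked on I/O or spinning would not be reported.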
If it happens again, it would help if the site admin could, before killing the application, somehow force it to log the stack traces of all its threads. Is there some trick for that?
And of course, I'll be extremely grateful for any other advice on how to hunt for this bloody kind of bug.
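For illustration, one way this could be done from inside the application is sketched below, using only the standard `Thread.getAllStackTraces()` call. The class and method names (`ThreadDumper`, `dumpAllThreads`) are hypothetical, and how the dump would be triggered (a direct action, a timer, a watchdog thread) is left open. From outside the process, a stock HotSpot JVM normally also prints a similar dump to its standard output on `kill -QUIT <pid>`, and the JDK's `jstack <pid>` tool does much the same.

```java
import java.util.Map;

// Hypothetical helper, not part of WebObjects or Wonder.
public class ThreadDumper {
    /** Returns a textual dump of the stack of every live thread in this JVM. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            // Header line: thread name and its current state (RUNNABLE, BLOCKED, WAITING, ...).
            sb.append('"').append(thread.getName()).append("\" state=")
              .append(thread.getState()).append('\n');
            // One line per stack frame, innermost frame first.
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

The returned string could be passed to the same logger as in the loop above, e.g. `log.info(ThreadDumper.dumpAllThreads())`; a thread stuck inside toString() should then show up with the frame it is waiting in.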
Thanks a lot,
OC