Re: WOWorkerThread deadlocks
- Subject: Re: WOWorkerThread deadlocks
- From: Maik Musall <email@hidden>
- Date: Mon, 10 Sep 2012 22:07:09 +0200
Hi Chuck,
On 10.09.2012 at 21:35, Chuck Hill <email@hidden> wrote:
>>> "WorkerThread207" that many worker threads indicates two things to me:
>>> 1. Your app configuration is too high. I'd use a max of 6-10 and a listen queue size of around 4 (adjusted to suit your specific needs). A WO app is very, very unlikely to recover from a 200 worker thread backlog in any way that is useful to the users.
>>
>> You may be right, they were at 16/512/8/128. I just set them to 4/8/8/6 and am eager to watch the behaviour tomorrow.
>
> You should at least know when there is a problem sooner. Then as quickly as you can, get a thread dump with jstack.
>
>>
>> There are up to 100 users concurrently (it's a backoffice app), although concurrently running requests are typically not more than 2-3, plus 1-2 DirectActions, plus possibly 1-2 long response pages running statistics stuff.
>
> OK, the 4/8/8/6 numbers you have seem reasonable for that load.
>
>
>>> 2. You have a thread that is taking a long time to return a result. If you are dispatching requests concurrently, then this is most likely stuck in EOControl/EOAccess (e.g. waiting for a slow query result) or connecting to some external process. You could also have a deadlock. If you are not dispatching requests concurrently, then this delay could be in other code.
>>
>> When that situation occurs, the app is not using CPU any more, neither is the database. It often doesn't respond to SIGTERM any more and needs SIGKILL to terminate so we can restart.
>
> That sounds like what a blocked non-daemon thread would cause.
>
>
>>> The traces below do not show the problem. If you want to send a full dump, I am willing to look at it. It is possible that the problem had resolved by the time you took this dump. What you show below is normal for a lot of worker threads. WorkerThread206 is waiting for a new request, WorkerThread207 is idle waiting for something to do in the future.
>>
>> Thanks for the offer; here is the full jstack output:
>> http://akaihi.selbstdenker.com/~maik/jstack_powerd_20120910.txt
>
> Other than having a large number of idle worker threads, there is nothing in that trace that indicates the problem. In my experience, that means that the problem has resolved itself and the application has recovered. You will need to run jstack closer to the start of the problem event to capture what is going wrong.
When I took that jstack, the app was in a state where no login was possible and users' requests would not return, ultimately running into "no instance" responses after the timeout elapsed.
If the problem persists, I think I'll set up a cronjob to record jstacks every couple of minutes or so.
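As an alternative to an external cronjob, periodic dumps could also be taken from inside the JVM. The following is only a rough sketch using the standard java.lang.management API, not anything provided by WebObjects or Wonder; the class name ThreadDumpSketch, the method names, and the choice of System.err as the destination are assumptions made for illustration. It lists every live thread with its state and stack, and reports how many threads the JVM itself considers deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of WebObjects or Wonder.
final class ThreadDumpSketch {

    /** Builds a text dump of all live threads, with the deadlock count on the first line. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Both find* methods return null when no deadlock is detected.
        long[] deadlocked = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        StringBuilder sb = new StringBuilder();
        sb.append("deadlocked threads: ")
          .append(deadlocked == null ? 0 : deadlocked.length).append('\n');
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" waiting on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" held by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Writes a dump to System.err every periodMinutes, on a daemon thread. */
    static ScheduledExecutorService startLogging(long periodMinutes) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "thread-dump-sketch");
                    t.setDaemon(true); // must not keep the JVM alive on shutdown
                    return t;
                });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                System.err.println(dump());
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs.
                System.err.println("thread dump failed: " + e);
            }
        }, periodMinutes, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

Calling ThreadDumpSketch.startLogging(2) once at application startup would approximate the "every couple of minutes" schedule. Note that the deadlock count only covers cycles on Java monitors and java.util.concurrent ownable synchronizers; a thread blocked on a slow query or an external process would show up only through its stack trace.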
Note that I recently switched to Wonder for this project (using all the Wonder base classes), and since then this problem has been occurring more frequently: almost once a day now, compared with about once a week before. In the process, I switched from MultiECLockManager to ERXEC with autolocking.
Maik