Ideas? Weird WOWorkerThread hang on Solaris
- Subject: Ideas? Weird WOWorkerThread hang on Solaris
- From: Chuck Hill <email@hidden>
- Date: Thu, 05 May 2011 15:12:35 -0700
Hi,
The environment is multiple SunFire boxes running Solaris 10 something with a WO 5.2.4 (nope, not a typo) app using JDK 1.4.2. This is with the newest Wonder mod_webobjects re-compiled for Apache 2.2 on these machines. wotaskd and JavaMonitor are the latest from Wonder, but the stock 5.2.4 versions of all three exhibited the same problem. There is a web balancer in front, with Apache running on three machines and instances running on five machines.
Most of the time the instances are very responsive, with dispatchRequest processing times averaging 41ms. Then under increasing load (measured in users per instance; we can reproduce this with one instance and 20 people), responses start slowing down, and the slowdown quickly spreads to all users.
The WOAdaptorInfo page shows multiple active requests in each instance while dispatchRequest shows an idle instance. If we get a thread dump from the instance, we see that all of the WOWorkerThreads are blocked in socketRead (the pool quickly grows from 16 to the max configured):
"WorkerThread60" prio=5 tid=0x00100cb0 nid=0x47 runnable [0xe08ff000..0xe08ffc30]
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(Unknown Source)
    at com.webobjects.appserver._private.WOHttpIO.refillInputBuffer(WOHttpIO.java:131)
    at com.webobjects.appserver._private.WOHttpIO.readLine(WOHttpIO.java:187)
    at com.webobjects.appserver._private.WOHttpIO.readRequestFromSocket(WOHttpIO.java:279)
    at com.webobjects.appserver._private.WOWorkerThread.runOnce(WOWorkerThread.java:79)
    at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:254)
    at java.lang.Thread.run(Unknown Source)
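For anyone comparing dumps across instances, here is a rough sketch of a tally of the threads whose top frame is socketRead0. It assumes the dump has been saved to a text file in the format shown above (a quoted thread name on the header line, followed by "at ..." frames). The class and method names are made up for illustration, and it is written for a current JDK on a workstation, not for the 1.4.2 runtime the app uses.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of WebObjects or Wonder.
public class SocketReadCounter {

    /** Returns the names of threads whose top stack frame is socketRead0. */
    public static List<String> threadsInSocketRead(String dump) {
        List<String> names = new ArrayList<>();
        String currentThread = null;   // name taken from the latest header line
        boolean topFrameSeen = false;  // only the first "at" line per thread counts

        for (String raw : dump.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // Header line, e.g. "WorkerThread60" prio=5 tid=...
                int end = line.indexOf('"', 1);
                currentThread = end > 0 ? line.substring(1, end) : line;
                topFrameSeen = false;
            } else if (line.startsWith("at ") && currentThread != null && !topFrameSeen) {
                topFrameSeen = true;
                if (line.startsWith("at java.net.SocketInputStream.socketRead0(")) {
                    names.add(currentThread);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a saved thread dump (assumed to be plain text)
        String dump = new String(Files.readAllBytes(Paths.get(args[0])));
        List<String> blocked = threadsInSocketRead(dump);
        System.out.println(blocked.size() + " thread(s) in socketRead0: " + blocked);
    }
}
```

Running it against dumps taken a few seconds apart on each machine would show whether the same worker threads stay parked in the read or whether new ones keep piling up.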
Then after several seconds to nearly a minute, they all unblock and complete normally. Sometimes it takes longer than a minute (Receive Timeout is set to a high 60 seconds) and the users get bounced to a new instance where they get an "unable to restore session" message. Oddly, all the servers seem to get blocked and unblocked at pretty much the same time.
I think the problem exists in the other direction as well, as we have seen cases where a page partly loaded and then finished only after two or more minutes.
There is little CPU, memory, or I/O load on the machines. The network tests out OK. Apache vends static resources quickly when the apps are stuck.
Any ideas? It looks to be a problem in the TCP communications between the app servers and the instance servers. But what? Anyone want to play?
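To show what that thread state means outside of WO, here is a minimal sketch using only plain java.net sockets. A server accepts a connection and then waits for request bytes that the client never sends, which is the same situation as a worker thread sitting in socketRead0 under readRequestFromSocket. Nothing here is WO code: the class name is hypothetical, and the read timeout is set with setSoTimeout purely so the demo ends. Whether or how WO configures its own sockets is not something this sketch claims.

```java
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; it only illustrates a read stalled on a silent peer.
public class StalledRequestDemo {

    /** Returns true if the server-side read timed out without receiving a byte. */
    public static boolean readTimesOut(int timeoutMillis) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            // Without a timeout the read below would block indefinitely.
            accepted.setSoTimeout(timeoutMillis);
            InputStream in = accepted.getInputStream();
            try {
                in.read();      // the connection is up, but the client never writes
                return false;   // reached only if a byte or end-of-stream arrives
            } catch (SocketTimeoutException e) {
                return true;    // no request data arrived within the timeout
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

The point of the sketch is that a thread in this state has an established connection and is simply waiting on the peer, so the delay is in whatever sits between the sender and the instance, not in the instance's own request handling.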
Chuck
--
Chuck Hill Senior Consultant / VP Development
Come to WOWODC this July for unparalleled WO learning opportunities and real peer to peer problem solving! Network, socialize, and enjoy a great cosmopolitan city. See you there! http://www.wocommunity.org/wowodc11/