Hi Klaus,
Thanks for the reply. Late Friday, we tracked it down (we think!) to a bad build of Apache 2.0 or (more likely) of mod_webobjects. We rolled back to Apache 1.3 and the stock Solaris adaptor from Apple (uh, yeah, WO 5.2). The problem noted below went away.
My working theory at this point is that this branch of the adaptor source is defective in some way under load. The plan is to build either the Apache 2.0 or the 2.2 adaptor from Wonder (the 2.2 version will require a full build of Apache as well) and give that a trial. If that resolves the defect, I will update the Wiki page and give mDimension a copy for their mod_webobjects downloads (kind thanks to Bill Chin & co for hosting this).
Which version of mod_webobjects are you using with Solaris?
Chuck
P.S. A max of 4 sessions per instance is NOT normal for a WO app. I would go with closer to 400 :-) The more useful statistic is requests per minute per instance, and that varies a lot depending on what your app is doing.
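For concreteness, requests per minute per instance is just a request count over a sampling window divided by the window length and the number of instances. The minimal Java helper below is a sketch. The class and method names are made up, and it assumes you already have a request count (for example, from your Apache access log) for a known window. It does not read any WebObjects statistics.

```java
public class RequestRate {
    /**
     * Requests per minute per instance over a sampling window.
     * Hypothetical helper: the inputs are assumed to come from your own
     * logs; nothing here talks to WebObjects or wotaskd.
     */
    static double requestsPerMinutePerInstance(long requests, double windowMinutes, int instances) {
        if (windowMinutes <= 0 || instances <= 0) {
            throw new IllegalArgumentException("window and instance count must be positive");
        }
        return requests / windowMinutes / instances;
    }

    public static void main(String[] args) {
        // Hypothetical sample: 12,000 requests in 10 minutes across 5 instances.
        System.out.println(requestsPerMinutePerInstance(12_000, 10.0, 5)); // prints 240.0
    }
}
```

Comparing this figure before and during a slowdown says more about load than a session count does, because one session can generate very different request rates in different apps.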
On May 7, 2011, at 12:48 PM, Klaus Berkling wrote:
Hi Chuck.
On May 5, 2011, at 3:12 PM, Chuck Hill wrote:
The environment is multiple SunFire boxes running some release of Solaris 10, with a WO 5.2.4 (nope, not a typo) app using JDK 1.4.2. This is with the newest Wonder mod_webobjects, recompiled for Apache 2.2 on these machines. wotaskd and JavaMonitor are the latest from Wonder, but the stock 5.2.4 versions of all three (adaptor, wotaskd, and JavaMonitor) exhibited the same problem. There is a load balancer in front, with Apache running on three machines and instances running on five machines.
Most of the time the instances are very responsive, with dispatchRequest processing times averaging 41 ms. Then, under increasing load (users per instance; we can reproduce this with one instance and 20 people), responses start slowing down, and the slowdown quickly spreads to all users.
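A back-of-envelope check on those numbers, using Little's law (L = λW, a standard queueing result and not something from the original report): if each request really spent about 41 ms in dispatchRequest, one worker thread could serve roughly 1,460 requests per minute, and a 16-thread pool roughly 23,400. The sketch assumes service time is the only cost and requests arrive evenly, so it is a ceiling rather than a prediction. Still, 20 users on one instance should sit far below it, which is consistent with the time being lost somewhere other than request processing.

```java
public class CapacityEstimate {
    /**
     * Little's law rearranged: lambda = L / W.
     * With at most `threads` requests in service at once, each taking
     * `avgServiceMillis`, this is the sustainable rate in requests per minute.
     * Assumes no time is spent waiting on I/O outside the service time.
     */
    static double maxRequestsPerMinute(int threads, double avgServiceMillis) {
        return threads * 60_000.0 / avgServiceMillis;
    }

    public static void main(String[] args) {
        // 41 ms average dispatchRequest time and 16 worker threads, from the report.
        System.out.printf("1 thread:   %.0f req/min%n", maxRequestsPerMinute(1, 41.0));
        System.out.printf("16 threads: %.0f req/min%n", maxRequestsPerMinute(16, 41.0));
    }
}
```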
The WOAdaptorInfo page shows multiple active requests in each instance, while dispatchRequest shows the instance as idle. If we get a thread dump from the instance, we see that all of the WOWorkerThreads are blocked in socketRead (the pool quickly grows from 16 to the configured maximum):
"WorkerThread60" prio=5 tid=0x00100cb0 nid=0x47 runnable [0xe08ff000..0xe08ffc30]
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(Unknown Source)
    at com.webobjects.appserver._private.WOHttpIO.refillInputBuffer(WOHttpIO.java:131)
    at com.webobjects.appserver._private.WOHttpIO.readLine(WOHttpIO.java:187)
    at com.webobjects.appserver._private.WOHttpIO.readRequestFromSocket(WOHttpIO.java:279)
    at com.webobjects.appserver._private.WOWorkerThread.runOnce(WOWorkerThread.java:79)
    at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:254)
    at java.lang.Thread.run(Unknown Source)
Then, after anywhere from several seconds to nearly a minute, they all unblock and complete normally. Sometimes it takes longer than a minute (Receive Timeout is set to a high 60 seconds), and the users get bounced to a new instance, where they get an "unable to restore session" message. Oddly, all the servers seem to get blocked and unblocked at pretty much the same time.
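The thread dump shows an ordinary blocking socket read: the worker has accepted a connection and is waiting for the request bytes to arrive. The minimal Java sketch below reproduces that state on a modern JDK. It is not the WebObjects code, and the class and method names are hypothetical. A client connects but sends nothing, so the reading side blocks until its read timeout (SO_TIMEOUT) fires. With a timeout of 0, which is the default, the read would wait indefinitely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class BlockedReadDemo {
    /**
     * Accepts one loopback connection whose peer never writes, then tries to
     * read one byte with the given SO_TIMEOUT. Returns true if the read timed out.
     */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             // Stands in for the adaptor side: connects but never sends a request.
             Socket silentClient = new Socket(loopback, server.getLocalPort());
             // Stands in for the worker thread's end of the connection.
             Socket worker = server.accept()) {
            worker.setSoTimeout(timeoutMillis); // 0 would mean "block forever"
            InputStream in = worker.getInputStream();
            try {
                in.read(); // blocks here, like the socketRead frames in the dump
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        boolean timedOut = readTimesOut(200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("timed out: " + timedOut + " after ~" + elapsedMs + " ms");
    }
}
```

The point of the sketch is that a thread sitting in socketRead is waiting on the peer, not working. A pool full of such threads usually means the bytes are late on the sending side or in the network path, rather than that the instance is busy.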
I think the problem exists in the other direction as well: we have seen cases where a page partially loaded and only finished after two minutes or more.
There is little CPU, memory, or I/O load on the machines. The network tests out OK. Apache vends static resources quickly even while the apps are stuck.
Any ideas? It looks like a problem in the TCP communication between the app servers and the instance servers. But what? Anyone want to play?
Not sure if I can point you to things you haven't already looked at.
Our WO app is different from most others; keeping that in mind, here are some thoughts:
- Check the sysctl values (maxfiles, maxproc, etc.).
- 20 users (sessions?) on one instance seems a bit much; our instances become unhappy when they reach around 4 sessions. Our app is database intensive, so it makes a difference for us. What happens if you add 3 more instances for each existing instance?
Less relevant:
- Keep the number of running httpd processes low. I never liked keeping a high number of idle httpd servers running.
- Any slow database queries?
Hope this helps.
kib
"We keep moving forward, opening new doors, and doing new things, because we're curious and curiosity keeps leading us down new paths." Walt Disney
Klaus Berkling Web Application Dev. & Systems Administrator DynEd International, Inc.
-- Chuck Hill Senior Consultant / VP Development
Come to WOWODC this July for unparalleled WO learning opportunities and real peer to peer problem solving! Network, socialize, and enjoy a great cosmopolitan city. See you there! http://www.wocommunity.org/wowodc11/