Re: WoMonitor "Failed to contact ..."
Re: WoMonitor "Failed to contact ..."
- Subject: Re: WoMonitor "Failed to contact ..."
- From: Kieran Kelleher <email@hidden>
- Date: Fri, 1 May 2009 15:11:08 -0400
Here is what might help Miguel to figure it out.
1) Override ERXSession#threadName() to return a string to replace the
%t in the logging pattern. I did to return sessionID, username and
real threadname. You can additionally add context().componentName and
context().pageName() too if you don't mind the clutter in the logs.
2) Turn off concurrent req handling and turn on ERX Session Store
Deadlock detection.
3) Ensure you are using log4j smtp appender to get notified of
log.errors with host, port in the subject for easy id of the exact
instance:
log4j.appender.myMail.Subject=Log4J Error (@@build.app.bundle.name@@
host @@host@@ port @@WOPort@@)
4) Ensure you log each instance to a separate file that can be easily
identified:
log4j.appender.A2.file=/var/log/webobjects_apps/
@@build.app.bundle.name@@-@@WOPort@@.log
5) Wait for the deadlock and the automatic log.error email from your
app.
6) Find the same log.error in the log file and the inevitable WARN
logs before and afterward to identify the user.
The above was enough for me to get to talk to the user that it happned
to this morning, one of our customer service reps and to find out
exactly what happened. She actually IM'd me before the log.error came
to me to ask why she was getting the error! Most users assume that
support get emails if they get an error .... that is what the error
page says! It says we know about it and will fix it shortly ....... so
they never report it. Unfortunately due to the thread being respawned
the log.error email was never happening! So sessions were getting
deadlocked to the point that all threads were locked and then the
'Failed to Contact" would start on that instance. Interesting ... and
seems obvious now that we look back at it.
Now before I got the call this morning, I had added two extra
information capture bits that I was about to deploy and didn't. These
might help Miguel.
a) Point er.extensions.ERXApplication.StatisticsBaseLogPath to a
logging directory and customize output if you like - note it will not
log until a session terminates, so you have to grab the page history
for the user inside ERXAppliclication by overriding
WOSession._formattedStatistics() (since it is private) and deliver the
info using a public method to ERApplication.SessionInfo.exception...
(...)
b) try/catch ERXApplication.extraInformation...( context ) on both old
and current contexts in ERXApplication.sessionInfo() and logging that
out with ERX....dumpObject( ...)
c) Put some info in ERXThreadStorage for every request that you can
grab in ERXApplication.SessionInfo to log it out too when a deadlock
happoens.
BTW, this is everything from the logs at the deployment site related
to the single occurrence of ERX deadlock detection this morning:
WARN 2009-05-01 10:06:04,611 [fsv2Xy1tU6IuQs948ftDmw-sherie-
WorkerThread6] (er.extensions.eof.ERXEC: 246) - Unlocking context that
wasn't unlocked in RR-Loop!:
wk.webobjects.appserver.WKEditingContext@865c1d
WARN 2009-05-01 10:06:04,612 [fsv2Xy1tU6IuQs948ftDmw-sherie-
WorkerThread6] (er.extensions.eof.ERXEC: 246) - Unlocking context that
wasn't unlocked in RR-Loop!:
wk.webobjects.appserver.WKEditingContext@96051d
WARN 2009-05-01 10:06:04,612 [fsv2Xy1tU6IuQs948ftDmw-sherie-
WorkerThread6] (er.extensions.eof.ERXEC: 246) - Unlocking context that
wasn't unlocked in RR-Loop!:
wk.webobjects.appserver.WKEditingContext@65e4fc
WARN 2009-05-01 10:06:04,613 [fsv2Xy1tU6IuQs948ftDmw-sherie-
WorkerThread6] (er.extensions.eof.ERXEC: 246) - Unlocking context that
wasn't unlocked in RR-Loop!:
wk.webobjects.appserver.WKEditingContext@e793e4
WARN 2009-05-01 10:06:04,613 [fsv2Xy1tU6IuQs948ftDmw-sherie-
WorkerThread6] (er.extensions.eof.ERXEC: 246) - Unlocking context that
wasn't unlocked in RR-Loop!:
wk.webobjects.appserver.WKEditingContext@a97693
WARN 2009-05-01 10:06:04,614 [fsv2Xy1tU6IuQs948ftDmw-sherie-
WorkerThread6] (NSLog: 43) - <WOWorkerThread id=6 socket=null>
Throwable occurred: java.lang.NoSuchMethodError:
org.apache.poi.hssf.usermodel.HSSFRow.createCell(I)Lorg/apache/poi/
hssf/usermodel/HSSFCell;
ERROR 2009-05-01 10:06:04,619 [WorkerThread7]
(er.extensions.appserver.ERXApplication: 2003) - There is an error in
the session check-out: old context: contextId: 17 request:
<er.extensions.appserver.ERXRequest
(<er.extensions.appserver.ERXRequest httpVersion=HTTP/1.1
headers={remote_host = (67.78.26.66); user-agent = (Mozilla/5.0
(Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/525.27.1
(KHTML, like Gecko) Version/3.2.1 Safari/525.27.1); document_root = (/
Library/WebServer/Documents); server_name = (www.smartleadsusa.net);
accept = (text/xml,application/xml,application/xhtml+xml,text/
html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5); remote_addr =
(67.78.26.66); server_admin = (email@hidden);
original_context_id = ("16"); remote_port = (33825); connection =
(close); server_software = (Apache/1.3.33); cookie =
(__utma
=240748929.2937839802557752300.1240937027.1240937027.1240937027.1;
__utmz=240748929.1240937027.1.1.utmcsr=(direct)|utmccn=(direct)|
utmcmd=(none); AWSUSER_ID=awsuser_id1236172384270r9091); host = (www.smartleadsusa.net
); referer = (http://www.smartleadsusa.net/app/27/wo/fsv2Xy1tU6IuQs948ftDmw/15.0.21.13.1
); x-webobjects-adaptor-version = ("Apache"); x-webobjects-request-id
= (49f893080000013e00000800); script_uri = (http://www.smartleadsusa.net/app/27/wo/fsv2Xy1tU6IuQs948ftDmw/16.0.21.13.1.1.0.5.1.1.3.0.3
); script_filename = (passthrough:/cgi-bin/WebObjects/cheetah.woa/27/
wo/fsv2Xy1tU6IuQs948ftDmw/16.0.21.13.1.1.0.5.1.1.3.0.3); x-webobjects-
request-method = (GET); accept-encoding = (gzip, deflate); server_port
= (80); script_url = (/app/27/wo/fsv2Xy1tU6IuQs948ftDmw/
16.0.21.13.1.1.0.5.1.1.3.0.3); accept-language = (en-us); } content-
length=0 cookies=null userInfo=null>) method=GET uri=/cgi-bin/
WebObjects/cheetah.woa/27/wo/fsv2Xy1tU6IuQs948ftDmw/
16.0.21.13.1.1.0.5.1.1.3.0.3 defaultFormValueEncoding=UTF-8
formValueEncodingDetectionEnabled=NO formValueEncoding=UTF-8
formValues={} >
java.lang.Exception
at er.extensions.appserver.ERXApplication
$SessionInfo.<init>(ERXApplication.java:1979)
at
er
.extensions
.appserver.ERXApplication.restoreSessionWithID(ERXApplication.java:2056)
at
com
.webobjects
.appserver
._private
.WOComponentRequestHandler
._dispatchWithPreparedApplication(WOComponentRequestHandler.java:314)
at
com
.webobjects
.appserver
._private
.WOComponentRequestHandler
._handleRequest(WOComponentRequestHandler.java:358)
at
com
.webobjects
.appserver
._private
.WOComponentRequestHandler
.handleRequest(WOComponentRequestHandler.java:432)
at
com
.webobjects.appserver.WOApplication.dispatchRequest(WOApplication.java:
1306)
at
er
.extensions
.appserver
.ERXApplication.dispatchRequestImmediately(ERXApplication.java:1769)
at
er
.extensions
.appserver.ERXApplication.dispatchRequest(ERXApplication.java:1734)
at
wk
.webobjects.appserver.WKApplication.dispatchRequest(WKApplication.java:
95)
at
com
.webobjects
.appserver._private.WOWorkerThread.runOnce(WOWorkerThread.java:173)
at
com
.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:
254)
at java.lang.Thread.run(Thread.java:613)
WARN 2009-05-01 10:06:04,620 [fsv2Xy1tU6IuQs948ftDmw-sherie-
WorkerThread6] (NSLog: 43) - <WOWorkerThread id=6 socket=null>
Workerthread exiting due to error, respawning with ID 10006...
WARN 2009-05-01 10:06:04,745 [WorkerThread7] (NSLog: 43) -
<com.webobjects.appserver._private.WOComponentRequestHandler>:
Exception occurred while handling request:
java.lang.IllegalStateException: Context with id 'null' did check out
again
On May 1, 2009, at 1:25 PM, Chuck Hill wrote:
I don't see why that was causing the deadlock. Can you post the
whole stack trace? Let's see if we can fix this.
Miguel: this might be your problem too.
Thanks
Chuck
On May 1, 2009, at 10:20 AM, Kieran Kelleher wrote:
FYI, Found the sessionstore deadlock culprit!
The root cause was an error in appendToResponse in EGWrapper which
was caused by my having TWO different version poi jars since Timo
replaced the jar in January. And as we know, when you install
Wonder, it writes over existing frameworks, so jars with different
names are now merged, included in the embedded build and MacOS
ClassPath and cause havoc. In any case, my fix for duplicate POI
jars was to add ExcelGenerator to the list of frameworks to
"clean" (aka "rm -r" in my WOnder install script).
Stack trace for the error that caused sessionstore deadlock.
java.lang.NoSuchMethodError:
org.apache.poi.hssf.usermodel.HSSFRow.createCell(I)Lorg/apache/poi/
hssf/usermodel/HSSFCell;
at
er.excel.EGSimpleTableParser.parseTable(EGSimpleTableParser.java:358)
at er.excel.EGSimpleTableParser.parseNode(EGSimpleTableParser.java:
261)
at er.excel.EGSimpleTableParser.parse(EGSimpleTableParser.java:246)
at er.excel.EGSimpleTableParser.workbook(EGSimpleTableParser.java:
122)
at er.excel.EGSimpleTableParser.data(EGSimpleTableParser.java:110)
at er.excel.EGWrapper.appendToResponse(EGWrapper.java:95)
at
com
.webobjects
.appserver
._private
.WOComponentReference.appendToResponse(WOComponentReference.java:111)
at
com
.webobjects
.appserver
._private
.WODynamicGroup.appendChildrenToResponse(WODynamicGroup.java:121)
at
com
.webobjects
.appserver
._private.WODynamicGroup.appendToResponse(WODynamicGroup.java:130)
at
com
.webobjects.appserver.WOComponent.appendToResponse(WOComponent.java:
992)
.....
This link was not in my Selenium test ... it is now :-)
PS. On a different topic, the new POI version produces Excel sheet
full of garbage for me on this download, so until I get a chance to
figure that out I have replaced the Excel link with my commonly
used reobust CSV download feature instead.
On May 1, 2009, at 12:52 AM, Anjo Krank wrote:
Am 01.05.2009 um 04:12 schrieb Kieran Kelleher:
More good news ...... a side-effect of using ERX session store
deadlock detection and/or concurrent req handling=false is that
there are no CLOSE_WAITS and there are *no* threads blocked on
com
.webobjects.appserver.WOSessionStore.checkOutSessionWithID(...),
not a single one in both of the instances that report these errors.
Yeah, it's a bit mis-named, since you get an exception right
before the double-checkout happens. So it's not checked-out and
thus no deadlock. In that regard, it's more "prevention".
Wonder normally gives me a detailed context information such as
previous page history and the exact component (actuallt nested
component tree) that was invoked to cause the error, but it seems
SessionInfo does not call that utility method
(ERXApplication#extraInformationForExceptionInContext(exception,
context)) .... I am going to try adding that to get the full
picture and redeploying.
Have fun... however, as you didn't actually set the session and it
hasn't restored the page, this util probably won't help...
It's been a while since I needed this, but:
- I normally implement ERXComponentActionRedirector.Restorable on
my pages so I get an easy URL for logging
- I did log both the context url and the restorable url
- if you know the page and the restore state, you can call it up
and look at the source and most likely, you will find the context id
- I still did a lot of debugging
Cheers, Anjo
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden
--
Chuck Hill Senior Consultant / VP Development
Come to WOWODC'09 in San Fran this June!
http://www.wocommunity.org/wowodc09/
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden