Re: WoMonitor "Failed to contact ..."
- Subject: Re: WoMonitor "Failed to contact ..."
- From: Helmut Tschemernjak <email@hidden>
- Date: Sat, 03 Aug 2013 18:34:49 +0200
- Organization: HELIOS Software GmbH
Hi Philippe,
I have no idea about your setup. Your listing shows IPv6, which means it
is likely Java-to-Java communication, because HTTP is usually IPv4.
CLOSE_WAIT means one end has closed a socket connection and the other end
has not yet closed its side, typically because it is still active but busy
with other work, so the closing thread may hang.
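To make the CLOSE_WAIT state concrete, here is a minimal sketch, assuming a plain loopback connection rather than anything WebObjects-specific. The class and method names are made up for illustration. The client closes first; the accepted socket then sits in CLOSE_WAIT until the server side calls close() itself.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of WebObjects.
class CloseWaitDemo {
    /**
     * Accepts one loopback connection, lets the client close first, and
     * returns what the server side reads afterwards (-1 means end of stream).
     */
    static int readAfterPeerClose() throws IOException {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort());
            Socket accepted = listener.accept();
            client.close(); // peer sends FIN; once it arrives, 'accepted' is in CLOSE_WAIT
            try {
                // End of stream is how the application learns that the peer closed.
                return accepted.getInputStream().read();
            } finally {
                // Until this close() runs, lsof would list the socket as CLOSE_WAIT.
                accepted.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read() after peer close returned " + readAfterPeerClose());
    }
}
```

If the thread that owns the accepted socket is blocked elsewhere and never reaches close(), the socket stays in CLOSE_WAIT, which matches the pile-up in the listing.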
You need to find out which process is stuck. I believe :dc means port
2001; you can also instruct lsof to print it in decimal (the -P option).
Do more tracing to find out which pair of processes is communicating.
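One way to start that tracing is to group the CLOSE_WAIT lines by process and local port. The following is only a sketch: it assumes the ten-column layout of `lsof -i tcp` output shown in this thread (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME (STATE)), and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for summarising pasted lsof output.
class CloseWaitCounter {
    /** Counts CLOSE_WAIT sockets per "pid:localPort" key. */
    static Map<String, Integer> countCloseWait(List<String> lsofLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lsofLines) {
            String[] f = line.trim().split("\\s+");
            // Skip headers, LISTEN/ESTABLISHED lines, and anything malformed.
            if (f.length < 10 || !f[9].equals("(CLOSE_WAIT)")) continue;
            String name = f[8]; // e.g. host:dc->host:58973
            int arrow = name.indexOf("->");
            if (arrow < 0) continue;
            String local = name.substring(0, arrow);
            // The port (or service name) follows the last colon, also for [::addr]:port.
            String port = local.substring(local.lastIndexOf(':') + 1);
            counts.merge(f[1] + ":" + port, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "java 34524 _appserver 137u IPv6 0x171e9344 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:58973 (CLOSE_WAIT)",
            "java 34524 _appserver 138u IPv6 0x2148f5a8 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59191 (CLOSE_WAIT)");
        System.out.println(countCloseWait(sample));
    }
}
```

The remote port on each line (58973, 59191, ...) is what you would then look up with lsof on the peer to find the other process of the pair.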
BTW: There is a setSoLinger() socket option that defines what to do when
the other end does not respond to a close. WOApplication sets this with
lifebeatSocket.setSoLinger(false, 0). I have seen in WO 5.4.3 that in some
cases these options are not set up perfectly, e.g.
_listenSocket.setReuseAddress(true) is missing in the WOClassicAdaptor.
http://www.jguru.com/faq/view.jsp?EID=33897
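For readers unfamiliar with these two options, here is a small sketch using only the standard java.net API. It is not the WebObjects source; the class and method names are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration of the two socket options mentioned above.
class SocketOptionsSketch {
    /** Creates a listening socket with SO_REUSEADDR enabled before binding. */
    static ServerSocket openListener(int port) throws IOException {
        ServerSocket listener = new ServerSocket(); // unbound, so the option can be set first
        listener.setReuseAddress(true);
        listener.bind(new InetSocketAddress(port));
        return listener;
    }

    /** Disables SO_LINGER: close() returns at once and the OS completes the shutdown. */
    static void disableLinger(Socket socket) throws IOException {
        socket.setSoLinger(false, 0);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = openListener(0)) { // port 0 = any free port
            System.out.println("SO_REUSEADDR: " + listener.getReuseAddress());
        }
    }
}
```

Setting SO_REUSEADDR before bind() matters because the option's effect on an already bound socket is not defined. With it, a restarted instance can rebind its port while old connections are still in TIME_WAIT.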
PS: I am checking all of the networking in WO to make it more scalable
and more robust.
regards
Helmut
On 02.08.13 16:49, Philippe Rabier wrote:
Hi All,
I am resurrecting this discussion again ;-)
Today we had the same "Failed to contact..." symptom, and it was persistent. We have had this problem in the past, but rarely.
After googling "Failed to contact..." I found Kieran's email. We got the same result when executing the following command:
ibabar:~ admin$ sudo lsof -i tcp | grep CLOSE_WAIT
java 34524 _appserver 137u IPv6 0x171e9344 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:58973 (CLOSE_WAIT)
java 34524 _appserver 138u IPv6 0x2148f5a8 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59191 (CLOSE_WAIT)
java 34524 _appserver 140u IPv6 0x2141d344 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59070 (CLOSE_WAIT)
java 34524 _appserver 144u IPv6 0x2e28c984 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59114 (CLOSE_WAIT)
java 34524 _appserver 145u IPv6 0x2db8bb2c 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59074 (CLOSE_WAIT)
java 34524 _appserver 146u IPv6 0x13509a70 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:58845 (CLOSE_WAIT)
java 34524 _appserver 152u IPv6 0x214440e0 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:58853 (CLOSE_WAIT)
java 34524 _appserver 158u IPv6 0x2db23400 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59155 (CLOSE_WAIT)
java 34524 _appserver 176u IPv6 0x2e23b19c 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59034 (CLOSE_WAIT)
java 34524 _appserver 178u IPv6 0x2102f8c8 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59163 (CLOSE_WAIT)
java 34524 _appserver 179u IPv6 0x21523d90 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59110 (CLOSE_WAIT)
java 34524 _appserver 184u IPv6 0x20c995a8 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59199 (CLOSE_WAIT)
java 34524 _appserver 187u IPv6 0x2e1f98c8 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59042 (CLOSE_WAIT)
java 34524 _appserver 190u IPv6 0x2df27664 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59046 (CLOSE_WAIT)
java 34524 _appserver 191u IPv6 0x2dd3b4bc 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59086 (CLOSE_WAIT)
java 34524 _appserver 193u IPv6 0x2e01cf38 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59050 (CLOSE_WAIT)
After taking a thread dump, we saw that the threads were blocked as follows:
java.lang.Thread.State: BLOCKED (on object monitor)
at er.extensions.eof.ERXEnterpriseObjectCache.cache(ERXEnterpriseObjectCache.java:380)
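When a dump shows many BLOCKED threads, the useful next fact is which thread holds the monitor they are waiting for. A minimal sketch using the standard java.lang.management API follows. It has to run inside the affected JVM, and the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Wonder or WebObjects.
class BlockedThreadReport {
    /** Returns one line per thread that is BLOCKED on an object monitor. */
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                // getLockOwnerName() names the thread currently holding the monitor.
                report.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        for (String line : blockedThreads()) {
            System.out.println(line);
        }
    }
}
```

If every blocked worker names the same owner, that owner's stack trace is the place to look.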
My question is about the cause of the CLOSE_WAITs and JavaMonitor: why is the monitor unable to contact wotaskd when one instance is locked? I presume it is because wotaskd is unable to contact the instance above?
I am resurrecting this mail because it is a good tip for anyone who gets the message "Failed to contact..." in the monitor.
Cheers,
Philippe
On 30 avr. 2009, at 23:30, Kieran Kelleher wrote:
Resurrecting this old discussion again :-(
OK, a while ago, one xserve "omega" (running Leopard Server 10.5.6, WO 5.4.X wotaskd with fully embedded WO 5.3.3 apps) showed up in WOMonitor as Failed to Contact again. Remember WOMonitor is running on Tiger Server 10.4.8 with the wotaskd from WO 5.3.3.
Rather than assume this is a wotaskd/networking problem this time, I decided to check the WO apps on that server "192.168.3.154" using lsof and jstack to see if I could find anything unusual, and I did:
OK, 192.168.3.154 has 2 apps running on it: pid-479 (port 2001) and pid-947 (port 2004). Also, wotaskd is running as pid 43.
For app pid-479, lsof -i tcp:2001 shows nothing unusual:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 479 _appserver 7u IPv6 0x830bb2c 0t0 TCP [::192.168.3.154]:dc (LISTEN)
app pid-947 has unusual output, lsof -i tcp:2004 reveals 256 CLOSE_WAITs!!! .... this app is not allowing logins
http://67.78.26.66:81/~kieran/misc/lsof_tcp_2004_pid_43.txt
BTW, the other IP 192.168.3.149 shown on the CLOSE_WAIT lines is the machine that is running WOMonitor/apache, so this would seem to indicate a lot of hung requests? (that's a question, Chuck ;-) )
lsof for wotaskd itself gives this, which doesn't seem unusual
bash-3.2# lsof -i tcp:1085
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 43 _appserver 8u IPv6 0x6e1d258 0t0 TCP [::192.168.3.154]:webobjects (LISTEN)
java 43 _appserver 11u IPv6 0x830b664 0t0 TCP [::192.168.3.154]:webobjects->[::192.168.3.154]:49665 (ESTABLISHED)
java 43 _appserver 12u IPv6 0x8e41cd4 0t0 TCP [::192.168.3.154]:webobjects->[::192.168.3.154]:53449 (ESTABLISHED)
java 479 _appserver 10u IPv6 0x830b8c8 0t0 TCP [::192.168.3.154]:49665->[::192.168.3.154]:webobjects (ESTABLISHED)
java 947 _appserver 10u IPv6 0x8e7d344 0t0 TCP [::192.168.3.154]:53449->[::192.168.3.154]:webobjects (ESTABLISHED)
Now looking at the jstack outputs, we also have more useful clues.
jstack on the pid-947 (port 2004) app reveals it has session store deadlocks!! This is the same app with all the CLOSE_WAITs
http://67.78.26.66:81/~kieran/misc/jstack_pid_947.txt
So, it would seem that the stupid "Failed to contact" messages I have been seeing are really caused by Session Store deadlocks. So, the first thing I am going to do now is turn OFF concurrent request handling and turn on Wonder Session Store Deadlock detection for this app ...... however, I would wager that I will not see any Session Store deadlocks with concurrent request handling turned off!
Any ideas on a strategy for deadlock detection with concurrent request handling ON?
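One generic option, independent of WebObjects, is to ask the JVM itself for lock cycles. The sketch below uses the standard ThreadMXBean API; the class name is hypothetical. Note the assumption: it only reports cycles of Java monitors and ownable synchronizers. Nothing in this thread confirms that a session store deadlock takes that form, so a thread merely waiting for a session that is never checked in may go unreported.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe; not the Wonder deadlock detection mentioned above.
class DeadlockProbe {
    /** Returns the names of threads in a JVM-detected lock cycle; empty if none. */
    static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no cycle is found
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // a thread may have exited in the meantime
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = deadlockedThreadNames();
        System.out.println(names.isEmpty() ? "No lock cycle detected" : "Deadlocked: " + names);
    }
}
```

Calling deadlockedThreadNames() from a periodic background task inside the application would allow logging while concurrent request handling stays on.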
On Mar 25, 2009, at 10:34 PM, Chuck Hill wrote:
On Mar 25, 2009, at 7:21 PM, Kieran Kelleher wrote:
Hi again Chuck,
If you are going to use the domain name (for example www.website.com, which resolves to 67.88.91.233), doesn't that mean you have to open port 1085 on the router between the public internet and that apache/WoMonitor machine?
Apache is behind the firewall. Only ports 80 and 443 go through.
Chuck
-Kieran
On Mar 23, 2009, at 12:25 PM, Chuck Hill wrote:
On Mar 21, 2009, at 6:35 PM, Kieran Kelleher wrote:
Hi Chuck,
Still getting this problem after a few days of running .... when we last discussed this, I had updated all the WO servers that run Leopard to use the IP address for the host name...... I still have not touched the only Tiger machine, which runs apache and the site's WOMonitor and has a couple of tiny insignificant WO apps. I am not ready to upgrade this machine to Leopard just yet, so I guess that is the next one to be updated with IP addresses instead of its Bonjour name ..... but I have a question for you based on your experience with this:
- For that primary WOMonitor machine, which is the main site webserver, should I change to localhost, 127.0.0.1 or the actual IP address of the machine in WOMonitor Host settings and wotaskd properties? (FWIW, for the last couple of years, we have used the Bonjour host.local name style on that machine)
We usually use neither. We use the name that DNS resolves to the primary IP on that machine (a working reverse lookup is important too).
--
Chuck Hill Senior Consultant / VP Development
Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems.
http://www.global-village.net/products/practical_webobjects
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden