Re: Deaths = broken pipe?
- Subject: Re: Deaths = broken pipe?
- From: Chuck Hill <email@hidden>
- Date: Mon, 26 Jan 2009 13:21:58 -0800
On Jan 23, 2009, at 11:11 AM, Kieran Kelleher wrote:
On one slow G4 Xserve, I have had a situation about once a month
whereby I get an email (presumably from wotaskd) from the specific
server reporting one or two "Deaths" on an instance running on that
machine. I look at the time of the email and compare that to the
logs, I don't see the usual startup stuff in the log, so it makes me
wonder if the app really restarted at all.
Possibly not. Is it set up to dispatch requests concurrently? If
wotaskd failed to talk to the app before timing out and this happened
two (or three?) times in a row, it will mark it as "dead" even though
the process is still running.
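
To make the "failed several times in a row" idea concrete, here is a minimal sketch of a consecutive-failure counter. It is not wotaskd's actual code: the class name InstanceStatusTracker, its methods, and the threshold are all hypothetical, and the real number of missed checks (two or three) is exactly the part I am unsure of above.

```java
/**
 * Hypothetical sketch, NOT wotaskd's implementation: counts status checks
 * that timed out in a row and flags the instance as "dead" once an assumed
 * threshold is reached. The process itself may still be running.
 */
final class InstanceStatusTracker {
    private final int maxConsecutiveFailures; // assumed threshold, e.g. 2 or 3
    private int consecutiveFailures = 0;

    InstanceStatusTracker(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** The instance answered in time, so the failure streak starts over. */
    void recordSuccess() {
        consecutiveFailures = 0;
    }

    /** The check timed out; returns true once the instance counts as dead. */
    boolean recordFailure() {
        consecutiveFailures++;
        return isMarkedDead();
    }

    boolean isMarkedDead() {
        return consecutiveFailures >= maxConsecutiveFailures;
    }
}
```

The point of the sketch is that "dead" here is only a verdict about missed replies. A busy instance that answers the next check in time resets the streak and never shows up as a death.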
Secondly, the last entry before the time of a "Death" is usually
something like:
WARN 2009-01-23 13:25:49,844 [WorkerThread5] (NSLog: 43) -
<WOWorkerThread id=5 socket=Socket[addr=/
192.168.1.149,port=58070,localport=2001]> Exception while sending
response: java.net.SocketException: Broken pipe
That probably means that it took so long to return the result that
Apache has given up on the request or the user hit Stop or navigated
elsewhere. In other situations, this usually means the user navigated
elsewhere before getting a response. So, usually, this is something
you can safely ignore. Usually. But it might also indicate your app
is processing some requests too slowly.
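
If you want to separate these "client went away" exceptions from real failures in your own logging, something along these lines might do. This is only a sketch: the class ClientDisconnects and its method are hypothetical names, and matching on the message text is an assumption, because the wording of a SocketException message depends on the platform and JVM.

```java
import java.net.SocketException;

/**
 * Hypothetical helper: guesses whether an exception means the client
 * (or Apache) closed the connection before the response was written.
 */
final class ClientDisconnects {
    private ClientDisconnects() {}

    static boolean isLikelyClientDisconnect(Throwable t) {
        // Walk the cause chain, bounded so a cyclic chain cannot loop forever.
        int depth = 0;
        for (Throwable cause = t; cause != null && depth < 16; cause = cause.getCause(), depth++) {
            if (cause instanceof SocketException) {
                String message = cause.getMessage();
                // Assumed message fragments; they vary by platform and JVM.
                if (message != null
                        && (message.contains("Broken pipe") || message.contains("Connection reset"))) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Exceptions that match could be logged at a lower level and counted. An occasional one is noise, but a steady stream of them against the same page suggests that page is too slow.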
If I go look at the WOStats page for the instance it tells me that
this instance is up for nearly 7 days, so that would seem to
indicate that the app really did not crash and restart ..... also no
entry in /Library/Logs/CrashReporter for that time of the "Death".
Death means that wotaskd could not communicate with the instance, but
not necessarily that the process stopped.
So my question is if this is a case where the response taking too
long is being considered a "Death", but the app is not really dying
and being restarted at all?
Yes, exactly.
I am guessing that a user is accessing a specific page that is
loading a lot of data and they get a no instance available, but this
is not killing the app (BTW, concurrent request handling is on) ...
is this the case?
Yes, that is my interpretation of what you are describing.
Secondly, I try to determine which page may have broken the pipe by
looking at WOStats, but I find most actions have a max of ~2 seconds
and perhaps 3 pages have a max of ~11 seconds although their average
is less than 1 second, so there is no action near ~30 seconds ...
how would I determine the page that caused the broken pipe? .... or
should I just focus on ensuring that those pages that did have a
high max of ~11 secs get reworked so their max never exceeds ~2 secs?
I am not sure that those stats are 100% trustworthy. I log my own
dispatch times into the app log. I also log what is happening in each
thread which makes tracking down these sorts of problems very easy.
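
A rough idea of what logging your own dispatch times can look like, as a generic sketch rather than my actual code. DispatchTimer, its parameters, and the log line format are all made up for illustration. In a WebObjects app you would presumably wrap the call to super.dispatchRequest(request) in your WOApplication subclass with something like this. The sketch uses java.util.function types, so it assumes a newer JVM than most WebObjects deployments run.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: times one unit of work and reports the elapsed time,
 * the thread name, and a label (e.g. the request URI) to a log sink.
 */
final class DispatchTimer {
    private DispatchTimer() {}

    static <T> T timed(String label, long slowThresholdMillis,
                       Consumer<String> log, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            // Runs even when the work throws, so slow failures are logged too.
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            String level = elapsedMillis >= slowThresholdMillis ? "WARN" : "INFO";
            log.accept(level + " [" + Thread.currentThread().getName() + "] "
                    + label + " took " + elapsedMillis + " ms");
        }
    }
}
```

For example, DispatchTimer.timed(uri, 5000L, System.out::println, () -> handle(request)), where uri, handle, and request are placeholders. Because the thread name is in every line, a slow dispatch can be matched against a "Broken pipe" warning from the same worker thread.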
Chuck
--
Chuck Hill Senior Consultant / VP Development
Practical WebObjects - for developers who want to increase their
overall knowledge of WebObjects or who are trying to solve specific
problems.
http://www.global-village.net/products/practical_webobjects
Webobjects-dev mailing list (email@hidden)