Re: Nagios plugin for JavaMonitor
- Subject: Re: Nagios plugin for JavaMonitor
- From: Johann Werner <email@hidden>
- Date: Wed, 26 May 2010 07:00:58 +0200
On 26.05.2010 at 01:15, Pascal Robert wrote:
>
> On 2010-05-25 at 13:28, Chuck Hill wrote:
>
>> Hi Pascal,
>>
>> On May 24, 2010, at 5:14 PM, Pascal Robert wrote:
>>
>>> Hi everyone,
>>>
>>> I'm working on a Nagios plugin that will use the /admin/info direct action that was added in the Wonder variant of JavaMonitor. Right now, the plugin is doing the following:
>>>
>>> - for the "state" key: if the state is DEAD, CRASHING or STOPPING, it sends a CRITICAL signal; if it's UNKNOWN or STARTING, it sends a WARNING signal; if the state is ALIVE, it reports OK.
>>
>> I think that STOPPING would indicate a manual shutdown or a scheduled restart. Do you really want to send a notification in that case?
>
> I think a notification should be sent. Unless you're doing something big at shutdown, you might never get a notification anyway. For example, our Nagios setup sends notifications only if the service has had a problem for 5 minutes (regular interval is 3 minutes, interval when a problem is detected is 1 minute, so 3 + 1 + 1). If your app is in a STOPPING state for more than 4 minutes, it might be a problem. But a WARNING signal would be better than a CRITICAL one.
>
>> UNKNOWN probably represents a timeout talking to wotaskd, which would indicate one of:
>> - deadlocked / backlogged instance
>> - wotaskd stopped on some machine
>> - network problems reaching some machine
>>
>> Those might warrant a CRITICAL signal.
>
> Good catch.
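The state handling as revised in this exchange (STOPPING downgraded to WARNING, UNKNOWN raised to CRITICAL) could be sketched roughly as follows. This is only an illustration in Python, not the actual Perl plugin; the names `STATE_TO_STATUS` and `status_for_state` are made up here, and returning the Nagios UNKNOWN exit code for an unrecognized state string is an assumption.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3

# Instance state -> Nagios status, following the suggestions in this thread.
# (Hypothetical table; the real plugin may map these differently.)
STATE_TO_STATUS = {
    "ALIVE": OK,
    "STARTING": WARNING,
    "STOPPING": WARNING,   # manual shutdown or scheduled restart
    "UNKNOWN": CRITICAL,   # likely a timeout talking to wotaskd
    "DEAD": CRITICAL,
    "CRASHING": CRITICAL,
}


def status_for_state(state):
    """Return the Nagios exit code for a JavaMonitor instance state string."""
    # Assumption: a state string we do not recognize maps to Nagios UNKNOWN (3).
    return STATE_TO_STATUS.get(state.strip().upper(), NAGIOS_UNKNOWN)
```

Keeping the mapping in one table makes it easy to change a single state (for example STOPPING) without touching the rest of the check.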
>
>>
>>> - for the "deaths", "transactions", "activeSessions", "averageIdlePeriod" and "avgTransactionTime" keys, it checks the actual values against the warning and critical parameters; if a value is higher than a parameter, it sends the corresponding signal.
>>
>> The last time I looked, these are not cleared when an application is unscheduled and stopped. If these values were outside of the limits when this happened, that could result in a lot of notifications.
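One way to avoid notifications from stale counters is to skip the threshold checks for instances that are not scheduled. A minimal sketch in Python, assuming the statistics arrive as a plain dictionary and that the caller already knows whether the instance is scheduled; the function names and the `(warning, critical)` tuple layout are hypothetical:

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def check_threshold(value, warning, critical):
    """Compare one value against its limits; higher values are worse."""
    if value > critical:
        return CRITICAL
    if value > warning:
        return WARNING
    return OK


def check_instance(metrics, thresholds, scheduled=True):
    """Return the worst status over all configured keys.

    metrics:    e.g. {"deaths": 3, "activeSessions": 12}
    thresholds: e.g. {"deaths": (1, 5)} as (warning, critical) pairs
    """
    # Assumption: counters of an unscheduled instance may be stale, so ignore them.
    if not scheduled:
        return OK
    results = [
        check_threshold(metrics[key], warning, critical)
        for key, (warning, critical) in thresholds.items()
        if key in metrics
    ]
    return max(results, default=OK)
```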
>>
>>
>>> - for refusingNewSessions, scheduled and autoRecover, it will send a WARNING signal if the response is "false"
>>
>> I often (usually) have configured, but not scheduled instances for use when upgrading to a new version, handling server failover, higher loads etc. Is there a way to define only the instances that are expected to be scheduled?
>
> Frankly, I don't know how to handle this. Right now, the plugin works on a specific instance. I guess we can pass an array of instance IDs, and if the specified key to check is reaching a warning or critical level for all instances, it will send a notification.
You could specify the number of instances that must be running. Using the direct action .../admin/running?type=app&name=<appname>&num=<count>, your script checks whether this evaluates to true. If not, you fetch a list of all instances of that app and check them one by one to detect the faulty ones.
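A rough sketch of that two-step check, in Python rather than Perl. Everything about the response format is an assumption here: I treat a body of "true" or "yes" as success, and I assume the instance list has already been parsed into dictionaries with a "state" key. The `admin_base` argument stands for whatever comes before /running in your setup.

```python
from urllib.parse import urlencode


def running_check_url(admin_base, app_name, min_count):
    """Build the 'running' direct action URL for one application."""
    query = urlencode({"type": "app", "name": app_name, "num": min_count})
    return f"{admin_base.rstrip('/')}/running?{query}"


def is_truthy(body):
    """Assumption: the action answers with something like 'true' or 'yes'."""
    return body.strip().lower() in ("true", "yes")


def faulty_instances(instances):
    """Fallback step: return the instances that are not ALIVE.

    instances: list of dicts such as {"id": 1, "state": "DEAD"} (assumed shape).
    """
    return [inst for inst in instances if inst.get("state") != "ALIVE"]
```

The plugin would fetch the URL from `running_check_url`, and only when `is_truthy` fails go on to fetch the per-instance details and pass them to `faulty_instances`.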
jw
>
>> Warning on refusingNewSessions is going to send notifications for scheduled restarts, probably not what you want.
>
> Will add a condition for this: if scheduled is true, refusingNewSessions will be OK if set to false.
>
>>
>>> Any opinions on this? The only thing that I still need to work on is the help output, so if you want to try the plugin (it's a Perl script), send me a note. You need any version of Nagios and the Wonder variant of JavaMonitor.
>>
>>
>> Sounds like it could be useful.
>>
>> One other thing: sending passwords in the URL is insecure, and passwords that contain non-URL-friendly characters are a problem (they don't seem to get decoded in JavaMonitor; I'm not sure about that, I just changed the password).
>
> Will try that.
>
_______________________________________________
Webobjects-dev mailing list (email@hidden)