Re: hwmond
- Subject: Re: hwmond
- From: "David Osguthorpe" <email@hidden>
- Date: Wed, 26 Jan 2005 23:36:16 -0700
On Wed, Jan 26, 2005 at 01:58:52PM -0800, Chris Sarcone wrote:
>
> hwmond is not part of Darwin and is not a documented feature of the
> Server version of OS X to rely upon (i.e. it is subject to change at
> Apple's whim). Sounds like maybe you'd want to start from the other end
> of the equation (what the ganglia cluster monitor requires as input)
> and write a tool(s) that provides this info back to the monitor and
> folks on the list can help you out getting that information.
>
Dear Chris,
Thanks for the information - I'm trying this list as it wasn't clear
to me where to ask questions about daemons such as hwmond - and I'd seen
previous questions about hwmond on this list - also, as you will see
from this post, there are definite kernel issues
I have already written my own version of hwmond for delivering data to
ganglia - ganglia does not require anything as input; one just sends it
values to be recorded - at the moment this is most of the stuff hwmond logs
for Server Monitor - fan speeds/voltages/currents/temperatures
temperature was in fact the main driver for this as I wanted something
that would shut down the machine if it rose above a certain temperature
- again something associated with hwmond, but with undocumented and, more importantly,
unchangeable limits - this is something my hwmond does with user
specifiable limits - in addition to providing a socket to deliver the
data to any other process that wishes to request it
this was derived from Apple's open source for the ioreg command
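
A minimal sketch of the user-specifiable limit check described above. This is not the code from my hwmond or from Apple's; the names sensor_reading and first_over_limit are hypothetical, readings are assumed to be already converted to degrees C, and the shutdown action itself is left to the caller.

```c
#include <stddef.h>

/* Hypothetical record for one sensor; field names are illustrative only. */
typedef struct {
    const char *name;  /* sensor label as reported by the hardware */
    double value;      /* latest reading, assumed to be in degrees C */
    double limit;      /* user-specified shutdown limit, degrees C */
} sensor_reading;

/*
 * Return the index of the first sensor whose reading is strictly above
 * its limit, or -1 if every sensor is within its limit.
 */
int first_over_limit(const sensor_reading *readings, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (readings[i].value > readings[i].limit)
            return (int)i;
    }
    return -1;
}
```

The polling loop would call first_over_limit after each 60 second scan and only start a shutdown when the result is not -1; requiring several consecutive over-limit scans before acting would be one way to guard against a single bad reading.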
however, there seems to be a possible issue: I may be hitting a Darwin
kernel bug with my program - every so often the odd Xserve G5 node will panic
- from the kernel core dump it seems one of the processes active at the
time of the panic is my version of hwmond (I'm still running Apple's hwmond as well)
- this is fairly infrequent, e.g. over 14 days for 96 nodes something like 5-6
have panicked at random intervals - my version of hwmond is looping every 60
seconds using IOServiceGetMatchingServices/IORegistryEntryCreateCFProperty
to extract data from IOHWSensor and
AppleFCU class objects - supposedly only reading kernel values
one possible explanation is that there is some interference between hwmond
and my hwmond - both of them obviously trying to hit the same hardware
registers (although it's still a rare event)
- hence if I can get the information from hwmond, maybe I don't
need to do the IOService scanning in my program and maybe the issue will
go away
other possibilities are, of course, that there is some kernel bug/race condition
which I'm hitting - or some issue in the IOKit library - the traceback always
seems to be in routines labelled as object serialisation
or it could be some interrupt issue
I don't understand enough of Darwin kernel dumps to determine if the
issue is simply from a program running at the time or whether the kernel
was actually doing interrupt processing at the time of the panic
of course it's not clear to me what the current stability of Darwin is
on Xserve G5 nodes - there are significant new features in these machines
which might lead to kernel issues - they are the only machines with ECC etc. - issues
with the 64 bit nature of the processors and the 32 bit version of Darwin
- could even be, of course, processor errata
In addition I'd like to be able to check the system warning light and
turn it on/off from the command line - also get the data on ECC errors
(which seem to be happening on a node) and one node has a low voltage condition
- I haven't yet reverse engineered the limits in Apple's hwmond into my version
of hwmond to detect this
I was also hoping to still use the e-mail notification features of hwmond
Thanks
David
_______________________________________________
Darwin-dev mailing list (email@hidden)
References:
- hwmond (From: David Osguthorpe <email@hidden>)
- Re: hwmond (From: Chris Sarcone <email@hidden>)