On Wed, Jan 26, 2005 at 01:58:52PM -0800, Chris Sarcone wrote:
hwmond is not part of Darwin and is not a documented feature of the Server version of OS X to rely upon (i.e. it is subject to change at Apple's whim). Sounds like maybe you'd want to start from the other end of the equation (what the ganglia cluster monitor requires as input) and write a tool(s) that provides this info back to the monitor and folks on the list can help you out getting that information.
Dear Chris,

Thanks for the information. I'm trying this list because it wasn't clear to me where to ask questions about daemons such as hwmond, and I'd seen previous questions about hwmond here. Also, as you will see from this post, there are definite kernel issues.

I have already written my own version of hwmond for delivering data to ganglia. Ganglia does not require anything as input; one just sends it values to be recorded. At the moment that is most of what hwmond logs for Server Monitor: fan speeds, voltages, currents and temperatures. Temperature was in fact the main driver, as I wanted something that would shut down the machine if it rose above a certain temperature. That is again something associated with hwmond, but with undocumented and, more importantly, unchangeable limits. My hwmond does this with user-specifiable limits, and it also provides a socket to deliver the data to any other process that wishes to request it. It was derived from Apple's open source for the ioreg command.

However, I may be hitting a Darwin kernel bug with my program. Every so often the odd Xserve G5 node will panic, and from the kernel core dump it seems that one of the processes active at the time of the panic is my version of hwmond (I'm still running Apple's hwmond as well). This is fairly infrequent: over 14 days, something like 5-6 of the 96 nodes have panicked, at random intervals. My version of hwmond loops every 60 seconds, using IOServiceGetMatchingServices/IORegistryEntryCreateCFProperty to extract data from IOHWSensor and AppleFCU class objects, so it is supposedly only reading kernel values.

One possible explanation is that there is some interference between hwmond and my hwmond, since both of them are obviously trying to hit the same hardware registers (although it's still a rare event). Hence, if I can get the information from hwmond, maybe I don't need to do the IOService scanning in my program and maybe the issue will go away.

The other possibilities are, of course, that I'm hitting some kernel bug or race condition, or some issue in the IOKit library (the traceback always seems to be in routines labelled as object serialisation), or it could be some interrupt issue. I don't understand enough of Darwin kernel dumps to determine whether my program was simply running at the time of the panic or whether the kernel was actually doing interrupt processing.

Of course, it's not clear to me what the current stability of Darwin is on Xserve G5 nodes. There are significant new features in these machines which might lead to kernel issues: they are the only machines with ECC, etc., and there may be issues with the 64-bit nature of the processors and the 32-bit version of Darwin. It could even be processor errata.

In addition, I'd like to be able to check the system warning light and turn it on/off from the command line, and also to get the data on ECC errors (which seem to be happening on one node). One node also has a low voltage condition; I haven't yet reverse engineered the limits in Apple's hwmond into my version of hwmond to detect this. I was also hoping to still use the e-mail notification features of hwmond.

Thanks,
David
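For readers following the thread, the user-specifiable limit check described above might look roughly like the sketch below. This is not the poster's code and it does not touch IOKit: the `readings` dictionary stands in for values that would be read from IOHWSensor and AppleFCU objects, and the sensor names, limit values and the `Limit` and `check_limits` names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Limit:
    """User-specified bounds for one sensor; None means 'no bound' (hypothetical type)."""
    low: Optional[float] = None
    high: Optional[float] = None


def check_limits(
    readings: Dict[str, float], limits: Dict[str, Limit]
) -> List[Tuple[str, float, str]]:
    """Return (sensor, value, 'above' or 'below') for every reading outside its limit.

    Sensors with no configured limit are ignored. Results are ordered by
    sensor name so that repeated runs report in a stable order.
    """
    violations = []
    for name, value in sorted(readings.items()):
        limit = limits.get(name)
        if limit is None:
            continue
        if limit.high is not None and value > limit.high:
            violations.append((name, value, "above"))
        elif limit.low is not None and value < limit.low:
            violations.append((name, value, "below"))
    return violations


# Hypothetical sample data; a real monitor would fill this in on each
# 60-second polling pass.
readings = {"cpu_a_temp": 71.5, "fan_1_rpm": 5200.0, "12v_rail": 11.2}
limits = {
    "cpu_a_temp": Limit(high=70.0),
    "12v_rail": Limit(low=11.4, high=12.6),
}

for sensor, value, direction in check_limits(readings, limits):
    # A real daemon might log, send e-mail, or start a shutdown here.
    print(f"{sensor}: {value} is {direction} its configured limit")
```

Keeping the check separate from the code that reads the sensors means the same limits could be applied whether the values come from an IOService scan or from another process over a socket.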
participants (1)
-
David Osguthorpe