Re: Kernel panics on OSX 10.3.9 on multiple machines in Win2K network (third attempt)
Terry Lambert wrote:

The weird thing is that all 4 machines started experiencing the same kernel panics on the same day. The only change I know of is that the SANs connected to the Win2K server were physically moved to a different location; their network manager insists nothing else was changed, and that no updates were applied to the server. Since the move also happened on the 8th of March (the day the kernel panics started occurring on all four Macs), I'm led to think that it's really a software issue/glitch somewhere. I'm in the dark, however, when it comes to solving these.

Unfortunately, there wasn't a panic.log generated on that crash yesterday.

The other machine looks to be running a third party KEXT, which is leaking memory in one of the zones (my guess would be AntiVirus software, since in the past they tried to replace system call entry points with their own code derived from OpenDarwin; if we made a change in one of those routines in a point release, they inevitably lead to memory leaks/panics because of the stale code not *exactly* matching the updated version).

I'd have to check the machine, but it was delivered to the customer without any AV software. I'll see what I can find out.
It also looks like someone has tuned some of the administrative limits on the number of open fds up on one of the machines, far past the amount of memory available for such things (a couple of the panics are NULL pointer dereferences following an allocation failure in the M_FILE zone, which would not occur unless the administrative limits on the machine had been explicitly changed).
There is no OSX server present. The network configuration is as follows (as far as I know):

Then whatever's in the middle that wasn't, or isn't in the middle that was, is likely the culprit. It's also possible that there was data corruption on the server itself as a result of physical shock during the move, and that's now being passed on to the client machines, which are barfing on it. You didn't give the number of machines having the smbfs KEXT in the panic traceback, but if it's more than one, then this is likely the case. In general clients try to be immune to the server sending them bad things, but you can't always catch everything.

All four Macs have the issue. I've updated one machine to OSX 10.4.5 as a trial, and when I returned it yesterday, it too crashed; this time 2 machines died on shutdown. I also managed to convince their IT department to at least have the cables tested & the local 8-port switch replaced.

This would be a lot easier if I had access to everything on their network... *sigh*

The only way would be two machine debugging, and asking it what's loaded where, so you could resolve a couple of those addresses. Even then, you'd have to get a controlled reproduction of the problem (don't know how hard that would be).

Well, hard :) So far I've not been able to find a way to reproduce the crashes; things it crashes on the first time work the second & third time.

Where would one change such a setting?

Typically, in the server administration settings on the machine. The normal place this is set for SMB in a Samba server, for example, is in its configuration file, or via a GUI configuration tool. The specific limit you are looking for is the hard limit for RLIMIT_NOFILE as set through setrlimit, which could be raised as high as 10240, but that number is generally excessive unless you have a large amount of memory in the machine to enable it to handle that many open files, and the level of client load (number of smbd processes) that that would entail.
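For reference, here is a minimal sketch of how the limit in question could be read on one of the Macs, assuming a Python interpreter is installed there. The helper names `nofile_limits` and `describe_limit` are made up for illustration; the only real API used is the standard `resource` module's `getrlimit`. It only reads the values and changes nothing.

```python
import resource


def describe_limit(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return str(value)


def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE pair for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # The hard limit is the ceiling the soft limit can be raised to.
    print("RLIMIT_NOFILE soft: %s" % describe_limit(soft))
    print("RLIMIT_NOFILE hard: %s" % describe_limit(hard))
```

From a bash prompt, `ulimit -n` and `ulimit -Hn` should report the same soft and hard values for the shell itself. A hard value of "unlimited" would be the thing to look out for.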
The typical failure mode is to try to set this to "unlimited", which is usually one of the allowable options.

If I understand you correctly, this would be on the SMB service on the Macs. They don't run any SMB services; they only access SMB shares on a Win2K server. I've not been able to find anything abnormal in their config, nor any reference to a maximum number of open files.

There are a number of resources available on the web and on developer.apple.com that deal with MacOS X server tuning (e.g. put the last four words there into google and look around a bit).

-----------------------
|  Network backbone   |
-----------------------
   |         |
   |         |        -----------------
   |         ---------| Win2k server  |
   |                  -----------------
   |                    |    |    |
   |                    |    |    |      ---------
   |                    |    |    -----> | San 1 |
   |                    |    |           ---------
   |                    |    |      ---------
   |                    |    -----> | San 2 |
   |                    |           ---------
   |                    |      ---------
   |                    -----> | San 3 |
   |                           ---------
-----------------
|  3Com Switch  |
-----------------
 | | | | |
 | | | | -> Printer
 | | | -> Mac
 | | -> Mac
 | -> Mac
 -> Mac

I only have access to the Macs & printer; the rest is managed by the company's internal IT department.
Ochal Christophe