Re: Kernel panics on OSX 10.3.9 on multiple machines in Win2K network (third attempt)
- Subject: Re: Kernel panics on OSX 10.3.9 on multiple machines in Win2K network (third attempt)
- From: Ochal Christophe <email@hidden>
- Date: Fri, 31 Mar 2006 09:45:34 +0200
Terry Lambert wrote:
The weird thing is that all 4 machines started experiencing the same
kernel panics on the same day. The only change I know of is that the
SANs connected to the Win2K server were moved to a different location
(a physical move); their network manager insists nothing else was
changed, and that no updates were applied to the server. Since this
also happened on the 8th of March (the day the kernel panics started
occurring on all four Macs), I'm led to think that it's really a
software issue/glitch somewhere. I'm in the dark, however, when it
comes to solving these.
Then whatever's in the middle that wasn't, or isn't in the middle
that was, is likely the culprit. It's also possible that there was
data corruption on the server itself as a result of physical shock
during the move, and that's now being passed on to the client
machines, which are barfing on it. You didn't give the number of
machines having the smbfs KEXT in the panic traceback, but if it's
more than one, then this is likely the case. In general, clients try
to be immune to the server sending them bad things, but you can't
always catch everything.
All four Macs have the issue. I've updated one machine to OSX 10.4.5 as
a trial, and when I returned it yesterday, it crashed too. This time 2
machines died on shutdown. I also managed to convince their IT
department to at least have the cables tested & the local 8-port switch
replaced.
Unfortunately, there wasn't a panic.log generated on that crash yesterday.
This would be a lot easier if I had access to everything on their
network... *sigh*
The other machine looks to be running a third party KEXT, which is
leaking memory in one of the zones (my guess would be AntiVirus
software, since in the past they tried to replace system call entry
points with their own code derived from OpenDarwin; if we made a change
in one of those routines in a point release, they inevitably led to
memory leaks/panics because of the stale code not *exactly* matching
the updated version).
I'd have to check the machine, but it was delivered to the customer
without any AV software. I'll see what I can find out.
The only way would be two machine debugging, and asking it what's
loaded where, so you could resolve a couple of those addresses. Even
then, you'd have to get a controlled reproduction of the problem
(don't know how hard that would be).
Well, hard :) So far I've not been able to find a way to reproduce the
crashes; things it crashes on the first time work the second & third time.
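To illustrate what "asking it what's loaded where" buys you: once the
load address and size of each KEXT are known, a raw address from a panic
backtrace can be attributed to a module and an offset within it. A
minimal sketch in Python follows. The bundle names, load addresses and
sizes are hypothetical placeholders, not values from any of these
machines; real ones would have to come from the debugger session or the
panic report itself.

```python
from typing import NamedTuple, Optional, Sequence


class Kext(NamedTuple):
    name: str          # bundle identifier (hypothetical below)
    load_address: int  # address where the module starts
    size: int          # size of the module in bytes


def owner_of(address: int, kexts: Sequence[Kext]) -> Optional[Kext]:
    """Return the module whose [load_address, load_address + size) range
    contains the address, or None if no listed module does."""
    for kext in kexts:
        if kext.load_address <= address < kext.load_address + kext.size:
            return kext
    return None


# Hypothetical table; real values must be read from the affected machine.
LOADED = [
    Kext("com.example.filesystem", 0x2A000000, 0x20000),
    Kext("com.example.thirdparty", 0x2B000000, 0x8000),
]

if __name__ == "__main__":
    frame = 0x2A001234  # hypothetical backtrace address
    hit = owner_of(frame, LOADED)
    if hit is None:
        print(f"{frame:#x}: not inside any listed module")
    else:
        print(f"{frame:#x}: {hit.name} + {frame - hit.load_address:#x}")
```

This only narrows an address down to a module; resolving it further to a
function still needs symbols for that module.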
It also looks like someone has tuned some of the administrative limits
on the number of open fd's up on one of the machines, far past the
amount of memory available for such things (a couple of the panics are
NULL pointer dereferences following an allocation failure in the
M_FILE zone, which would not occur unless the administrative limits on
the machine had been explicitly changed).
Where would one change such a setting?
Typically, in the server administration settings on the machine. The
normal place this is set for SMB in a Samba server, for example, is
in its configuration file, or via a GUI configuration tool. The
specific limit you are looking for is the hard limit for
RLIMIT_NOFILE, as set via setrlimit, which could be raised as high as
10240, but that number is generally excessive unless you have a large
amount of memory in the machine to handle that many open files, and
the level of client load (number of smbd processes) that that would
entail. The typical failure mode is to try to set this to
"unlimited", which is usually one of the allowable options.
If I understand you correctly, this would be on the SMB service on the
Macs. They don't run any SMB services; they only access SMB shares on a
Win2K server.
I've not been able to find anything abnormal in their config, nor any
reference to a maximum number of open files.
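For anyone wanting to check the limits discussed above directly on a
client, here is a minimal sketch in Python. It assumes a Python 3
interpreter is available, which is not the case on a stock 10.3.9
install; the shell equivalents are `ulimit -n` and
`sysctl kern.maxfiles kern.maxfilesperproc`. The function names are my
own, and it only reads the limits, it does not change anything.

```python
import resource
import subprocess


def describe(limit):
    """Render an rlimit value, spelling out the 'unlimited' sentinel."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def process_fd_limits():
    """Return the (soft, hard) RLIMIT_NOFILE pair for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def kernel_fd_limits(keys=("kern.maxfiles", "kern.maxfilesperproc")):
    """Ask sysctl for the system-wide caps; keys it cannot report are skipped."""
    found = {}
    for key in keys:
        try:
            out = subprocess.run(["sysctl", "-n", key],
                                 capture_output=True, text=True, check=False)
        except OSError:
            # sysctl is not installed or not runnable on this system.
            return found
        value = out.stdout.strip()
        if out.returncode == 0 and value.isdigit():
            found[key] = int(value)
    return found


if __name__ == "__main__":
    soft, hard = process_fd_limits()
    print("RLIMIT_NOFILE soft:", describe(soft))
    print("RLIMIT_NOFILE hard:", describe(hard))
    for key, value in kernel_fd_limits().items():
        print(key, "=", value)
```

A hard limit reported as "unlimited", or kernel values far above the
defaults for the installed memory, would match the kind of tuning
described above.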
There are a number of resources available on the web and on
developer.apple.com that deal with MacOS X server tuning (e.g. put
the last four words there into google and look around a bit).
There is no OSX server present.
The network configuration is as follows (as far as I know):
      ----------------------
      |  Network backbone  |
      ----------------------
         |             |
         |             |          ----------------
         |             -----------| Win2k server |
         |                        ----------------
         |                          |    |    |
         |                          |    |    |     ---------
         |                          |    |    ----> | San 1 |
         |                          |    |          ---------
         |                          |    |          ---------
         |                          |    ---------> | San 2 |
         |                          |               ---------
         |                          |               ---------
         |                          --------------> | San 3 |
         |                                          ---------
  -----------------
  |  3Com Switch  |
  -----------------
    |  |  |  |  |
    |  |  |  |  -> Printer
    |  |  |  > Mac
    |  |  > Mac
    |  > Mac
    > Mac
I only have access to the Macs & printer; the rest is managed by the
company's internal IT department.