Re: How to debug VM trouble?
- Subject: Re: How to debug VM trouble?
- From: Terry Lambert <email@hidden>
- Date: Thu, 24 Jan 2008 17:10:24 -0800
On Jan 24, 2008, at 3:47 PM, Michael Smith wrote:
> On Jan 24, 2008, at 1:31 PM, Michael Crawford wrote:
>> We were suspecting for a while that the bug was in the OS X kernel.
>> We're going to do whatever it takes to find the real source of
>> trouble. If it turns out that we later conclude that it is in fact in
>> Apple's code, I'll be sure to file a bug report and post the number to
>> this thread.
>
> Michael,
>
> If you have a reduced version of your tool that you can attach to
> the bug, I'm sure that will help. As a general rule, kernel memory
> starvation should not result in system calls that could be deferred
> instead returning errors, although per Rick's thread a while back on
> this list it is a nontrivial thing to get the right behaviour in the
> face of the heavy layering within the kernel.
>
> What I'm saying is that the miscompares you're seeing may well be a
> tool bug, but the filesystem operations returning ENOMEM feels to me
> like a kernel bug, and you should file accordingly.

He needs to tell us what operations.
There are ~470 places, not including manual pages, where ENOMEM occurs
in the kernel. There are ~300 places not involving networking, and
~60 in VFS alone. These are event sites, which means that if they are
in support functions, they could each be multiplied by the functions
that call them and propagate errors up, for the call graph depth.
This works out to about 250 (after call amplification) places he could
legitimately end up with ENOMEM because he is, in fact, out of memory
(attempting to allocate from an exhaustible non-expandable zone, etc.).
Michael, you need to tell us where you're getting ENOMEM from. The
easiest way would be to pick the system call in user space that's
giving you the ENOMEM, then use DTrace to watch it with a speculative
probe, and when you get the error back at the system call return
point, print out the call graph that got you the error. This will
tell you where and why the ENOMEM failure is happening, which should
lead you to the root cause.
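
For the first step, picking out which user-space call is handing back the ENOMEM, something along these lines may be enough. It is only a minimal sketch: note_enomem(), NOTE_ENOMEM() and enomem_hits are hypothetical names made up for illustration, not part of any Apple API, and it assumes the tool checks its system calls for a -1 return.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counter: how many failed calls reported ENOMEM so far. */
static int enomem_hits = 0;

/*
 * Hypothetical helper: given the errno value saved right after a failed
 * call, log the call name and source location if the error was ENOMEM.
 * Returns 1 when ENOMEM was seen, 0 for any other error.
 */
static int note_enomem(const char *call, const char *file, int line,
                       int saved_errno)
{
    assert(call != NULL && file != NULL);
    if (saved_errno != ENOMEM)
        return 0;
    enomem_hits++;
    fprintf(stderr, "%s:%d: %s failed: %s\n",
            file, line, call, strerror(saved_errno));
    return 1;
}

/* Use immediately after a call has returned -1, before errno can change. */
#define NOTE_ENOMEM(call) note_enomem((call), __FILE__, __LINE__, errno)
```

In the tool this would sit right after each suspect call, for example `if (write(fd, buf, len) == -1) NOTE_ENOMEM("write");`. Once the log names a single system call, that is the one to put the speculative probe on.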
Actually, I suspect that knowing where the error comes from will be
enough to let you fix the problem yourself, without involving anyone
else.
-- Terry