Michael,

He needs to tell us what operations.

-- Terry

On Jan 24, 2008, at 3:47 PM, Michael Smith wrote:
> On Jan 24, 2008, at 1:31 PM, Michael Crawford wrote:
>> We were suspecting for a while that the bug was in the OS X kernel.
>> We're going to do whatever it takes to find the real source of trouble.
>> If we later conclude that it is in fact in Apple's code, I'll be sure
>> to file a bug report and post the number to this thread.
>
> If you have a reduced version of your tool that you can attach to the
> bug, I'm sure that will help.
>
> As a general rule, kernel memory starvation should not result in system
> calls that could be deferred instead returning errors, although per
> Rick's thread a while back on this list it is a nontrivial thing to get
> the right behaviour in the face of the heavy layering within the kernel.
>
> What I'm saying is that the miscompares you're seeing may well be a tool
> bug, but the filesystem operations returning ENOMEM feels to me like a
> kernel bug, and you should file accordingly.

There are ~470 places, not including manual pages, where ENOMEM occurs in the kernel. There are ~300 places not involving networking, and ~60 in VFS alone. These are event sites, which means that if they are in support functions, each could be multiplied by the number of functions that call it and propagate the error up, through the depth of the call graph. This works out to about 250 places (after call amplification) where he could legitimately end up with ENOMEM because he is, in fact, out of memory (attempting to allocate from an exhaustible non-expandable zone, etc.).
Michael, you need to tell us where you're getting ENOMEM from. The easiest way would be to pick the system call in user space that's giving you the ENOMEM, then use DTrace to watch it with a speculative probe, and when you get the error back at the system call return point, print out the call graph that got you the error. This will tell you where and why the ENOMEM failure is happening, which should lead you to the root cause. In fact, I suspect that knowing where the error comes from will be enough to let you fix the problem yourself, without involving anyone else.
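
As a rough illustration of the first step (picking out which user-space call is handing back ENOMEM), a minimal Python sketch might wrap each operation and record the ones that fail. The names run_and_report and fake_enomem, and the list of operations, are hypothetical and are not taken from the tool under discussion; the DTrace probe would then be aimed at whichever call this flags.

```python
import errno
import os


def run_and_report(operations):
    """Run each (label, callable) pair and collect the calls that raise OSError.

    Returns a list of (label, symbolic errno name) tuples, so an ENOMEM
    failure shows up next to the operation that produced it.
    """
    failures = []
    for label, op in operations:
        try:
            op()
        except OSError as exc:
            # Map the numeric errno to its symbolic name, e.g. 12 -> "ENOMEM".
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            failures.append((label, name))
    return failures


def fake_enomem():
    # Hypothetical stand-in for a real failing call: raises ENOMEM directly.
    raise OSError(errno.ENOMEM, os.strerror(errno.ENOMEM))


if __name__ == "__main__":
    ops = [
        ("getcwd", os.getcwd),             # a real call, expected to succeed
        ("simulated write", fake_enomem),  # simulated ENOMEM failure
    ]
    for label, name in run_and_report(ops):
        print(f"{label}: {name}")
```

In a real tool the callables would be the actual read, write, or other filesystem calls being exercised, and the labels would identify the call site closely enough to choose a system call to trace.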