Re: How to debug VM trouble?
We are now thinking that the error we're seeing with our disk test tool may actually be a bug in the tool. One of my coworkers is now inspecting its source to see if he can find the problem.

We have a tool to test our RAID controllers that spawns some threads that each write a file, then read it back to check that what they get back is what they wrote. If we spawn a lot of threads, each writing very large files using large memory buffers, and let the tool run for a day or two, eventually there will be a miscompare.

This doesn't happen with smaller files, fewer threads, or with more memory installed in the Mac. But when it does, there will also be a bunch of messages in the system.log from other programs such as daemons, complaining, for example, of getting ENOMEM from write() system calls.

Our theory is that we're running the kernel out of memory, so that it eventually returns errors to our own system calls, but we don't check the error code properly. The data is then either not read or not written because the call failed.
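A minimal sketch, in C, of the kind of return-value checking this theory points at. The helper names `write_all` and `read_all` are hypothetical and are not taken from the actual test tool; the sketch only assumes the tool uses the plain POSIX write() and read() calls mentioned above. Each helper treats a negative return as a failure (for example ENOMEM), retries on EINTR, and loops on short transfers, so a failed or partial call cannot be mistaken for a completed one.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes to fd.
 * Returns 0 on success, -1 on failure with errno preserved. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, nothing written: retry */
            int saved = errno;          /* fprintf may clobber errno */
            fprintf(stderr, "write failed: %s\n", strerror(saved));
            errno = saved;
            return -1;
        }
        if (n == 0) {                   /* no progress: bail out rather than spin */
            fprintf(stderr, "write made no progress\n");
            return -1;
        }
        p += n;                         /* short write: continue with the rest */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: read exactly len bytes from fd.
 * Returns 0 on success, -1 on error or premature end of file. */
int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, nothing read: retry */
            int saved = errno;
            fprintf(stderr, "read failed: %s\n", strerror(saved));
            errno = saved;
            return -1;
        }
        if (n == 0) {                   /* end of file before len bytes arrived */
            fprintf(stderr, "unexpected end of file\n");
            return -1;
        }
        p += n;                         /* short read: continue with the rest */
        len -= (size_t)n;
    }
    return 0;
}
```

With helpers like these, a miscompare would only be reported after both the write and the read-back have returned 0; any -1 would be logged as an I/O failure instead of being compared as if the data had been transferred.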
While I do have to let the tool run for a long time, I am also able to trigger the bug using drives hooked to the motherboard SATA controller in my dual G5, without our PCI card or device driver being involved.

Unfortunately, our test tool has grown organically as it was ported from platform to platform over the years, so it's not easy to spot any problems. But such legacy code is likely the source of our trouble.

We were suspecting for a while that the bug was in the OS X kernel. We're going to do whatever it takes to find the real source of trouble. If we later conclude that it is in fact in Apple's code, I'll be sure to file a bug report and post the number to this thread.

Thanks for all your advice.

Mike Crawford
mdcrawford at gmail dot com
http://www.geometricvisions.com/ <-- Creative Commons Music Downloads