Re: Yielding the processor in a kext?
On 11 Sep 2007, at 10:28, Postmaster wrote:
> On 7 Sep 2007, at 22:14, Anton Altaparmakov wrote:
>>>>> You're right, it is. You should, instead, either:
>>>>> a) Fail the operation with a resource shortage error.
>>>> Can't do that without leaving the FS in a severe state of corruption. Keeping state and rolling back is too much effort (resource-wise it would slow things down a lot) to be worth it for such an uncommon case.
>>> So fix your FS design; don't cite it as an excuse for writing bad code...
>> This discussion is becoming pointless, sorry. It has nothing to do with my FS design.
> I think he means the design of the code that implements your file system, not the file system design itself. Your code should be designed in such a way that a memory allocation error caused by a memory shortage cannot corrupt it.

That is pretty impossible without allocating tons of memory "just in case" before every operation, which would slow things down and generally be a silly thing to do. And it really is impossible anyway, because the allocations may be happening elsewhere in the kernel as indirect consequences of what my code is doing.

Also, the code operates directly on the metadata, so if a memory allocation fails deeply enough in the code, the previous state has long since been lost and cannot be restored. Even worse, I often modify some metadata, then have to release the VM page holding it to avoid potential deadlocks whilst doing something else, and then have to re-get the VM page to finish the operation. If I get an ENOMEM at this point (the VM has perhaps paged out the page holding my metadata and now has to page it back in), I cannot get back at the metadata and thus can neither complete nor roll back the operation, even if I knew the previous state.

This would not be too bad in itself, because I always leave metadata records in a self-consistent state. The problem is that the FS has metadata records all over the place, in different system files, directory indexes, other indexes, etc., which all need to be consistent with each other. And here is where it goes wrong: I update one piece of metadata and it succeeds, then I try to do the second one and that fails with ENOMEM. I then try to roll back the first change and get another ENOMEM. Now the metadata is screwed, because you have records out of sync with each other and there is nothing you can do about it, other than tell the user they have a corrupt file system and need to unmount and check their disk, or in some cases keep retrying the memory allocation in the hope that you can proceed (either forward or back, it does not matter, as long as you are not left in the middle!).
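[For concreteness, a minimal sketch of the "keep retrying the allocation" approach being argued over, written against the public OSMalloc and IOSleep kernel interfaces. The helper name ntfs_malloc_retry() and the retry limits are illustrative assumptions, not code from the driver under discussion; the point is simply that the thread sleeps (yields the processor) between attempts rather than spinning, and eventually gives up so the caller can fall back to marking the volume dirty.]

    /*
     * Sketch only: bounded retry of a kernel memory allocation, yielding
     * the CPU between attempts.  Limits are illustrative.
     */
    #include <stdint.h>
    #include <libkern/OSMalloc.h>   /* OSMalloc(), OSMallocTag */
    #include <IOKit/IOLib.h>        /* IOSleep() */

    #define NTFS_ALLOC_MAX_TRIES  100   /* give up after this many attempts */
    #define NTFS_ALLOC_RETRY_MS   100   /* sleep between attempts, in ms */

    static void *
    ntfs_malloc_retry(uint32_t size, OSMallocTag tag)
    {
    	void *p;
    	int tries;

    	for (tries = 0; tries < NTFS_ALLOC_MAX_TRIES; tries++) {
    		p = OSMalloc(size, tag);
    		if (p)
    			return p;
    		/* Out of memory right now: block (yield the processor) so
    		 * the pageout path can make progress, then try again. */
    		IOSleep(NTFS_ALLOC_RETRY_MS);
    	}
    	return NULL;	/* caller must cope, e.g. mark the volume dirty */
    }

[Whether such a loop is acceptable is exactly what the thread disputes: it only helps if the shortage is transient, and it must never run while holding locks or pages that the pageout path itself needs, otherwise the thread deadlocks against the very memory it is waiting for.]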
Try putting a relational database in the kernel as a file system, with lots of metadata duplication and cross-references all over the place, and then try to keep it both fast in the 99.9999% case and 100% correct in the 0.0001% case where a memory shortage occurs, without journalling, COW, or other modern approaches to FS consistency, and you will be in my boat and see how retrying memory allocations suddenly seems like a great idea... (-;

Of course the other problem is that ENOMEM is only one of many failure modes. It could be, as you say, that the user pulled the plug, or a power outage happened, or a bad sector developed, or the computer's memory got corrupted, etc., and you end up with a corrupt FS in all of those situations too. Those other cases happen a LOT more often than the odd case where the system runs out of memory, so it is silly to optimise for the least common failure mode when there are so many other failure modes that are far more lethal and over which you have no control at all...

In 25 years of using computers I have never seen a kernel task fail with ENOMEM, whilst I have lost count of how many times one of my children has pressed the off button on my external drive (which nutcase decided to make the button light up when it is on, so children are attracted to pressing the light like mosquitoes?!?) or the nice and shiny round button on my MacBook Pro (and as I use it for development that is an NMI, so the machine dies immediately and I can only get it back if I plug in a second machine, enter gdb and continue!). So if my driver causes corruption less than once in 25 years of daily use, I really don't mind putting up with having to run a file system check afterwards; it fades into insignificance compared to the number of times I have had to run file system checks for other reasons.

> And your proposed solution, which is to make the thread hang around until some memory becomes available, is not really a solution at all. Your file system is in a corrupt state all the time while the thread is waiting, and the wait might not end in a successful memory allocation, but in sudden termination when the bored user switches off the computer, which is probably highly unresponsive if there is no kernel memory available.

If they do that then they will get corruption no matter what. We do not implement journalling, so pulling the plug whilst the file system is mounted guarantees that you have to run a check of the volume. I mark the FS dirty at mount time, and only on clean unmount is it marked clean again, so if a clean unmount does not happen because the user rebooted by force, the power failed, etc., then on the next mount the user will be warned that the volume is dirty and that they should check the disk... And one day, hopefully, mounting will automatically run a check of the disk if the volume is dirty, but that is still a long way off as we don't have a file system checker yet...

Best regards,

Anton

-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/
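[A minimal sketch of the mount-time dirty-flag scheme described in the last paragraph above, assuming an NTFS-style per-volume dirty bit. The struct, function names and flag value are hypothetical placeholders; a real driver would persist the flag in the on-disk volume metadata rather than an in-memory field.]

    #include <stdint.h>
    #include <sys/systm.h>          /* printf() in the kernel */

    #define VOLUME_IS_DIRTY 0x0001  /* illustrative dirty bit */

    /* Hypothetical in-memory volume state standing in for the on-disk flag. */
    struct ntfs_volume {
    	uint16_t flags;
    };

    /* Mount time: warn if the previous unmount was not clean, then mark the
     * volume dirty for the lifetime of this mount. */
    static void
    ntfs_mark_mounted(struct ntfs_volume *vol)
    {
    	if (vol->flags & VOLUME_IS_DIRTY)
    		printf("NTFS: volume was not unmounted cleanly; "
    		    "please check the disk.\n");
    	vol->flags |= VOLUME_IS_DIRTY;      /* would also be written to disk */
    }

    /* Clean unmount only: clear the dirty bit.  If the plug is pulled or the
     * machine crashes first, the bit stays set and the next mount warns. */
    static void
    ntfs_mark_clean(struct ntfs_volume *vol)
    {
    	vol->flags &= ~VOLUME_IS_DIRTY;     /* would also be written to disk */
    }

[Note that this scheme gives detection, not protection: anything that skips ntfs_mark_clean() leaves the flag set, which is exactly the behaviour described above for forced reboots and power failures.]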