Re: Yielding the processor in a kext?
On 11 Sep 2007, at 10:28, Postmaster wrote:
> On 7 Sep 2007, at 22:14, Anton Altaparmakov wrote:
>>>>> You're right, it is. You should, instead, either:
>>>>> a) Fail the operation with a resource shortage error.
>>>> Can't do that without leaving the FS in a severe state of corruption. Keeping state and rolling back is too much effort (resource-wise it would slow things down a lot) to be worth it for such an uncommon case.
>>> So fix your FS design; don't cite it as an excuse for writing bad code...
>> This discussion is becoming pointless, sorry. It has nothing to do with my FS design.
> I think he means the design of the code that implements your file system, not the file system design itself. Your code should be designed in such a way that a memory allocation error caused by a memory shortage cannot corrupt it.

That is pretty impossible without allocating tons of memory "just in case" before every operation, which would slow things down and generally be a silly thing to do. And it really is impossible anyway, because the allocations may be happening elsewhere in the kernel as indirect consequences of what my code is doing.

Also, the code operates directly on the metadata, so if a memory allocation fails deeply enough in the code, the previous state has long since been lost and cannot be restored. Even worse, I often modify some metadata, then have to release the VM page holding it to avoid potential deadlocks whilst doing something else, and then have to re-get the VM page to finish the operation. If I get an ENOMEM at this point (the VM has perhaps paged out the page holding my metadata and now has to page it back in), I cannot get back at the metadata and thus can neither complete nor roll back the operation, even if I knew the previous state.

This would not be too bad in itself, because I always leave metadata records in a self-consistent state. The problem is that the FS has metadata records all over the place, in different system files, directory indexes, other indexes, etc., which all need to be consistent with each other. And here is where it goes wrong: I update one piece of metadata and it succeeds, then I try to do the second one and that fails with ENOMEM. I then try to roll back the first change and get another ENOMEM. Now the metadata is screwed, because you have records out of sync with each other and there is nothing you can do about it, other than tell the user they have a corrupt file system and need to unmount and check their disk, or in some cases keep retrying the memory allocation in the hope that you can proceed (either forward or back, it does not matter, as long as you are not left in the middle!).
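[For concreteness, a minimal sketch of the "keep retrying the allocation" approach being argued over, written against the public OSMalloc and IOSleep kernel interfaces. The helper name ntfs_malloc_retry() and the retry limits are illustrative assumptions, not code from the driver under discussion; the point is simply that the thread sleeps (yields the processor) between attempts rather than spinning, and eventually gives up so the caller can fall back to marking the volume dirty.]

    /*
     * Sketch only: bounded retry of a kernel memory allocation, yielding
     * the CPU between attempts.  Limits are illustrative.
     */
    #include <stdint.h>
    #include <libkern/OSMalloc.h>   /* OSMalloc(), OSMallocTag */
    #include <IOKit/IOLib.h>        /* IOSleep() */

    #define NTFS_ALLOC_MAX_TRIES  100   /* give up after this many attempts */
    #define NTFS_ALLOC_RETRY_MS   100   /* sleep between attempts, in ms */

    static void *
    ntfs_malloc_retry(uint32_t size, OSMallocTag tag)
    {
    	void *p;
    	int tries;

    	for (tries = 0; tries < NTFS_ALLOC_MAX_TRIES; tries++) {
    		p = OSMalloc(size, tag);
    		if (p)
    			return p;
    		/* Out of memory right now: block (yield the processor) so
    		 * the pageout path can make progress, then try again. */
    		IOSleep(NTFS_ALLOC_RETRY_MS);
    	}
    	return NULL;	/* caller must cope, e.g. mark the volume dirty */
    }

[Whether such a loop is acceptable is exactly what the thread disputes: it only helps if the shortage is transient, and it must never run while holding locks or pages that the pageout path itself needs, otherwise the thread deadlocks against the very memory it is waiting for.]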
Try putting a relational database in the kernel as a file system, with lots of metadata duplication and cross-references all over the place, and then try to keep it both fast in the 99.9999% case and 100% correct in the 0.0001% case where a memory shortage occurs, without journalling, COW, or other modern approaches to FS consistency, and you will be in my boat and see how retrying memory allocations suddenly seems like a great idea... (-;

Of course the other problem is that ENOMEM is only one of many failure modes. It could be, as you say, that the user pulled the plug, or a power outage happened, or a bad sector developed, or the computer's memory got corrupted, etc., and you end up with a corrupt FS in all of those situations too. Those other cases happen a LOT more often than the odd case where the system runs out of memory, so it is silly to optimise for the least common failure mode when there are so many other failure modes that are far more lethal and over which you have no control at all...

In 25 years of using computers I have never seen a kernel task fail with ENOMEM, whilst I have lost count of how many times one of my children has pressed the off button on my external drive (which nutcase decided to make the button light up when it is on, so children are attracted to pressing the light like mosquitoes?!?) or the nice and shiny round button on my MacBook Pro (and as I use it for development that is an NMI, so the machine dies immediately and I can only get it back if I plug in a second machine, enter gdb and continue!). So if my driver causes corruption less than once in 25 years of daily use, I really don't mind putting up with having to run a file system check afterwards; it fades into insignificance compared to the number of times I have had to run file system checks for other reasons.

> And your proposed solution, which is to make the thread hang around until some memory becomes available, is not really a solution at all. Your file system is in a corrupt state all the time while the thread is waiting, and the wait might not end in a successful memory allocation, but in sudden termination when the bored user switches off the computer, which is probably highly unresponsive if there is no kernel memory available.

If they do that then they will get corruption no matter what. We do not implement journalling, so pulling the plug whilst the file system is mounted guarantees that you have to run a check of the volume. I mark the FS dirty at mount time, and only on clean unmount is it marked clean again, so if a clean unmount does not happen because the user rebooted by force, the power failed, etc., then on the next mount the user will be warned that the volume is dirty and that they should check the disk... And one day, hopefully, mounting will automatically run a check of the disk if the volume is dirty, but that is still a long way off as we don't have a file system checker yet...

Best regards,

Anton

-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/
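[A minimal sketch of the mount-time dirty-flag scheme described in the last paragraph above, assuming an NTFS-style per-volume dirty bit. The struct, function names and flag value are hypothetical placeholders; a real driver would persist the flag in the on-disk volume metadata rather than an in-memory field.]

    #include <stdint.h>
    #include <sys/systm.h>          /* printf() in the kernel */

    #define VOLUME_IS_DIRTY 0x0001  /* illustrative dirty bit */

    /* Hypothetical in-memory volume state standing in for the on-disk flag. */
    struct ntfs_volume {
    	uint16_t flags;
    };

    /* Mount time: warn if the previous unmount was not clean, then mark the
     * volume dirty for the lifetime of this mount. */
    static void
    ntfs_mark_mounted(struct ntfs_volume *vol)
    {
    	if (vol->flags & VOLUME_IS_DIRTY)
    		printf("NTFS: volume was not unmounted cleanly; "
    		    "please check the disk.\n");
    	vol->flags |= VOLUME_IS_DIRTY;      /* would also be written to disk */
    }

    /* Clean unmount only: clear the dirty bit.  If the plug is pulled or the
     * machine crashes first, the bit stays set and the next mount warns. */
    static void
    ntfs_mark_clean(struct ntfs_volume *vol)
    {
    	vol->flags &= ~VOLUME_IS_DIRTY;     /* would also be written to disk */
    }

[Note that this scheme gives detection, not protection: anything that skips ntfs_mark_clean() leaves the flag set, which is exactly the behaviour described above for forced reboots and power failures.]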