Re: Yielding the processor in a kext?
On 7 Sep 2007, at 17:53, Michael Smith wrote:
On Sep 7, 2007, at 1:08 AM, Anton Altaparmakov wrote:
On 7 Sep 2007, at 03:55, Michael Smith wrote:
On Sep 6, 2007, at 5:03 PM, Régis Duchesne wrote:

Is there a way to make the current thread yield the processor in a kext? I understand that by default threads executing in the kernel are pre-emptible, but sometimes I would like the current thread to explicitly give the scheduler a chance to schedule another thread before the current thread's timeslice has elapsed. I'm looking for something similar to cond_resched() on Linux.

Not in the fashion you describe. If you have work, do it. If the scheduler has more important work that has to be done, it will take the CPU away from your thread.

But, for example, from a file system kext we don't know what priority the thread is, and we don't own the thread either, so we should not be messing with priorities...

On the contrary. As a filesystem, if you need to perform a lengthy task, the onus is on you to ensure that you do it on a thread with an appropriate priority. If this means kicking off another thread, so be it.

So I am going to spawn a kernel thread at a lower priority and block in the parent thread until that thread completes?!? That makes no sense whatsoever (at least to me)...

And what if we have work to do but can't do it because we need another thread to do work first? It is stupid to hang onto our timeslice at this point...

If you have work to do that you can't do because some other thread has work to do, you must necessarily have a synchronisation object that is shared between you and the other thread. Thus, you will block on said synchronisation object. This is a truly fundamental threaded-programming metaphor, and the only way to avoid priority inversion. If, as you are implicitly suggesting, a high-priority consumer thread were to spin, polling and yielding, there is no guarantee that the lower-priority worker thread would ever get the task done. The only reason to want to do this is if you are performing a long-running, low-priority task on the wrong (high-priority) thread. Don't do that. Do the work on a thread whose priority reflects the true priority of the task. It's not your business to make scheduling decisions; don't even try.
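As a concrete illustration of "block on said synchronisation object", here is a minimal sketch of the pattern, assuming the BSD-layer msleep()/wakeup() KPI together with a lck_mtx_t mutex; the myfs_* names and the struct layout are illustrative assumptions, not code from this thread.

/*
 * Sketch only: a consumer blocks on a shared synchronisation object until a
 * worker thread has done its part, instead of spinning and yielding.
 * 'st->lock' is assumed to have been set up with lck_mtx_alloc_init().
 */
#include <kern/locks.h>
#include <sys/param.h>      /* PRIBIO */
#include <sys/systm.h>      /* msleep(), wakeup() */

struct myfs_state {
    lck_mtx_t *lock;        /* protects 'done' */
    int        done;        /* set by the worker when it has finished */
};

/* Consumer side: sleeps until the worker signals completion. */
static void
myfs_wait_for_worker(struct myfs_state *st)
{
    lck_mtx_lock(st->lock);
    while (!st->done) {
        /* Atomically drops the mutex, blocks, and reacquires it on wakeup. */
        (void) msleep(&st->done, st->lock, PRIBIO, "myfswait", NULL);
    }
    lck_mtx_unlock(st->lock);
}

/* Worker side: marks the work done and wakes any waiters. */
static void
myfs_worker_done(struct myfs_state *st)
{
    lck_mtx_lock(st->lock);
    st->done = 1;
    wakeup(&st->done);
    lck_mtx_unlock(st->lock);
}

The waiting thread burns no CPU at all until wakeup() is called, unlike a yield-and-poll loop.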
Your view is too limited. This is not true at all. I will give you a concrete example of when I would like to yield the CPU from my file system kext: I need to take a lock, but I can't do so safely because of the other locks I already hold, so I do a try-lock. The try-lock fails, so I have to drop my locks and try again. It is totally brain-damaged to try again immediately. It would be far more sensible to yield the CPU, give the other process holding my lock a chance to finish its work and release it, and, when the scheduler next gets to my process, try again; if I fail again, yield again, and so on... And trying again immediately is actually MUCH worse than just burning CPU unnecessarily, because the other process might be trying to take the lock I just dropped, so if I take it again immediately the chances of the other process taking that lock successfully, and hence making progress and releasing both locks so that my process can make progress, are close to zero at this point! )-:

I would argue that you're making several erroneous assumptions here. The first is that it's a good idea for you to ever hold more than one lock in kernel code. The second is that you should permit the acquisition of locks in arbitrary order. Both are bad programming practice; the latter is in fact *awful* practice.

Awful programming practice or not, we don't get a choice in the matter sometimes! If the OS X VFS weren't the way it is, then in an ideal world you would of course be right. Unfortunately we don't live in an ideal world... When the VFS/VM gets a UPL (those are, as you know, exclusive access) and then calls the file system's VNOP_PAGEIN() to fill that UPL with data from disk, the FS kext _MUST_ take file system locks _after_ the UPL is taken (e.g. to exclude concurrent truncates). This is a _MUST_. No amount of good programming practice can ever change that!

And not using more than one lock is a very silly thing to suggest. Perhaps you would like the one-funnel-for-the-whole-system world back and want to send OS X back to the stone age, allowing one CPU core to run at a time and getting rid of things like pre-emption altogether? If so, be my guest. Writing for a uniprocessor with no preemption is indeed very easy. You don't need locking at all! (-;

And when the file system has taken some locks, say in VNOP_READ() after a read() system call (e.g. to exclude truncates), it may need to create a sequence of UPLs, so suddenly you have lock inversion between the VM/VFS and the FS kext. Neither can do anything about it... Especially as the FS kext usually just calls cluster_read_ext() and thus has no control at all over the creation of the UPLs after the lock is taken. [I am cheating in my example above, as the lock taken by those code paths, at least in my FS kext, is taken shared, so there is no deadlock: both threads take the lock at the same time. But it is a simple example illustrating how sometimes lock inversions happen because of antiquated interfaces that were designed in a uniprocessor, no-preemption, one-funnel-for-all world, and not because of people's inability to write proper code...]

Thirdly, you are assuming that the thread holding the lock(s) you want will actually run as a consequence of your yielding; implying that "yield" has semantics beyond "terminate my quantum early".
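For reference, a minimal sketch of the try-lock-and-back-off pattern under discussion, with a short blocking sleep standing in for the yield; lck_mtx_try_lock() and IOSleep() are existing KPIs, while the function name and the 1 ms interval are illustrative assumptions.

/*
 * Sketch only: 'a' is the lock we may legitimately take first; 'b' must be
 * try-locked to respect the locking order.  On failure both are dropped and
 * the thread blocks briefly instead of retrying immediately, giving the
 * holder of 'b' a chance to finish its work and release it.
 */
#include <kern/locks.h>
#include <IOKit/IOLib.h>    /* IOSleep(); msleep() with a timeout would also do */

static void
myfs_lock_both(lck_mtx_t *a, lck_mtx_t *b)
{
    for (;;) {
        lck_mtx_lock(a);
        if (lck_mtx_try_lock(b)) {
            return;             /* both locks held */
        }
        lck_mtx_unlock(a);      /* back off completely... */
        IOSleep(1);             /* ...and block for ~1 ms rather than spin */
    }
}

Blocking, even briefly, hands the CPU back in a way the scheduler understands, rather than re-taking 'a' straight away and starving the lock holder.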
And another example: in a critical path, when a memory allocation fails, I want to yield the CPU and try again later, when the system hopefully has more memory available, because failing the allocation is a disaster for file system consistency at this point. At the moment I just do it in a busy loop, trying again and again and again until the scheduler decides to reschedule me. That's got to be totally insane... )-:

a) Fail the operation with a resource shortage error.

Can't do that without leaving the FS in a severe state of corruption. Keeping state and rolling back is too much effort (resource-wise it would slow things down a lot) to be worth it for such an uncommon case.

b) Use one of the memory allocation primitives that knows how to wait intelligently for memory to become available.
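Absent such a primitive, the minimal improvement over the busy loop described above is to block briefly between retries. A sketch using the libkern OSMalloc() allocator; myfs_alloc_retry() and the 10 ms interval are illustrative assumptions, not code from the thread.

/*
 * Sketch only: retry a failed allocation without busy-looping.  The thread
 * blocks between attempts so the VM system gets a chance to free memory.
 * A production kext would probably also bound the number of retries.
 */
#include <libkern/OSMalloc.h>
#include <IOKit/IOLib.h>        /* IOSleep() */

static void *
myfs_alloc_retry(uint32_t size, OSMallocTag tag)
{
    void *p;

    while ((p = OSMalloc(size, tag)) == NULL) {
        IOSleep(10);            /* block ~10 ms rather than spin */
    }
    return p;
}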
I would very much like to know how I can tell the scheduler "I am done, hand the CPU to someone else now"...

Tell it under what criteria you will be able to run again; block on a synchronisation primitive, or take advantage of infrastructure that will do so on your behalf. There are two reasons that people think that yield() might be a good idea in kernel code.

1) When a long-running task is being performed at an inappropriate priority, and it might be "nice" to let other threads have the CPU. In this case, I argue that the task should simply be moved to another thread that runs at an appropriate priority.

2) When there is a priority inversion (thread, lock, etc.) and the caller thinks that a lower-priority thread needs to run in order for the current thread to make progress. In this case, I argue that polling (which is what use of yield devolves to) is inferior to well-designed code and the use of blocking primitives.

Care to rewrite the VFS/VM <-> FS kext interface and fix up all file systems? No? Me, neither...

Best regards,

Anton

--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/