RE: Thread locks in 10.3
- Subject: RE: Thread locks in 10.3
- From: Quinn <email@hidden>
- Date: Thu, 19 Oct 2006 11:42:05 +0100
At 13:53 -0400 18/10/06, Carl Smith wrote:
I guess in retrospect I could use some direction on where to read up
on the 10.4 locking design. If anyone knows where I could find this
document and/or some examples, I would appreciate it.
One key thing to remember is that the Mac OS X locking design
varies by kernel subsystem.
o Mach has always had fine grained locking
o BSD (specifically the file system and networking subsystems)
used funnels until Mac OS X 10.4, whereupon they switched to fine
grained locking
o I/O Kit locking is based on the workloop model
However, as you're developing an NKE, I'll focus on the BSD
subsystem, and I'll start with the 10.4 locking design because that's
the most important one. In 10.4 the BSD part of the kernel
switched to a fine grained locking model based on locks, reference
counts, and condition variables. This is a pretty standard
architecture for a modern UNIX-style kernel; if you poke around on the
net, you should find plenty of information about it.
The locks and the reference counts go hand-in-hand. Each
BSD subsystem uses locks to protect its own data structures. For
example, the socket layer has a per-socket lock associated with
(struct socket) [1] to protect the (relevant) fields of that
structure. However, each subsystem drops any locks that it holds
before it calls out to another subsystem. The goal here is to
prevent deadlocks.
If you take a lock and call out to arbitrary code, it's very easy
to deadlock. For example, if you hold a socket lock and then
allocate memory, the memory subsystem might try to free up memory by
pushing a page out to the backing store; if you're netbooted, that
might require the very socket lock that you're holding. This is
a contrived example (deadlocks like this are typically much more
complex :-), but you should get the idea.
However, by dropping the lock on an object you run the risk of
the object being modified (possibly even destroyed) while you're off
running code in another subsystem. There are two common
solutions to this sort of problem.
o restart -- In this pattern you assume that any information that
you had before dropping the lock is invalid, and you rebuild it all
after reacquiring the lock. For example, to probe a cache you
might use:
    lock cache
    loop
        exit if object in cache
        if previously allocated placeholder
            install placeholder into cache
            exit
        end if
        unlock cache
        allocate memory for object placeholder
        lock cache
    end loop
    touch object
    unlock cache
    if allocated memory for placeholder but didn't install it into cache
        deallocate placeholder
    end if
The key point is that, after the unlock/alloc/lock sequence, you
don't know if some other thread placed the object in the cache, and thus
you should probe the cache again before blindly inserting your newly
allocated object [2].
o reference count -- In this pattern you increment a reference
count on the object before dropping the lock. This ensures that
the object can't go away while you're off running code in another
subsystem.
In many cases you have to combine these approaches; you use the
reference count to ensure that the object doesn't go away /and/ you
reconfirm its state after reacquiring your lock. You can also
combine these approaches with the use of condition variables to hold
off threads that are doing incompatible operations.
In this context, each KEXT is considered a separate subsystem.
Thus, calling a KEXT is considered calling out to a separate
subsystem. The general rule for the BSD parts of the kernel in
10.4 and later is that, when the kernel calls your KEXT, it will hold
no locks [3]. However, if the kernel passes you an object as
an input parameter, it guarantees that the object won't go away until
you return (using either a reference count or some other mechanism
that isn't your concern).
The discussion above is about how the kernel protects its own
data structures, and how it can pass them to you safely. Your
data structures are your own concern. You can use whatever
technique you like to protect those structures. However, it makes
sense to follow the model laid down by the kernel. One thing
that we /strongly/ recommend is that you drop any locks that you hold
before calling out to other kernel subsystems. Ignore this
advice at your peril!
* * *
To see an example of this, check out the ReinjectDeferredData
code in the "tcplognke" sample.
<http://developer.apple.com/samplecode/tcplognke/listing3.html>
The goal of this routine is to take any packets that were
deferred and reinject them into the stack. A naive
implementation might look like:
    lock queue
    for each element in the queue
        pull the element off the queue
        reinject the packet
    end for
    unlock queue
However, this leaves you calling out to other kernel subsystems
(when you reinject) with a lock held. So the code in the sample
does the following:
    lock queue
    for each element in the queue
        pull the element off the queue
        add it to a local queue
    end for
    unlock queue
    for each element in the local queue
        reinject the packet
    end for
Note that the "tcplognke" example breaks the 'unlock
before calling out' rule when it allocates memory. That's
something we hope to get fixed in the next version of the sample
[4].
* * *
Finally, there's the issue of condition variables. One key
thing to note about locks (lck_mtx_t) is that they can only be used to
protect the consistency of a data structure; you can't use them to
monitor the state of the data structure. This is because the
lock /must/ be unlocked by the same thread that locked it. To
monitor the state of data structure, the kernel implements a mechanism
very similar to pthread condition variables. The relevant
routines, msleep and wakeup, are documented on our web site.
However, I wrote up a long explanation of this for another developer,
and I've pasted it in below. [Actually, the list server barfed on
my mega email, so I'll send it in a separate message.]
* * *
This leaves you with the question of 10.3. The first, easy,
answer is "don't support 10.3.x". Seriously, it's a
pain, and it's probably pain that you can do without. Have you
seen the following:
<http://update.omnigroup.com/>
If that doesn't work for you, you have to learn about an entirely
new (well, old) locking model. The basics of that model are
described in the "Mac OS X Kernel Threading" section of DTS
Technote 2028 "Mac OS Threading Models".
<http://developer.apple.com/technotes/tn/tn2028.html>
Once you've grokked the idea of funnels, you should be able to
understand their consequences.
o Locks (as I've described them above) are not nearly as useful
on a funnel-based kernel because they don't automatically drop when
you block. Thus, if you use locking, you have to be even more
careful about following the 'unlock before calling out' rule.
o Another thing that makes locks less useful is that there's no
equivalent of msleep. The older routines (tsleep etc) don't let
you atomically drop a mutex and block. To achieve the
equivalent, you have to rely on the funnel. That is, make sure
you don't do any blocking operation (or call out to any subsystem that
might do so) between the test of your condition and the call to
tsleep.
o There is no systematic reference counting of BSD kernel
objects. Thus, when you call out to some other subsystem, it's
hard to know whether the object pointers that you have are still valid
when you return. The kernel tries to work around this with a
system of locks, but it's all very ad hoc and confusing.
* * *
*phew* This took longer than I expected. Now's the
time to say "I now have to go back to my real job", except
that my real job /is/ helping developers (-:
Share and Enjoy
--
Quinn "The Eskimo!"
<http://www.apple.com/developer/>
Apple Developer Relations, Developer Technical Support, Core
OS/Hardware
[1] This is, of course, a simplification. Actually, the
protocol gets to decide how socket locking is done. If the
protocol has a pr_lock callback, the socket layer will call that to
lock the socket. Otherwise, the socket layer will use a global
(per-domain) lock. The former is used by performance critical
protocols, like TCP, while the latter is used by non-performance
critical stuff.
This is a classic example of how the KPI design described in this
post insulates your code from the details of kernel locking. You
can write a socket filter NKE without caring whether the underlying
protocol uses per-socket or per-domain locking. Also, this could
change from release-to-release of the system and your NKE would Just
Work (tm).
[2] You could also pre-allocate the memory:
    allocate memory for object placeholder
    lock cache
    if object not in cache
        install placeholder into cache
    end if
    touch object
    unlock cache
    if allocated memory for placeholder but didn't install it into cache
        deallocate placeholder
    end if
but that involves allocating memory on the cache hit path, which
is probably a bad idea (-:
[3] This is yet another gross simplification. It's very
likely that the kernel will be holding some locks when it calls your
KEXT. [And when I say "holding", I mean that one of
the functions in your backtrace is holding a lock that can't be
released until you return.] However, the goal is for the kernel
not to hold any locks that you would need in the normal operation of
your KEXT. This set can change over time. Remember, the
kernel's locking of its own data structures is its concern.
Alas, the kernel is not provably deadlock free; it's all a
question of statistics. This is one of the reasons that bouncing
out to user space in unexpected circumstances is such a risk. As
the kernel's lock usage changes over time, it can break the
statistical 'guarantees' that you've established via testing.
[4] It's all my fault I'm afraid; I reviewed the sample in depth
and missed that problem.
_______________________________________________
Darwin-kernel mailing list (email@hidden)