RE: Thread locks in 10.3
- Subject: RE: Thread locks in 10.3
- From: Quinn <email@hidden>
- Date: Thu, 19 Oct 2006 11:42:05 +0100
At 13:53 -0400 18/10/06, Carl Smith wrote:
I guess in retrospect I could use some direction on where to read up
on the 10.4 locking design. If anyone knows where I could find this
document and/or some examples, I would appreciate it.
One key thing to remember is that the Mac OS X locking design
varies by kernel subsystem.
o Mach has always had fine grained locking
o BSD (specifically the file system and networking subsystems)
used funnels until Mac OS X 10.4, whereupon they switched to fine
grained locking
o I/O Kit locking is based on the workloop model
However, as you're developing an NKE, I'll focus on the BSD
subsystem, and I'll start with the 10.4 locking design because that's
the most important one. In 10.4 the BSD part of the kernel
switched to a fine grained locking model based on locks, reference
counts, and condition variables. This is a pretty standard
architecture for a modern UNIX-style kernel; if you poke around on the
net, you should find plenty of information about it.
The locks and the reference counts go hand-in-hand. Each
BSD subsystem uses locks to protect its own data structures. For
example, the socket layer has a per-socket lock associated with
(struct socket) [1] to protect the (relevant) fields of that
structure. However, each subsystem drops any locks that it holds
before it calls out to another subsystem. The goal here is to
prevent deadlocks.
If you take a lock and call out to arbitrary code, it's very easy
to deadlock. For example, if you hold a socket lock and then
allocate memory, the memory subsystem might try to free up memory by
pushing a page out to the backing store; if you're netbooted, that
might require the very socket lock that you're holding. This is
a contrived example (deadlocks like this are typically much more
complex :-), but you should get the idea.
However, by dropping the lock on an object you run the risk of
the object being modified (possibly even destroyed) while you're off
running code in another subsystem. There are two common
solutions to this sort of problem.
o restart -- In this pattern you assume that any information that
you had before dropping the lock is invalid, and you rebuild it all
after reacquiring the lock. For example, to probe a cache you
might use:
    lock cache
    loop
        exit if object in cache
        if previously allocated placeholder
            install placeholder into cache
            exit
        end if
        unlock cache
        allocate memory for object placeholder
        lock cache
    end loop
    touch object
    unlock cache
    if allocated memory for placeholder but didn't install it into cache
        deallocate placeholder
    end if
The key point is that, after the unlock/alloc/lock sequence, you
don't know if some other thread placed the object in the cache, and thus
you should probe the cache again before blindly inserting your newly
allocated object [2].
o reference count -- In this pattern you increment a reference
count on the object before dropping the lock. This ensures that
the object can't go away while you're off running code in another
subsystem.
In many cases you have to combine these approaches; you use the
reference count to ensure that the object doesn't go away /and/ you
reconfirm its state after reacquiring your lock. You can also
combine these approaches with the use of condition variables to hold
off threads that are doing incompatible operations.
In this context, each KEXT is considered a separate subsystem.
Thus, calling a KEXT is considered calling out to a separate
subsystem. The general rule for the BSD parts of the kernel in
10.4 and later is that, when the kernel calls your KEXT, it will hold
no locks [3]. However, if the kernel passes you an object as
an input parameter, it guarantees that the object won't go away until
you return (using either a reference count or some other mechanism
that isn't your concern).
The discussion above is about how the kernel protects its own
data structures, and how it can pass them to you safely. Your
data structures are your own concern. You can use whatever
technique you like to protect those structures. However, it makes
sense to follow the model laid down by the kernel. One thing
that we /strongly/ recommend is that you drop any locks that you hold
before calling out to other kernel subsystems. Ignore this
advice at your peril!
* * *
To see an example of this, check out the ReinjectDeferredData
code in the "tcplognke" sample.
<http://developer.apple.com/samplecode/tcplognke/listing3.html>
The goal of this routine is to take any packets that were
deferred and reinject them into the stack. A naive
implementation might look like:
    lock queue
    for each element in the queue
        pull the element off the queue
        reinject the packet
    end for
    unlock queue
However, this leaves you calling out to other kernel subsystems
(when you reinject) with a lock held. So the code in the sample
does the following:
    lock queue
    for each element in the queue
        pull the element off the queue
        add it to a local queue
    end for
    unlock queue
    for each element in the local queue
        reinject the packet
    end for
Note that the "tcplognke" example breaks the 'unlock
before calling out' rule when it allocates memory. That's
something we hope to get fixed in the next version of the sample
[4].
* * *
Finally, there's the issue of condition variables. One key
thing to note about locks (lck_mtx_t) is that they can only be used to
protect the consistency of a data structure; you can't use them to
monitor the state of the data structure. This is because the
lock /must/ be unlocked by the same thread that locked it. To
monitor the state of data structure, the kernel implements a mechanism
very similar to pthread condition variables. The relevant
routines, msleep and wakeup, are documented on our web site.
However, I wrote up a long explanation of this for another developer,
and I've pasted it in below. [Actually, the list server barfed on
my mega email, so I'll send it in a separate message.]
* * *
This leaves you with the question of 10.3. The first, easy,
answer is "don't support 10.3.x". Seriously, it's a
pain, and it's probably pain that you can do without. Have you
seen the following:
<http://update.omnigroup.com/>
If that doesn't work for you, you have to learn about an entirely
new (well, old) locking model. The basics of that model are
described in the "Mac OS X Kernel Threading" section of DTS
Technote 2028 "Mac OS Threading Models".
<http://developer.apple.com/technotes/tn/tn2028.html>
Once you've grokked the idea of funnels, you should be able to
understand their consequences.
o Locks (as I've described them above) are not nearly as useful
on a funnel-based kernel because they don't automatically drop when
you block. Thus, if you use locking, you have to be even more
careful about following the 'unlock before calling out' rule.
o Another thing that makes locks less useful is that there's no
equivalent of msleep. The older routines (tsleep etc) don't let
you atomically drop a mutex and block. To achieve the
equivalent, you have to rely on the funnel. That is, make sure
you don't do any blocking operation (or call out to any subsystem that
might do so) between the test of your condition and the call to
tsleep.
o There is no systematic reference counting of BSD kernel
objects. Thus, when you call out to some other subsystem, it's
hard to know whether the object pointers that you have are still valid
when you return. The kernel tries to work around this with a
system of locks, but it's all very ad hoc and confusing.
* * *
*phew* This took longer than I expected. Now's the
time to say "I now have to go back to my real job", except
that my real job /is/ helping developers (-:
Share and Enjoy
--
Quinn "The Eskimo!"
<http://www.apple.com/developer/>
Apple Developer Relations, Developer Technical Support, Core
OS/Hardware
[1] This is, of course, a simplification. Actually, the
protocol gets to decide how socket locking is done. If the
protocol has a pr_lock callback, the socket layer will call that to
lock the socket. Otherwise, the socket layer will use a global
(per-domain) lock. The former is used by performance critical
protocols, like TCP, while the latter is used by non-performance
critical stuff.
This is a classic example of how the KPI design described in this
post insulates your code from the details of kernel locking. You
can write a socket filter NKE without caring whether the underlying
protocol uses per-socket or per-domain locking. Also, this could
change from release-to-release of the system and your NKE would Just
Work (tm).
[2] You could also pre-allocate the memory:
    allocate memory for object placeholder
    lock cache
    if object not in cache
        install placeholder into cache
    end if
    touch object
    unlock cache
    if allocated memory for placeholder but didn't install it into cache
        deallocate placeholder
    end if
but that involves allocating memory on the cache hit path, which
is probably a bad idea (-:
[3] This is yet another gross simplification. It's very
likely that the kernel will be holding some locks when it calls your
KEXT. [And when I say "holding", I mean that one of
the functions in your backtrace is holding a lock that can't be
released until you return.] However, the goal is for the kernel
not to hold any locks that you would need in the normal operation of
your KEXT. This set can change over time. Remember, the
kernel's locking of its own data structures is its concern.
Alas, the kernel is not provably deadlock free; it's all a
question of statistics. This is one of the reasons that bouncing
out to user space in unexpected circumstances is such a risk. As
the kernel's lock usage changes over time, it can break the
statistical 'guarantees' that you've established via testing.
[4] It's all my fault I'm afraid; I reviewed the sample in depth
and missed that problem.
_______________________________________________
Darwin-kernel mailing list (email@hidden)