Re: waiting queue
- Subject: Re: waiting queue
- From: Quinn <email@hidden>
- Date: Wed, 6 Dec 2006 09:37:01 +0000
At 11:11 -0800 5/12/06, Michael Smith wrote:
As a general rule, if you believe that your hardware is reliable,
use THREAD_UNINT to be safe.
There's one gotcha here. Because of the way user clients are
implemented, you shouldn't use THREAD_UNINT while blocking in a
user client if that user client has wired down any memory from the
client process. If you do so, there will be no way to kill the task.
I've included the gory details below. This was on 10.3.x, so YMMV on
later systems.
S+E
--
Quinn "The Eskimo!" <http://www.apple.com/developer/>
Apple Developer Relations, Developer Technical Support, Core OS/Hardware
If you look at task_terminate_internal, you'll see that midway
through its implementation it calls thread_terminate_internal for
each thread in the task. This doesn't actually terminate the
threads. Rather, it schedules an AST for the thread. The AST is
like a secondary interrupt. The next time the thread leaves the
kernel, it will run the AST handler before it leaves. There are
two special cases:
o If the thread is currently running on another CPU, it will send an
inter-CPU interrupt to force the AST to happen promptly.
o If the thread is blocked, it will be woken up so that it can leave
the kernel and run the AST.
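To make that mechanism concrete, here's a highly simplified model of
it. To be clear, this is *not* the XNU source; every name below
(pending_asts, schedule_termination_ast, ast_taken, and so on) is
made up for illustration.

    #define AST_TERMINATE 0x1

    struct thread {
        volatile unsigned pending_asts; /* checked on the way out of
                                           the kernel */
        int blocked;                    /* 1 if waiting on an event */
        int interruptible;              /* 0 if blocked THREAD_UNINT */
    };

    /* Roughly what thread_terminate_internal does per thread:
       schedule, don't terminate. */
    static void schedule_termination_ast(struct thread *t)
    {
        t->pending_asts |= AST_TERMINATE;
        if (t->blocked && t->interruptible) {
            /* wake the thread so it can leave the kernel and run
               the AST; a THREAD_UNINT thread stays blocked */
        }
        /* if the thread is running on another CPU, an inter-CPU
           interrupt forces the AST to be noticed promptly */
    }

    /* Run as a thread leaves the kernel, before returning to user
       space. */
    static void ast_taken(struct thread *t)
    {
        if (t->pending_asts & AST_TERMINATE) {
            t->pending_asts &= ~AST_TERMINATE;
            /* thread_terminate_self() runs here; see below */
        }
    }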
Remember that the thread executing task_terminate_internal could be
part of the task, and thus is subject to this AST scheduling.
However, because that thread is still running inside the kernel,
scheduling an AST on the thread has no immediate impact. The AST
only runs as the thread leaves the kernel.
task_terminate_internal continues to run and eventually calls
ipc_space_destroy and vm_map_remove. Ultimately
task_terminate_internal returns and the thread winds its way out of
the kernel. As it leaves the kernel the thread executes any pending
ASTs. If this thread is part of the terminated task, it will
execute the AST that was scheduled when task_terminate_internal
called thread_terminate_internal.
The AST handler for task termination does the key work via a call to
thread_terminate_self (osfmk/kern/thread.c). So, as
task_terminate_internal has been doing its thing, other threads in
the task have been running, entering the kernel, and executing the
thread_terminate_self routine as they leave.
thread_terminate_self decrements the count of running threads in the
task and, if it hits 0, calls BSD (the proc_exit
routine) to clean up BSD constructs associated with the task. It
then goes on to clean up various aspects of the thread, mark the
thread as terminated (by setting TH_TERMINATE), and then blocks the
thread (by calling thread_block). When the thread switcher switches
away from a terminated thread, it schedules the thread to be reaped
(thread_reaper_enqueue) which, in turn, wakes up the reaper thread
(reaper_thread_continue) which finally disposes of the thread's data
structures.
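In the same spirit, here's a simplified model of that flow. Again,
the names and types are invented; the real code is in
osfmk/kern/thread.c.

    #define TH_TERMINATE 0x10

    struct task_model {
        int active_threads;         /* running threads left in task */
    };

    struct thread_model {
        unsigned state;
        struct task_model *task;
    };

    static void thread_terminate_self_model(struct thread_model *self)
    {
        /* the last thread out lets BSD tear down the proc */
        if (--self->task->active_threads == 0) {
            /* proc_exit() runs here */
        }

        /* clean up various per-thread state ... */

        self->state |= TH_TERMINATE; /* mark the thread terminated */

        /* thread_block() is called here and never returns: when the
           scheduler switches away from a TH_TERMINATE thread it calls
           thread_reaper_enqueue(), which wakes the reaper thread
           (reaper_thread_continue) to free the thread's structures */
    }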
Given the above background, it's now possible to show how your code
gets into trouble, and why moving away from THREAD_UNINT fixes it.
Here's the sequence of events when you use THREAD_UNINT.
1. The blocking thread (thread A) enters your user client. As it
enters the kernel it increments the number of send rights for the
user client's port.
2. Thread A now blocks uninterruptibly using IOCommandGate::commandSleep.
3. Much time passes.
4. One of the other threads in the task (thread B) decides that
it's time to die (it's the target of the force quit signal, or it
calls "exit", or it bus errors, or whatever). This results in a
call to task_terminate_internal.
5. task_terminate_internal calls thread_terminate_internal for each
thread in the task, which schedules an AST for those threads,
including threads A and B. Thread A is blocked uninterruptibly, and
so it does nothing in response to this AST request. Thread B is
still inside the kernel (running task_terminate_internal itself) and
so does not respond immediately to the AST.
6. Thread B, still running task_terminate_internal, now calls
ipc_space_destroy. This destroys all of the rights for the task.
However, the extra send right added in step 1 prevents your user
client port's send right count from going to 0, so no "no more
senders" notification is generated and your ::clientDied method is
not called.
7. Thread B now calls vm_map_remove. This ends up blocking because
one of the VM map entries is wired by your driver.
The task is now deadlocked. Thread B is blocked waiting for your
driver to unwire the memory, but that won't happen until
::clientDied is called, and that can't happen because thread A is
holding a send right for your user client's port and thread A is
blocked uninterruptibly.
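In driver terms, here's a minimal sketch of the pattern that gets
you into this state. The names (waitUninterruptibly, gate,
waitEvent) are hypothetical and error handling is pared down; the
point is the prepare-then-THREAD_UNINT combination.

    #include <IOKit/IOCommandGate.h>
    #include <IOKit/IOMemoryDescriptor.h>

    /* Must run on the user client's command gate (for example via
       IOCommandGate::runAction). */
    static IOReturn waitUninterruptibly(IOCommandGate *gate,
                                        void *waitEvent,
                                        IOMemoryDescriptor *buffer)
    {
        /* Wiring the client's memory creates the wired VM map entry
           that vm_map_remove blocks on in step 7. */
        IOReturn ret = buffer->prepare();
        if (ret != kIOReturnSuccess) {
            return ret;
        }

        /* Step 2: block uninterruptibly. The termination AST from
           step 5 cannot wake this thread, so a dying task deadlocks
           here. */
        gate->commandSleep(waitEvent, THREAD_UNINT);

        buffer->complete();
        return kIOReturnSuccess;
    }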
Now let's look at what happens if thread A blocks interruptibly.
Everything proceeds as above until step 5. At that point things
diverge with the following sequence.
5. task_terminate_internal calls thread_terminate_internal for each
thread in the task, which schedules an AST for those threads,
including threads A and B. As before, thread B is still inside the
kernel (running task_terminate_internal itself) and so does not
respond immediately to the AST. However, thread A is blocked in
IOCommandGate::commandSleep and the AST causes it to unblock. It
will eventually return from IOCommandGate::commandSleep with a
THREAD_INTERRUPTED error.
6. As above.
7. As above (thread B blocks in vm_map_remove).
8. At this point thread A is scheduled. It returns from
IOCommandGate::commandSleep with a THREAD_INTERRUPTED error.
Eventually this causes the thread to leave the user client and wind
its way out of the kernel. As thread A runs the return path of
ipc_kobject_server it decrements the send right count for the user
client port.
9. Because the send right count now hits 0, a "no more senders"
notification is generated for that port, which causes your
::clientDied method to be called. It's called by thread A.
10. Your ::clientDied method shuts down the user client, which in
turn destroys the IOMemoryDescriptor. This unwires the VM map
entry, which unblocks thread B.
11. After executing your ::clientDied method, thread A eventually
leaves the kernel and runs the AST, which results in the thread
being cleaned up (thread_terminate_self).
12. Thread B is now scheduled and returns from vm_map_remove. It
continues to execute task_terminate_internal, which eventually
returns and thread B leaves the kernel.
13. Thread B now runs its AST and cleans itself up (thread_terminate_self).
As you can see, making thread A interruptible is the key point in
resolving the deadlock.
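For completeness, here's the earlier hypothetical sketch reworked to
block interruptibly. The important parts are THREAD_ABORTSAFE and
the check for THREAD_INTERRUPTED.

    #include <IOKit/IOCommandGate.h>
    #include <IOKit/IOMemoryDescriptor.h>

    /* As before, must run on the user client's command gate. */
    static IOReturn waitInterruptibly(IOCommandGate *gate,
                                      void *waitEvent,
                                      IOMemoryDescriptor *buffer)
    {
        IOReturn ret = buffer->prepare();
        if (ret != kIOReturnSuccess) {
            return ret;
        }

        /* THREAD_ABORTSAFE lets the termination AST wake this thread
           (step 5 of the second sequence). */
        IOReturn sleepResult = gate->commandSleep(waitEvent,
                                                  THREAD_ABORTSAFE);

        buffer->complete();

        if (sleepResult == THREAD_INTERRUPTED) {
            /* Step 8: bail out so the thread can leave the kernel,
               drop its send right, and let ::clientDied run. */
            return kIOReturnAborted;
        }
        return kIOReturnSuccess;
    }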