Re: ptrace / task_for_pid / pid_suspend
- Subject: Re: ptrace / task_for_pid / pid_suspend
- From: Antoine Missout <email@hidden>
- Date: Thu, 8 Jul 2010 08:25:58 -0400
Hi Terry,
Thanks for the information.
We will request APIs for what we need, but since we know there's no chance they'll be implemented by the end of August, we did not rely on that route. What we're doing is indeed currently unsupported. But it has been stable: we didn't have to change a thing from 10.5.8 to 10.6 to 10.6.4. More on that in a follow-up email off-list (I'll explain our design and what we need).
As for DirectoryServices, it's not technically a kernel deadlock, but as I said, you'll be hitting the power button quite soon... There is no issue with it afaik, I just took that as an example.
For the deadlock avoidance: timeouts will do the trick as the kernel does with DS, but hitting timeouts makes for a sluggish system, so avoiding timeouts in addition (using early detection) is even better.
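The timeout half of this can be sketched in user space as follows. This is only an illustration under assumptions: read_verdict_with_timeout is a hypothetical helper, the daemon is assumed to send a one-byte verdict over a pipe or socket, and the fallback verdict stands for whatever policy is chosen for an unresponsive daemon. It is not how the kernel talks to DS.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait at most timeout_ms for a one-byte verdict on fd.
 * Returns the verdict, or `fallback` if the daemon did not answer in time.
 * Hypothetical helper for illustration only.
 */
int read_verdict_with_timeout(int fd, int timeout_ms, int fallback)
{
    assert(timeout_ms >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    /* Retry if interrupted by a signal (note: this restarts the full timeout). */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc <= 0)
        return fallback;            /* timed out, or poll failed */

    unsigned char verdict;
    ssize_t n = read(fd, &verdict, 1);

    /* n == 0 means the daemon closed its end without answering. */
    return (n == 1) ? (int)verdict : fallback;
}
```

Every caller that hits the timeout pays the full wait, which is where the sluggishness comes from; early detection would mean skipping the wait entirely once the daemon is known to be suspended.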
- Antoine
On 2010-07-08, at 5:15 AM, Terry Lambert wrote:
> I really did not like the implications of what you are doing with regard to requiring lock-stepping a kernel version. It implies interposition of non-KPI kernel functions for which KPI was not provided, for a reason. If you need KPI for something, please file a bug report requesting it (with an explanation of your problem space), and if the request is reasonable, people will try to accommodate the request, if possible. With the expression of the problem space, however, you are more likely to get an explanation of how you should actually be trying to accomplish the goal that led you to request the KPI.
>
>
> On Jul 1, 2010, at 12:43 PM, Antoine Missout wrote:
>> A few notes about suspending processes. These two are (potential) deadlocks:
>>
>> - suspend dynamic_pager
>> -- call malloc
>> -- do stuff
>> -- free
>> - resume dynamic_pager
>> ->The malloc call could stall indefinitely if the system needs to page
>
> Investigate Dijkstra's "Banker's algorithm": <http://en.wikipedia.org/wiki/Banker's_algorithm>
>
> This is how most Apple code avoids deadlocks from call-outs to other code: be ready with the answer for any possible question you might be asked.
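For reference, the safety check at the core of the Banker's algorithm can be sketched in C as below. This is the textbook formulation, not Apple code; the array bounds MAX_PROCS and MAX_RES and the function name state_is_safe are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PROCS 8   /* illustrative limits, chosen for the example */
#define MAX_RES   4

/*
 * Banker's algorithm safety check: returns true if there is some order in
 * which every process can obtain its remaining need, finish, and release
 * what it holds. alloc[p][r] is what process p holds of resource r,
 * need[p][r] is what it may still request.
 */
bool state_is_safe(int nprocs, int nres,
                   const int available[MAX_RES],
                   int alloc[][MAX_RES],
                   int need[][MAX_RES])
{
    assert(nprocs >= 0 && nprocs <= MAX_PROCS);
    assert(nres >= 0 && nres <= MAX_RES);

    int  work[MAX_RES] = { 0 };
    bool finished[MAX_PROCS] = { false };
    memcpy(work, available, sizeof(int) * (size_t)nres);

    for (int done = 0; done < nprocs; ) {
        bool progressed = false;

        for (int p = 0; p < nprocs; p++) {
            if (finished[p])
                continue;

            /* Can p's worst-case remaining need be met right now? */
            bool can_run = true;
            for (int r = 0; r < nres; r++) {
                if (need[p][r] > work[r]) {
                    can_run = false;
                    break;
                }
            }
            if (!can_run)
                continue;

            /* p runs to completion and returns everything it holds. */
            for (int r = 0; r < nres; r++)
                work[r] += alloc[p][r];
            finished[p] = true;
            progressed = true;
            done++;
        }

        if (!progressed)
            return false;   /* nobody can finish: the state is unsafe */
    }
    return true;
}
```

A request is granted only if the state that would result still passes this check, which is the sense in which the resources needed to answer any possible question are reserved up front.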
>
>
>> - suspend DirectoryServices
>> -- call getgroups or getpwent
>> - resume DirectoryServices
>> ->The call will stall forever as it is serviced by DirectoryServices (and the machine will become unusable pretty quickly as DirectoryServices is needed by almost anything).
>
> These calls are library functions which are handled by a user space IPC, and do not involve a kernel deadlock.
>
> It's true that the kernel also calls out to DirectoryServices for group membership questions in support of users in more than 16 groups. If DirectoryServices does not respond to an individual request within 30 seconds, then the kernel will revoke its authority, fail the outstanding unanswered requests, and not forward additional requests to DS. If DS requests additional work from the kernel after that, it receives an EPERM from the system call, indicating that it has to reregister as the external group membership resolver.
>
> If there is actually an issue in the user space IPC, then you need to file a bug report, but this is not a kernel deadlock.
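The revoke-on-timeout behaviour described above for the external resolver can be modelled as a small state machine. The sketch below is a user-space model only: struct resolver and the resolver_* functions are hypothetical names, not the xnu implementation, and time is passed in as plain seconds to keep the example deterministic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define RESOLVER_TIMEOUT_SECS 30 /* the 30-second limit described above */

/* Hypothetical bookkeeping for one external resolver (not xnu's structures). */
struct resolver {
    bool registered;      /* resolver currently holds authority */
    bool request_pending; /* a forwarded request is still unanswered */
    long request_sent_at; /* when that request was forwarded, in seconds */
};

/* The resolver (re)registers and regains authority. */
void resolver_register(struct resolver *r)
{
    assert(r != NULL);
    r->registered = true;
    r->request_pending = false;
    r->request_sent_at = 0;
}

/* Forward a request; returns -1 if no resolver holds authority. */
int resolver_send_request(struct resolver *r, long now)
{
    if (!r->registered)
        return -1;                  /* not forwarded at all */
    if (!r->request_pending) {
        r->request_pending = true;
        r->request_sent_at = now;   /* track the oldest unanswered request */
    }
    return 0;
}

/* The resolver answered: nothing is outstanding any more. */
void resolver_reply(struct resolver *r)
{
    r->request_pending = false;
}

/* Periodic check; returns true if authority was just revoked. */
bool resolver_check_timeout(struct resolver *r, long now)
{
    if (r->registered && r->request_pending &&
        now - r->request_sent_at >= RESOLVER_TIMEOUT_SECS) {
        r->registered = false;      /* revoke authority */
        r->request_pending = false; /* fail the outstanding request */
        return true;
    }
    return false;
}

/* The resolver asks for more work: EPERM until it registers again. */
int resolver_get_work(const struct resolver *r)
{
    return r->registered ? 0 : EPERM;
}
```

The point of the model is that an unresponsive resolver costs one bounded wait, after which requesters stop depending on it until it explicitly registers again.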
>
>
>> Now it turns out Instruments does something like this:
>> - get the task_t for our daemon through task_for_pid
>> - suspend our daemon
>> -- (inspect its memory ?)
>> -- call stat on the daemon executable path
>> - resume our daemon
>> -> this deadlocks the machine since we police file accesses through our daemon
>
> You are pretty clearly doing what I feared, above. If you are interposing via a mechanism other than kauth, your model is not supported. If so, request additional kauth KPI to support your model via supported interfaces (see above on "reasonableness").
>
> The kauth interface in general is what is intended for this, and if you are doing the gating via kauth, you MUST be prepared to answer the question immediately, without additional system calls. If you require communication with other components, your other components MUST follow the same rule. If you require communication with system components, you MUST exempt them from your interposition, and ensure they follow the Banker's algorithm philosophy themselves (or don't use those services).
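One way to read the "answer immediately" rule is that the decision must come from state that is already loaded, with system components exempted up front. The sketch below illustrates that shape in plain C. It is not the kauth API: struct policy, decide and the verdict names are all hypothetical, and the tables are assumed to have been built before any request arrives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum verdict { VERDICT_DEFER, VERDICT_ALLOW, VERDICT_DENY };

/* One preloaded rule: a file identifier and the answer for it. */
struct policy_entry {
    uint64_t     file_id;
    enum verdict verdict;
};

/* Everything needed to answer, loaded in advance (hypothetical layout). */
struct policy {
    const int                 *exempt_pids; /* system components, never gated */
    size_t                     n_exempt;
    const struct policy_entry *entries;     /* sorted by file_id */
    size_t                     n_entries;
};

/*
 * Decide without blocking: no system calls, no allocation, no IPC.
 * Unknown files get VERDICT_DEFER so the caller applies its default.
 */
enum verdict decide(const struct policy *pol, int pid, uint64_t file_id)
{
    assert(pol != NULL);

    /* Exempted processes are allowed before any lookup happens. */
    for (size_t i = 0; i < pol->n_exempt; i++) {
        if (pol->exempt_pids[i] == pid)
            return VERDICT_ALLOW;
    }

    /* Binary search over the preloaded, sorted table. */
    size_t lo = 0, hi = pol->n_entries;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pol->entries[mid].file_id < file_id)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < pol->n_entries && pol->entries[lo].file_id == file_id)
        return pol->entries[lo].verdict;

    return VERDICT_DEFER;
}
```

Anything that cannot be answered from such a table is exactly the case where the gate would have to call out, and therefore where the exemption and timeout rules matter.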
>
> If an AntiVirus vendor, for example, asks me how they should handle virus pattern files, I tell them to map them into the daemon process' address space so that access to the answer is via the paging path, and make their daemon immune to interposition via the paging path. This lets their patterns avoid being in wired memory, lets them have a user space component, and guarantees that they can give an answer to any kernel request. In addition, I tell them to follow the DirectoryServices model by having some way of timing out the authority of the daemon if the daemon becomes unresponsive.
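Mapping a pattern file into the daemon's address space might look like the following sketch. It uses only standard POSIX calls; map_pattern_file is a hypothetical helper name, and the sketch does not show the other half of the advice (making the daemon immune to interposition on the paging path), which depends on the product.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a pattern file read-only into this process.
 * Returns the mapping and stores its length in *len_out, or returns NULL
 * on failure. The pages are file-backed and not wired: they are brought
 * in through the paging path on first access.
 */
const void *map_pattern_file(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

The caller releases the mapping with munmap(addr, len) when the patterns are replaced or the daemon exits.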
>
>
>> *** Actually, if you suspend a process you don't know, you basically cannot call any system call or mach rpc as you do not know if they rely on the process you just suspended. ***
>
> Correct. This is an (effectively) insolvable problem with the design of existing APIs (which cannot be changed due to standards constraints or hardware limitations), and which additionally have no way to establish a chain of trust for a request proxied through or from untrusted code in any case.
>
> When you start using general purpose frameworks, or third party libraries from the daemon, frankly, all bets are off. There's no way to find a Hamiltonian cycle in the "Z made this request, but A is responsible for triggering this request because A messaged B which communicated with C via shared memory which used a socket to talk to C which ... Z", and therefore exempt the request on the basis of its origin. There are too many IPC mechanisms with unattributed sources to proxy identity through all of them, and you cannot trust the code path to not alter the message payload in flight.
>
>
>> Right now I just disabled any daemon communication while it is suspended (and log an error), but what I'm thinking of doing is to whitelist any pid that suspends our daemon so that they can hopefully resume it without getting blocked. This will stall other processes that need our daemon until the suspender is done with our daemon.
>>
>> Does that make sense ?
>
> Generally, no (see above). Use a deadlock avoidance strategy rather than deadlock detection and (attempted, and therefore failure-prone) mitigation strategy. You will be much happier.
>
> -- Terry
>
>>
>> - Antoine
>>
>> On 2010-06-30, at 8:05 PM, Antoine Missout wrote:
>>
>>> Hi Mike,
>>>
>>> They are stalls unless, as you say, what is required to restart them is being held up by what they serve.
>>>
>>> If you suspend taskgated, it becomes impossible to resume it as pid_resume will try to consult taskgated which is suspended. Deadlock... You can try it. So if any process suspends taskgated by mistake or for any reason, it will lead to big problems: I'm guessing something similar is at play here.
>>>
>>> The careful consideration of the upstream dependencies is the heart of the issue. And the reason why I'm trying to get more info on what Instruments/CHUD needs, as this is the only thing I could find at the moment that leads to a problem with our setup. I do not know why yet, but simply denying the task_for_pid fixes it for now. Of course I'll look deeper to find the root cause.
>>>
>>> We can simply trace everything until we find what is going on, but I just thought it would be faster to simply ask.
>>>
>>> In any case, we have several different ways to avoid the problem, one of which is already in place, but I don't think it is the most elegant yet.
>>>
>>> Regards,
>>> Antoine
>>>
>>>
>>>
>>>
>>> On 2010-06-30, at 7:41 PM, Michael Smith wrote:
>>>
>>>> Antoine,
>>>>
>>>> Neither of these are inversions, as the userland components don't rely on the services they vend. There are several other similar service tasks in the system as well, all of which are maintained by Apple with careful consideration to their upstream dependencies.
>>>>
>>>> In addition, neither of your strawmen are deadlocks; both are simply stalls. The dynamic_pager example is just physical page starvation, and the taskgated example is just a stall waiting for service.
>>>>
>>>> If your architecture causes a deadlock, you need to fix it. If you're simply stalling service until the daemon is restarted, and what you're serving isn't required in order for the daemon to restart, then I don't see your problem.
>>>>
>>>> = Mike
>>>>
>>>> On Jun 30, 2010, at 2:12 PM, Antoine Missout wrote:
>>>>
>>>>> Mike,
>>>>>
>>>>> There are already many dependency inversions on Mac OS X.
>>>>>
>>>>> I'm sure if you suspend the dynamic_pager process, the machine will eventually deadlock (actually, I tested it).
>>>>>
>>>>> Likewise, task_for_pid consults with taskgated (a user-space process), without any timeout (check osfmk/mach/task_access.defs), so if taskgated gets suspended, that will also block indefinitely.
>>>>>
>>>>> There are many other examples. Grep the xnu source code for KERNEL_USER... most of those calls do not have timeouts and will deadlock if the process servicing those rpc calls is suspended.
>>>>>
>>>>> So I'm guessing Instruments takes care not to suspend the dynamic_pager, or taskgated ? Or it doesn't suspend them long enough to cause a problem ? (race condition?)
>>>>>
>>>>> Actually, try this (requires 10.6.4):
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>   /* atoi */
>>>>> #include <unistd.h>   /* syscall */
>>>>>
>>>>> int main(int argc, const char* argv[])
>>>>> {
>>>>>     if (argc != 3)
>>>>>     {
>>>>>         printf("./test 0 pid (for suspend)\n");
>>>>>         printf("./test 1 pid (for resume)\n");
>>>>>         return 2;
>>>>>     }
>>>>>     int sr = atoi(argv[1]);    /* 0 = suspend, 1 = resume */
>>>>>     int pid = atoi(argv[2]);
>>>>>     /* 430 + 0 is pid_suspend, 430 + 1 is pid_resume */
>>>>>     int r = syscall(430 + sr, pid);
>>>>>     if (sr) printf("pid_resume(%i) return %i\n", pid, r);
>>>>>     else printf("pid_suspend(%i) return %i\n", pid, r);
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> And suspend the dynamic_pager (as root), then start eating memory... The machine *will* deadlock.
>>>>> Likewise for other system critical processes.
>>>>>
>>>>> - Antoine
>>>>>
>>>>>
>>>>>
>>>>> On 2010-06-30, at 1:08 PM, Michael Smith wrote:
>>>>>
>>>>>>
>>>>>> Antoine,
>>>>>>
>>>>>> Your design is bad; you have introduced a dependency inversion and you need to come up with a way for the kernel to make forward progress without your daemon running. Even if it is "running", there is no guarantee that it can run (e.g. blocked on a pagefault) when you expect it to.
>>>>>>
>>>>>> Or at the very least, tell us what your product is so that we can avoid it.
>>>>>>
>>>>>> = Mike
>>>>>>
>>>>>>
>>>>>> On Jun 30, 2010, at 7:59 AM, Antoine Missout wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We have a kernel driver + user-space daemon for a new product. Once loaded & running, the daemon becomes crucial to the proper running of the kernel: more exactly, the daemon can be running or not, but must not freeze as the kernel will deadlock while waiting for an answer.
>>>>>>>
>>>>>>> The setup is now very stable; the only issue we've run into lately was with Instruments, using Time Profile for all processes: this would result in an immediate lockup of the machine.
>>>>>>>
>>>>>>> Reading a bit of documentation about this probing, it seems Instruments suspends all processes to take statistics.
>>>>>>>
>>>>>>> Now to avoid this problem, we've taken steps to disable ptrace (via PT_DENY_ATTACH) and task_for_pid for our daemon, and while we're at it, also pid_suspend, introduced in 10.6.4.
>>>>>>>
>>>>>>> This now allows Instruments to Time Profile the whole machine without locking it up. We would prefer to take less drastic steps to avoid this deadlock, but without knowing what exactly Instruments is doing with our task port, this is difficult. It still seems to gather stats for the daemon, but the addresses look like garbage.
>>>>>>>
>>>>>>> So the questions are:
>>>>>>>
>>>>>>> - what is Instruments (or DTTSecD ?) doing exactly ?
>>>>>>> - is there a better way to make sure the daemon is never suspended in any way ? (or know it in advance ?)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Antoine
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Do not post admin requests to the list. They will be ignored.
>>>>>>> Darwin-kernel mailing list (email@hidden)
>>>>>>> Help/Unsubscribe/Update your Subscription:
>>>>>>>
>>>>>>> This email sent to email@hidden
>>>>>>
>>>>>> --
>>>>>> Excellence in any department can be attained only by the labor of a lifetime; it is not to be purchased at a lesser price -- Samuel Johnson
>>>>>>
>>>>>
>>>>
>>>> --
>>>> True terror is to wake up one morning and discover that your high school class is running the country. -- Kurt Vonnegut
>>>>
>>>
>>
>