Re: ptrace / task_for_pid / pid_suspend
- Subject: Re: ptrace / task_for_pid / pid_suspend
- From: Terry Lambert <email@hidden>
- Date: Thu, 8 Jul 2010 02:15:54 -0700
I really did not like the implications of what you are doing with
regard to requiring lock-stepping a kernel version. It implies
interposition of non-KPI kernel functions for which KPI was not
provided, for a reason. If you need KPI for something, please file a
bug report requesting it (with an explanation of your problem space),
and if the request is reasonable, people will try to accommodate the
request, if possible. With the expression of the problem space,
however, you are more likely to get an explanation of how you should
actually be trying to accomplish the goal that led you to request the
KPI.
On Jul 1, 2010, at 12:43 PM, Antoine Missout wrote:
A few notes about suspending processes. These two are (potential)
deadlocks:
- suspend dynamic_pager
-- call malloc
-- do stuff
-- free
- resume dynamic_pager
->The malloc call could stall indefinitely if the system needs to page
Investigate Dijkstra's "Banker's algorithm":
<http://en.wikipedia.org/wiki/Banker's_algorithm>
This is how most Apple code avoids deadlocks from call-outs to other
code: be ready with the answer for any possible question you might be
asked.
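A minimal sketch of the Banker's algorithm safety check, assuming a
fixed set of clients and resource classes. The names NPROC, NRES,
state_is_safe and try_grant are hypothetical and do not come from any
Apple code; the point is only that a request is granted when some
order still exists in which every client can finish.

```c
#include <stdbool.h>
#include <string.h>

#define NPROC 3   /* hypothetical number of clients */
#define NRES  2   /* hypothetical number of resource classes */

/*
 * True if every client can still run to completion in some order,
 * given what is free (avail), what each client holds (alloc) and the
 * most each may still ask for (need).
 */
bool state_is_safe(const int avail[NRES], int alloc[NPROC][NRES],
                   int need[NPROC][NRES])
{
    int work[NRES];
    bool done[NPROC] = { false };
    memcpy(work, avail, sizeof work);

    for (int finished = 0; finished < NPROC; ) {
        bool progressed = false;
        for (int p = 0; p < NPROC; p++) {
            if (done[p])
                continue;
            bool fits = true;
            for (int r = 0; r < NRES; r++)
                if (need[p][r] > work[r]) { fits = false; break; }
            if (!fits)
                continue;
            /* p can finish; it then returns everything it holds. */
            for (int r = 0; r < NRES; r++)
                work[r] += alloc[p][r];
            done[p] = true;
            finished++;
            progressed = true;
        }
        if (!progressed)
            return false;   /* nobody can finish: unsafe state */
    }
    return true;
}

/* Grant a request only if the resulting state is still safe. */
bool try_grant(int avail[NRES], int alloc[NPROC][NRES],
               int need[NPROC][NRES], int p, const int req[NRES])
{
    for (int r = 0; r < NRES; r++)
        if (req[r] > need[p][r] || req[r] > avail[r])
            return false;
    for (int r = 0; r < NRES; r++) {
        avail[r] -= req[r]; alloc[p][r] += req[r]; need[p][r] -= req[r];
    }
    if (state_is_safe(avail, alloc, need))
        return true;
    for (int r = 0; r < NRES; r++) {   /* roll back the tentative grant */
        avail[r] += req[r]; alloc[p][r] -= req[r]; need[p][r] += req[r];
    }
    return false;
}
```

A request that would leave no client able to finish is refused up
front, so the unsafe state is never entered and nothing has to be
detected or unwound later.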
- suspend DirectoryServices
-- call getgroups or getpwent
- resume DirectoryServices
->The call will stall forever as it is serviced by DirectoryServices
(and the machine will become unusable pretty quickly as
DirectoryServices is needed by almost anything).
These calls are library functions which are handled by a user space
IPC, and do not involve a kernel deadlock.
It's true that the kernel also calls out to DirectoryServices for
group membership questions in support of users in more than 16
groups. If DirectoryServices does not respond to an individual
request within 30 seconds, then the kernel will revoke its authority,
fail the outstanding unanswered requests, and not forward additional
requests to DS. If DS requests additional work from the kernel after
that, it receives an EPERM from the system call, indicating that it
has to reregister as the external group membership resolver.
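The revocation behavior described above can be modeled in user space
roughly as follows. This is a sketch under stated assumptions, not the
xnu implementation: the struct and function names are hypothetical,
time is passed in explicitly as seconds, and only the oldest
outstanding request is tracked.

```c
#include <errno.h>
#include <stdbool.h>

#define RESOLVER_TIMEOUT_SECS 30   /* the 30-second limit described above */

struct resolver {
    bool registered;   /* does the daemon currently hold authority? */
    bool pending;      /* is a request waiting for an answer? */
    long sent_at;      /* when the oldest pending request was sent */
};

/* Daemon (re)registers as the external resolver. */
void resolver_register(struct resolver *r)
{
    r->registered = true;
    r->pending = false;
    r->sent_at = 0;
}

/*
 * Gatekeeper side: hand a request to the daemon. Returns 0 if it was
 * forwarded, -1 if the gatekeeper has to answer on its own.
 */
int resolver_forward(struct resolver *r, long now)
{
    if (!r->registered)
        return -1;
    if (!r->pending) {
        r->pending = true;
        r->sent_at = now;
    }
    return 0;
}

/* Daemon answered the pending request. */
void resolver_answered(struct resolver *r)
{
    r->pending = false;
}

/*
 * Gatekeeper side: revoke authority if the oldest request has gone
 * unanswered for too long. Returns true if authority was revoked.
 */
bool resolver_expire(struct resolver *r, long now)
{
    if (r->registered && r->pending &&
        now - r->sent_at >= RESOLVER_TIMEOUT_SECS) {
        r->registered = false;   /* stop forwarding to the daemon */
        r->pending = false;      /* outstanding request is failed */
        return true;
    }
    return false;
}

/* Daemon side: asking for more work after revocation gets EPERM. */
int resolver_fetch_work(const struct resolver *r)
{
    return r->registered ? 0 : EPERM;
}
```

The waiting side never depends on the daemon making progress: the
worst case is bounded by the timeout, after which requests are
answered without the daemon until it registers again.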
If there is actually an issue in the user space IPC, then you need to
file a bug report, but this is not a kernel deadlock.
Now it turns out Instruments does something like this:
- get the task_t for our daemon through task_for_pid
- suspend our daemon
-- (inspect its memory ?)
-- call stat on the daemon executable path
- resume our daemon
-> this deadlocks the machine since we police file accesses through
our daemon
You are pretty clearly doing what I feared, above. If you are
interposing via a mechanism other than kauth, your model is not
supported. If so, request additional kauth KPI to support your model
via supported interfaces (see above on "reasonableness").
The kauth interface in general is what is intended for this, and if
you are doing the gating via kauth, you MUST be prepared to answer the
question immediately, without additional system calls. If you require
communication with other components, your other components MUST follow
the same rule. If you require communication with system components,
you MUST exempt them from your interposition, and ensure they follow
the Banker's algorithm philosophy themselves (or don't use those
services).
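A minimal user-space model of "answer immediately", assuming the
policy has been loaded into memory before any question arrives. The
types and names here (struct policy, decide, the DECIDE_* values) are
hypothetical and are not the kauth KPI; a real kauth listener likewise
answers allow, deny or defer, but its callback signature is different.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, modeled on allow / deny / defer. */
enum decision { DECIDE_DEFER, DECIDE_ALLOW, DECIDE_DENY };

/* Everything needed to answer is already resident in this struct. */
struct policy {
    const int *exempt_pids;           /* components never gated */
    size_t n_exempt;
    const char *const *denied_paths;  /* preloaded deny list */
    size_t n_denied;
};

/*
 * Decide using only memory reads: no system calls, no IPC, no locks
 * that another process could be holding while suspended.
 */
enum decision decide(const struct policy *pol, int pid, const char *path)
{
    /* Exempt components are passed through without an opinion. */
    for (size_t i = 0; i < pol->n_exempt; i++)
        if (pol->exempt_pids[i] == pid)
            return DECIDE_DEFER;

    for (size_t i = 0; i < pol->n_denied; i++)
        if (strcmp(pol->denied_paths[i], path) == 0)
            return DECIDE_DENY;

    /* Unknown path: express no opinion rather than go ask someone. */
    return DECIDE_DEFER;
}
```

The design choice is that an unknown question gets a default answer
rather than triggering a lookup elsewhere, since any such lookup can
end up waiting on the very process being gated.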
If an AntiVirus vendor, for example, asks me how they should handle
virus pattern files, I tell them to map them into the daemon process'
address space so that access to the answer is via the paging path, and
make their daemon immune to interposition via the paging path. This
lets their patterns avoid being in wired memory, lets them have a user
space component, and guarantees that they can give an answer to any
kernel request. In addition, I tell them to follow the
DirectoryServices model by having some way of timing out the authority
of the daemon if the daemon becomes unresponsive.
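Mapping a pattern file into the daemon's address space might look
roughly like this. It is a sketch only: struct pattern_map and the
pattern_map_* functions are hypothetical names, the lookup is a naive
byte search, and making the daemon immune to interposition on the
paging path is a separate step that is not shown.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct pattern_map {
    const unsigned char *data;   /* start of the mapped file */
    size_t len;                  /* size of the mapping in bytes */
};

/* Map the file read-only. Returns 0 on success, -1 on failure. */
int pattern_map_open(struct pattern_map *pm, const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;
    pm->data = p;
    pm->len = (size_t)st.st_size;
    return 0;
}

/*
 * Look for a byte pattern using only memory reads. Pages that are not
 * resident are brought in by the pager; no read() call is made here.
 */
int pattern_map_contains(const struct pattern_map *pm,
                         const unsigned char *needle, size_t n)
{
    if (n == 0 || n > pm->len)
        return 0;
    for (size_t i = 0; i + n <= pm->len; i++)
        if (memcmp(pm->data + i, needle, n) == 0)
            return 1;
    return 0;
}

/* Drop the mapping. */
void pattern_map_close(struct pattern_map *pm)
{
    munmap((void *)pm->data, pm->len);
    pm->data = NULL;
    pm->len = 0;
}
```

All file system calls happen once, at setup time; answering a question
later touches only mapped memory, and the pages do not have to be
wired.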
*** Actually, if you suspend a process you don't know, you basically
cannot call any system call or mach rpc as you do not know if they
rely on the process you just suspended. ***
Correct. This is an (effectively) insoluble problem with the design
of existing APIs (which cannot be changed due to standards
constraints or hardware limitations), and those APIs additionally
have no way to establish a chain of trust for a request proxied
through or from untrusted code in any case.
When you start using general purpose frameworks, or third party
libraries from the daemon, frankly, all bets are off. There's no way
to find a Hamiltonian cycle in the "Z made this request, but A is
responsible for triggering this request because A messaged B which
communicated with C via shared memory which used a socket to talk to C
which ... Z", and therefore exempt the request on the basis of its
origin. There are too many IPC mechanisms with unattributed sources
to proxy identity through all of them, and you can not trust the code
path to not alter the message payload in flight.
Right now I just disabled any daemon communication while it is
suspended (and log an error), but what I'm thinking of doing is to
whitelist any pid that suspends our daemon so that it can
hopefully resume the daemon without getting blocked. This will stall
other processes that need our daemon until the suspender is done with
our daemon.
Does that make sense?
Generally, no (see above). Use a deadlock avoidance strategy rather
than deadlock detection and (attempted, and therefore failure-prone)
mitigation strategy. You will be much happier.
-- Terry
- Antoine
On 2010-06-30, at 8:05 PM, Antoine Missout wrote:
Hi Mike,
They are stalls unless, as you say, what is required to restart
them is being held up by what they serve.
If you suspend taskgated, it becomes impossible to resume it as
pid_resume will try to consult taskgated which is suspended.
Deadlock... You can try it. So if any process suspends taskgated by
mistake or for any reason, it will lead to big problems: I'm
guessing something similar is at play here.
The careful consideration of the upstream dependencies is the heart
of the issue. And the reason why I'm trying to get more info on
what Instruments/CHUD needs, as this is the only thing I could find
at the moment that leads to a problem with our setup. I do not know
why yet, but simply denying the task_for_pid fixes it for now. Of
course I'll look deeper to find the root cause.
We can simply trace everything until we find what is going on, but
I just thought it would be faster to simply ask.
In any case, we have several different ways to avoid the problem,
one of which is already in place, but I don't think it is the most
elegant yet.
Regards,
Antoine
On 2010-06-30, at 7:41 PM, Michael Smith wrote:
Antoine,
Neither of these are inversions, as the userland components don't
rely on the services they vend. There are several other similar
service tasks in the system as well, all of which are maintained
by Apple with careful consideration to their upstream dependencies.
In addition, neither of your strawmen are deadlocks; both are
simply stalls. The dynamic_pager example is just physical page
starvation, and the taskgated example is just a stall waiting for
service.
If your architecture causes a deadlock, you need to fix it. If
you're simply stalling service until the daemon is restarted, and
what you're serving isn't required in order for the daemon to
restart, then I don't see your problem.
= Mike
On Jun 30, 2010, at 2:12 PM, Antoine Missout wrote:
Mike,
There are already many dependency inversions on Mac OS X.
I'm sure if you suspend the dynamic_pager process, the machine
will eventually deadlock (actually, I tested it).
Likewise, task_for_pid consults with taskgated (a user-space
process), without any timeout (check osfmk/mach/
task_access.defs), so if taskgated gets suspended, that will also
block indefinitely.
There are many other examples. Grep the xnu source code for
KERNEL_USER... most of those calls do not have timeouts and will
deadlock if the process servicing those rpc calls is suspended.
So I'm guessing Instruments takes care not to suspend the
dynamic_pager or taskgated? Or it doesn't suspend them long
enough to cause a problem? (race condition?)
Actually, try this (requires 10.6.4):
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <unistd.h>   /* syscall */

int main(int argc, const char* argv[])
{
    if (argc != 3)
    {
        printf("./test 0 pid (for suspend)\n");
        printf("./test 1 pid (for resume)\n");
        return 2;
    }
    int sr = atoi(argv[1]);    /* 0 = suspend, 1 = resume */
    int pid = atoi(argv[2]);
    /* 430 + sr selects pid_suspend (430) or pid_resume (431) */
    int r = syscall(430 + sr, pid);
    if (sr) printf("pid_resume(%i) return %i\n", pid, r);
    else printf("pid_suspend(%i) return %i\n", pid, r);
    return 0;
}
And suspend the dynamic_pager (as root), then start eating
memory... The machine *will* deadlock.
Likewise for other system critical processes.
- Antoine
On 2010-06-30, at 1:08 PM, Michael Smith wrote:
Antoine,
Your design is bad; you have introduced a dependency inversion
and you need to come up with a way for the kernel to make
forward progress without your daemon running. Even if it is
"running", there is no guarantee that it can run (e.g. blocked
on a pagefault) when you expect it to.
Or at the very least, tell us what your product is so that we
can avoid it.
= Mike
On Jun 30, 2010, at 7:59 AM, Antoine Missout wrote:
Hi,
We have a kernel driver + user-space daemon for a new product.
Once loaded & running, the daemon becomes crucial to the proper
running of the kernel: more exactly, the daemon can be running
or not, but must not freeze as the kernel will deadlock while
waiting for an answer.
The setup is now very stable; the only issue we've run into lately
was with Instruments, using Time Profile for all processes: this
would result in an immediate lockup of the machine.
Reading a bit of documentation about this probing, it seems
Instruments suspends all processes to take statistics.
Now to avoid this problem, we've taken steps to disable
ptrace(PT_DENY_ATTACH), task_for_pid for our daemon, and while
we're at it, also pid_suspend introduced in 10.6.4.
This now allows Instruments to Time Profile the whole
machine without locking up the machine. We would prefer to take
less drastic steps to avoid this deadlock, but without knowing
what exactly Instruments is doing using our task port, this is
difficult. It still seems to gather stats for the daemon, but
the addresses look like garbage.
So the questions are:
- what is Instruments (or DTTSecD ?) doing exactly ?
- is there a better way to make sure the daemon is never
suspended in any way ? (or know it in advance ?)
Thanks,
Antoine
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden
--
Excellence in any department can be attained only by the labor
of a lifetime; it is not to be purchased at a lesser price --
Samuel Johnson
--
True terror is to wake up one morning and discover that your high
school class is running the country. -- Kurt Vonnegut