Re: mach in signal handler
- Subject: Re: mach in signal handler
- From: Terry Lambert <email@hidden>
- Date: Fri, 2 Feb 2007 00:01:18 -0800
OK, realize that this is an oversimplification that glosses over some
of the finer points....
On Feb 1, 2007, at 10:45 PM, Steve Checkoway wrote:
Terry Lambert wrote:
Signals are implemented as Mach ASTs using the BSD AST, which is
basically a bit in the ast flags that gets checked when ast_taken()
is called. This is basically called during trap and exception
handling.
So it's possible that a signal could abort a Mach call, if the call
happened on an interruptible thread at the right time.
In general, the reason to fear calling things in signal handlers is
that there are certain things that, if you call them in a signal
handler, there's potentially associated user space state that does
not get reset as a result of the signal.
The problem isn't that you are making the call, per se, it's that
the call may already be in progress and partially complete at the
time the signal handler fires and reenters the called code.
A good example would be "malloc": it's always wrong to allocate or
free memory in a signal handler.
We're very careful not to call malloc or anything that may call
malloc (such as printf) in the signal handler. If we don't actually
plan to leave the signal handler but instead kill the process, is
this still a problem?
It depends on the functions. Obviously, POSIX doesn't specify Mach
functions, but it does specify BSD (POSIX) functions; the table at the
end of section 2.4.3 here is the list of functions it's safe to call:
<http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html#tag_02_04_03>
I can't give you a list of all the functions that do and don't use
stub functions in libsystem, or which do or don't take locks, except
to say that the list at the end of that section is the only one that's
guaranteed to be safe to call.
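To make the distinction concrete, here is a minimal sketch of a handler that sticks to that list. It reports with write(), which is in the table, rather than printf(), which is not. The names report_fd, report_handler and install_report_handler are made up for illustration and are not taken from the code being discussed in this thread.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: descriptor the handler reports to; set before installing. */
static volatile sig_atomic_t report_fd = STDERR_FILENO;

static void report_handler(int signo)
{
    static const char msg[] = "caught signal\n";
    ssize_t n;

    (void)signo;
    /* write() is on the POSIX async-signal-safe list; printf() is not. */
    n = write(report_fd, msg, sizeof msg - 1);
    (void)n; /* nothing safe to do here about a short or failed write */
}

/* Install report_handler for signo, reporting to fd. Returns 0 or -1. */
int install_report_handler(int signo, int fd)
{
    struct sigaction sa;

    assert(fd >= 0);
    report_fd = fd;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = report_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

A handler that intends to kill the process rather than return could finish with _exit(), which is also in the table; exit() is not, because it runs atexit handlers and flushes stdio.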
For raw Mach functions which are IPC calls into the kernel, there's a
stub that was generated by the "mig" utility. The stub is currently
guaranteed to not take a lock or hold state across multiple calls as
an implementation detail.
The call you want happens to be a raw Mach call.
There's no guarantee that this won't be wrapped in a later release,
though (i.e. there's no promise to not change it).
In general, anything that's implemented using stateful user space
stubs (or almost completely in user space) is not safe to call from
a signal handler because of the accumulated state from a call
already in progress. If the code being called is not reentrant,
you shouldn't call it from a signal handler (or even from a thread,
unless you protect the calls by a mutex).
Also in general, mixing BSD and Mach semantics is not a good idea,
since BSD is layered on top of Mach.
If I remember my graduate OS class correctly, there's some sort of
UNIX server process that sits outside the kernel.
In MacOS X, no. The BSD server is integrated into the same address
space as Mach and the IOKit. All three are linked into the same
address space, and layered IOKit/BSD/Mach, with IOKit symbols
invisible to BSD and Mach, and BSD symbols invisible to Mach.
So MacOS X is not a "UNIX Single Server" style implementation; this
avoids the overhead that would normally occur as a result of
protection domain crossing in a traditional UNIX-on-Mach implementation.
The reason mixing semantics is not a good thing is that some BSD thing
you do is not necessarily going to have the same effect for a Mach
system call as it does for a BSD system call.
We also do not document (and are not prepared to tie ourselves down to
particular implementation details by documenting) interactions.
For the specific call you are talking about, it looks like it's
safe (i.e. the worst case outcome is that you try to call it, it
gets interrupted, and you don't get the answer you wanted to get).
Would I not get the answer I wanted in the call to the signal
handler, or in a previously executing but interrupted call?
The problem is that the AST is set as a flag on the CPU private data
to indicate a callout to occur for that thread at the point of an
involuntary or voluntary context switch, at which point it fires.
The easiest way to think of this is that the operation runs up through
BSD to the user/kernel boundary, at which point in time any pending
signals end up firing through the signal trampoline and executing user
space code.
So unlike a traditional UNIX (other than Tru64 UNIX, which is also
Mach-based, and does signals a similar way), there's no guarantee of
synchronicity.
So, for example, you might have a blocking read, and the thread
doesn't become interruptible until after the read completes, so the
AST fires, and then you fire the signal handler, but the blocked read
doesn't get interrupted before the data is transferred and the read
completed.
In a traditional UNIX, the signal handler is a frob off the sleep/
tsleep operation, and so what happens is the sleep function is
interrupted by the delivery of the signal while the operation is still
pending completion, so the operation doesn't complete.
The net effect is that you may not get instantaneous results, and you
may end up snapshotting the state of the process after the event
which raised the signal. The likelihood of this happening goes way up
if your application is multithreaded.
So you'll get _an_ answer that's a snapshot of the state at the time
the signal handler fired, but whether or not that's what you wanted
depends on how timing-sensitive you are when trying to pick up the
information you are trying to get.
Just as a general rule, signal handlers on UNIX systems should
probably only ever be used to set global volatile variables that
you then check somewhere else, not in the signal handler, to see if
the handler fired.
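A minimal sketch of that pattern, for reference. The handler does nothing except set a flag, and ordinary code checks and clears it later. The names got_signal, install_flag_handler and consume_signal_flag are hypothetical; the flag is declared volatile sig_atomic_t because that is the one object type the C standard lets a handler store to safely.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag: the only state the handler ever touches. */
static volatile sig_atomic_t got_signal = 0;

static void flag_handler(int signo)
{
    (void)signo;
    got_signal = 1; /* no locks, no allocation, no library calls */
}

/* Install the flag-setting handler for signo. Returns 0 or -1. */
int install_flag_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = flag_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Called from normal (non-handler) code: returns 1 once per observed
 * flag, 0 if the handler has not fired since the last check. */
int consume_signal_flag(void)
{
    if (!got_signal)
        return 0;
    got_signal = 0;
    return 1;
}
```

Note that the flag only records that the handler fired at least once since the last check, not how many times.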
In general, I agree. What we do in this specific case, is build a
crash log including a backtrace (being very careful not to call
forbidden functions), fork a new process, send the data to it and
have it do things like symbol lookups and throw up a dialog
informing the user of the crash, etc.
Then you are probably fine... however...
The absolute best way to do this is to do exactly what CrashReporter,
gdb, or Adobe products do, in order to do crash reporting.
What these things do is get a port on the process, and so when the
exceptional condition happens, they throw an exception to the Mach
port. The user space application that has the port then catches the
exception message, and reports the crash (this lets them introspect
the address space, and so on, with the task suspended, so instead of
getting a snapshot after the fact, they get the state at the time of
the exception).
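As a rough sketch of the registration half of that approach only: the code below allocates a receive right and registers it as the exception port for bad-access exceptions. It is an illustration under assumptions, not what CrashReporter or gdb actually do. The helper name register_exception_port and the EXC_PORT_* return codes are invented here, it registers on the calling task itself rather than on a separate target task, and it omits the mach_msg() receive loop that a real reporter must run on the port. The Mach calls are guarded so the file still builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __APPLE__
#include <mach/mach.h>
#endif

#define EXC_PORT_OK            0
#define EXC_PORT_FAILED      (-1)
#define EXC_PORT_UNSUPPORTED (-2) /* not built on a Mach-based system */

/*
 * Hypothetical helper: register a fresh port to receive EXC_BAD_ACCESS
 * exceptions for this task. Something must then receive on the port;
 * with nobody listening, a faulting thread simply blocks.
 */
int register_exception_port(unsigned int *port_out)
{
    assert(port_out != NULL);
#ifdef __APPLE__
    mach_port_t self = mach_task_self();
    mach_port_t port = MACH_PORT_NULL;

    if (mach_port_allocate(self, MACH_PORT_RIGHT_RECEIVE, &port)
            != KERN_SUCCESS)
        return EXC_PORT_FAILED;
    /* The kernel needs a send right to deliver exception messages. */
    if (mach_port_insert_right(self, port, port, MACH_MSG_TYPE_MAKE_SEND)
            != KERN_SUCCESS)
        return EXC_PORT_FAILED;
    if (task_set_exception_ports(self, EXC_MASK_BAD_ACCESS, port,
                                 EXCEPTION_DEFAULT, THREAD_STATE_NONE)
            != KERN_SUCCESS)
        return EXC_PORT_FAILED;
    *port_out = port;
    return EXC_PORT_OK;
#else
    (void)port_out;
    return EXC_PORT_UNSUPPORTED;
#endif
}
```

An out-of-process reporter would do the same thing against the target's task port instead of mach_task_self(), which is what lets it inspect the suspended task.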
In the case of something like gdb, it then decides whether to forward
the exception to the application via the Mach port, or whether to eat
the exception itself (you could, for example, use this to implement an
external pager for a process, and satisfy the page request yourself by
mapping it into the target process from yourself).
If the exception is passed to the process, then the exception handler
sees it, saves off enough information for a signal, and throws an
AST_BSD on the thread. When this runs up to user space, then that
AST_BSD is turned into a sendsig() in bsd/dev/[architecture]/
unix_signal.c, and the trampoline fires in user space.
So if you are going to be using Mach interfaces anyway, you might as
well get the real information, rather than a snapshot after the fact.
For example, maybe you have a race between two threads in your
application, and you end up taking a SIGSEGV every once in a while
because there's a race where you need to hold a lock in one thread
while mapping a page into the process heap, and then you drop the lock
(and acquire it in the other thread) before you access the page, so
that the access never happens while the page is unmapped. You will
never catch
this situation in a way useful enough to diagnose and fix it by going
off a signal handler.
To get there from here, I can only tell you to look at the gdb
sources, and see what gdb does for signal management for processes
being debugged when it's compiled for Darwin/MacOS X.
Signals are persistent conditions, not events, which means that if
something happens that would cause a signal handler to fire
multiple times in rapid succession, then you will only see the
handler fire exactly once.
Once is all we need. We will never return from the handler. I don't
recall if we perform suicide or patricide, but one way or another
the process will never return from the handler. If the handler is
invoked again, we throw up our hands and just die.
It's safe as you've described it, so long as you're not
multithreaded. If you are multithreaded, then see above; another
thread reentering and a signal handler reentering unsafe code are
pretty much the same thing.
You basically don't know what state your stack is in at the time it
fired, and the sigaltstack function doesn't work the standard way
in Tiger or earlier.
How does it work in Tiger? I know we use it.
It's not fully POSIX conformant (it wasn't intended to be).
The alternate signal stack is not settable/clearable per thread, it's
only per-process, so if you set up a small stack and end up having a
number of signals come in because you are calling system calls from
your signal handler, and a handler only masks the signal it's
currently handling, you could overflow the stack.
The longer the duration of a blocking call, the more likely you are to
run into this situation.
Of course, now that you know about it, you could make your alternate
signal stack, if you have one, much larger and not have a problem.
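Something along these lines, as a sketch. The 256 KB size is an arbitrary illustrative number, and the names (ALT_STACK_BYTES, install_alt_stack, install_onstack_handler, ran_on_alt_stack) are hypothetical. The handler is also installed with a full sa_mask, which is one way to keep other signals from nesting on the alternate stack while it runs; whether that is acceptable depends on your application.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALT_STACK_BYTES (256 * 1024) /* assumption: size to taste */

static uintptr_t alt_lo, alt_hi; /* bounds of the alternate stack */
/* 1 = handler ran on the alternate stack, -1 = it did not, 0 = not run */
static volatile sig_atomic_t ran_on_alt_stack = 0;

static void on_alt_stack(int signo)
{
    char marker; /* a local lives on whichever stack we are running on */
    uintptr_t p = (uintptr_t)&marker;

    (void)signo;
    ran_on_alt_stack = (p >= alt_lo && p < alt_hi) ? 1 : -1;
}

/* Allocate and register a generously sized alternate signal stack. */
int install_alt_stack(void)
{
    stack_t ss;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = malloc(ALT_STACK_BYTES);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = ALT_STACK_BYTES;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }
    alt_lo = (uintptr_t)ss.ss_sp;
    alt_hi = alt_lo + ALT_STACK_BYTES;
    return 0;
}

/* Run the handler for signo on the alternate stack, with every other
 * blockable signal masked for the duration of the handler. */
int install_onstack_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alt_stack;
    sigfillset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;
    return sigaction(signo, &sa, NULL);
}
```

Call install_alt_stack() before install_onstack_handler(); the handler records whether it really ran on the alternate stack, which is a cheap sanity check to do once at startup.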
If what you are trying to diagnose is a stack overflow, though, and
you call into your own code from the handler - well, obviously, you
could end up trying to use more of what you already don't have, at
which point you'll just crash-crash.
-- Terry
Darwin-kernel mailing list (email@hidden)