Re: mach in signal handler

Subject: Re: mach in signal handler
From: Terry Lambert <email@hidden>
Date: Fri, 2 Feb 2007 00:01:18 -0800

OK, realize that this is an oversimplification that glosses over some of the finer points...

On Feb 1, 2007, at 10:45 PM, Steve Checkoway wrote:

Terry Lambert wrote:
Signals are implemented as Mach ASTs using the BSD AST, which is basically a bit in the ast flags that gets checked when ast_taken() is called. This is basically called during trap and exception handling. So it's possible that a signal could abort a Mach call, if the call happened on an interruptible thread at the right time. In general, the reason to fear calling things in signal handlers is that there are certain things that, if you call them in a signal handler, there's potentially associated user space state that does not get reset as a result of the signal. The problem isn't that you are making the call, per se, it's that the call may already be in progress and partially complete at the time the signal handler fires and reenters the called code. A good example would be "malloc": it's always wrong to allocate or free memory in a signal handler.
We're very careful not to call malloc or anything that may call malloc (such as printf) in the signal handler. If we don't actually plan to leave the signal handler but instead kill the process, is this still a problem?

It depends on the functions. Obviously, POSIX doesn't specify Mach functions, but it does specify BSD (POSIX) functions; the table at the end of section 2.4.3 here is the list of functions it's safe to call:

<http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html#tag_02_04_03>

I can't give you a list of all the functions that do and don't use stub functions in libsystem, or which do or don't take locks, except to say that the list at the end of that section is the only one that's guaranteed to be safe to call.

For raw Mach functions which are IPC calls into the kernel, there's a stub that was generated by the "mig" utility. The stub is currently guaranteed to not take a lock or hold state across multiple calls as an implementation detail.

The call you want happens to be a raw Mach call.

There's no guarantee that this won't be wrapped in a later release, though (i.e. there's no promise to not change it).

In general, anything that's implemented using stateful user space stubs (or almost completely in user space) is not safe to call from a signal handler because of the accumulated state from a call already in progress. If the code being called is not reentrant, you shouldn't call it from a signal handler (or even from a thread, unless you protect the calls by a mutex). Also in general, mixing BSD and Mach semantics is not a good idea, since BSD is layered on top of Mach.
If I remember my graduate OS class correctly, there's some sort of UNIX server process that sits outside the kernel.

In MacOS X, no. The BSD server is integrated into the same address space as Mach and the IOKit. All three are linked into the same address space, and layered IOKit/BSD/Mach, with IOKit symbols invisible to BSD and Mach, and BSD symbols invisible to Mach.

So MacOS X is not a "UNIX Single Server" style implementation; this avoids the overhead that would normally occur as a result of protection domain crossing in a traditional UNIX-on-Mach implementation.

The reason mixing semantics is not a good thing is that some BSD thing you do is not necessarily going to have the same effect for a Mach system call as it does for a BSD system call.

We also do not document (and are not prepared to tie ourselves down to particular implementation details by documenting) interactions.

For the specific call you are talking about, it looks like it's safe (i.e. the worst case outcome is that you try to call it, it gets interrupted, and you don't get the answer you wanted to get).
Would I not get the answer I wanted in the call to the signal handler, or in a previously executing but interrupted call?

The problem is that the AST is set as a flag on the CPU private data to indicate a callout to occur for that thread at the point of an involuntary or voluntary context switch, at which point it fires.

The easiest way to think of this is that the operation runs up through BSD to the user/kernel boundary, at which point in time any pending signals end up firing through the signal trampoline and executing user space code.

So unlike a traditional UNIX (other than Tru64 UNIX, which is also Mach-based, and does signals a similar way), there's no guarantee of synchronicity.

So, for example, you might have a blocking read, and the thread doesn't become interruptible until after the read completes, so the AST fires, and then you fire the signal handler, but the blocked read doesn't get interrupted before the data is transferred and the read completed.

In a traditional UNIX, the signal handler is a frob off the sleep/tsleep operation, and so what happens is the sleep function is interrupted by the delivery of the signal while the operation is still pending completion, so the operation doesn't complete.

The net effect is that you may not get instantaneous results, and you may end up snapshotting the state of the process after the event which raised the signal. The likelihood of this happening goes way up if your application is multithreaded.

So you'll get _an_ answer that's a snapshot of the state at the time the signal handler fired, but whether or not that's what you wanted depends on how timing-sensitive you are when trying to pick up that information.

Just as a general rule, signal handlers on UNIX systems should probably only ever be used to set global volatile variables that you then check somewhere else, not in the signal handler, to see if the handler fired.
In general, I agree. What we do in this specific case, is build a crash log including a backtrace (being very careful not to call forbidden functions), fork a new process, send the data to it and have it do things like symbol lookups and throw up a dialog informing the user of the crash, etc.


Then you are probably fine... however...

The absolute best way to do this is to do exactly what CrashReporter, gdb, or Adobe products do, in order to do crash reporting.

What these things do is get a port on the process, and so when the exceptional condition happens, they throw an exception to the Mach port. The user space application that has the port then catches the exception message, and reports the crash (this lets them introspect the address space, and so on, with the task suspended, so instead of getting a snapshot after the fact, they get the state at the time of the exception).

In the case of something like gdb, it then decides whether to forward the exception to the application via the Mach port, or whether to eat the exception itself (you could, for example, use this to implement an external pager for a process, and satisfy the page request yourself by mapping it into the target process from yourself).

If the exception is passed to the process, then the exception handler sees it, saves off enough information for a signal, and throws an AST_BSD on the thread. When this runs up to user space, that AST_BSD is turned into a sendsig() in bsd/dev/[architecture]/unix_signal.c, and the trampoline fires in user space.

So if you are going to be using Mach interfaces anyway, you might as well get the real information, rather than a snapshot after the fact. For example, maybe you have a race between two threads in your application, and you end up taking a SIGSEGV every once in a while because there's a race where you need to hold a lock in one thread while mapping a page into the process heap, and then you drop the lock (and acquire it in the other thread) before you access the page, so the access doesn't happen when it's not mapped. You will never catch this situation in a way useful enough to diagnose and fix it by going off a signal handler.

To get there from here, I can only tell you to look at the gdb sources, and see what gdb does for signal management for processes being debugged when it's compiled for Darwin/MacOS X.

Signals are persistent conditions, not events, which means that if something happens that would cause a signal handler to fire multiple times in rapid succession, then you will only see the handler fire exactly once.
Once is all we need. We will never return from the handler. I don't recall if we perform suicide or patricide, but one way or another the process will never return from the handler. If the handler is invoked again, we throw up our hands and just die.

It's safe as you've described it, so long as you're not multithreaded. If you are multithreaded, then see above; another thread reentering and a signal handler reentering unsafe code are pretty much the same thing.

You basically don't know what state your stack is in at the time it fired, and the sigaltstack function doesn't work the standard way in Tiger or earlier.
How does it work in Tiger? I know we use it.


It's not fully POSIX conformant (it wasn't intended to be).

The alternate signal stack is not settable/clearable per thread; it's only per-process. So if you set up a small stack and a number of signals come in because you are calling system calls from your signal handler (a handler only masks the signal it's currently handling), you could overflow the stack.

The longer the duration of a blocking call, the more likely you are to run into this situation.

Of course, now that you know about it, you could make your alternate signal stack, if you have one, much larger and not have a problem.

If what you are trying to diagnose is a stack overflow, though, and you call into your own code from the handler - well, obviously, you could end up trying to use more of what you already don't have, at which point you'll just crash-crash.

-- Terry