Re: Possible bug with nanosleep()?
- Subject: Re: Possible bug with nanosleep()?
- From: Terry Lambert <email@hidden>
- Date: Mon, 1 Mar 2010 21:11:04 -0800
On Feb 28, 2010, at 11:48 AM, Chris Wilson wrote:
> Hi all,
>
> I'm developing open source software that runs on MacOS among other
> platforms. I recently discovered a problem where code that works
> fine on other platforms is hanging indefinitely on OSX 10.6.2. I'm
> sure it worked in 10.3 on PPC hardware.
>
> The code in question does this:
>
> void safe_sleep(int seconds)
> {
>     struct timespec ts, tr;
>     memset(&ts, 0, sizeof(ts));
>     ts.tv_sec = seconds;
>     ts.tv_nsec = 0;
>     while (nanosleep(&ts, &tr) == -1 && errno == EINTR)
>     {
>         BOX_TRACE("nanosleep interrupted with " <<
>             ts.tv_sec << "." << ts.tv_nsec <<
>             " secs remaining, sleeping again");
>         if (ts.tv_sec >= seconds)
>         {
>             BOX_WARNING("nanosleep returned with junk in " <<
>                 "struct: " << ts.tv_sec << "." <<
>                 ts.tv_nsec);
>             return;
>         }
>         ts = tr;
>         /* sleep again */
>     }
> }
You need to be looking at tr, not ts, or you need to do the structure
assignment immediately after the EINTR, before your trace statements.
tr contains the remaining time. nanosleep() never modifies ts, so after
the call it still holds the original request and tells you nothing
about the time left (on the first interruption, for example,
ts.tv_sec >= seconds is always true). Your BOX_ macros appear to
indicate that you wanted the remaining time.
This looks suspiciously like code one would use to implement a polling
loop. That's generally a mistake. If you are getting signals often
enough for this to be an issue, I think you'd be better off passing the
address of the same structure for both parameters and setting
SA_RESTART in the sa_flags field when you install the handler with
sigaction().

Even then, this is probably a bad use of signals, since multiple
signals being sent won't necessarily result in multiple notifications.
Signals are defined as persistent conditions, not events. This means,
for example, that if multiple child processes die in a narrow time
window, the last one to exit will overwrite the siginfo information
(assuming you set the SA_SIGINFO flag in the sa_flags field and used
sigaction() rather than signal(), so that you are getting siginfo in
the first place).

This is typically why code that reaps child exit status loops calling
waitpid(-1, &statusvar, WNOHANG) until it returns 0 (no further
children have exited) or -1 with errno set to ECHILD (no children
remain). If this isn't the specific case, you'd be a lot better off
avoiding signals and using a reliable IPC mechanism instead.
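The advice above can be sketched as follows, assuming a POSIX system.
The names child_exited, on_sigchld, install_sigchld_handler,
reap_children and sleep_full are hypothetical. SA_RESTART does not help
nanosleep() everywhere (on Linux, for example, nanosleep() always fails
with EINTR when a handler runs, regardless of SA_RESTART), so
sleep_full keeps the EINTR loop but passes the same timespec for both
arguments:

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

// Set by the handler; it records only that *some* child exited.
static volatile sig_atomic_t child_exited = 0;

static void on_sigchld(int)
{
    child_exited = 1;  // a flag, not a counter: several exits may yield one signal
}

// Install the SIGCHLD handler with sigaction() rather than signal(),
// asking for SA_RESTART so restartable system calls are resumed.
bool install_sigchld_handler()
{
    struct sigaction sa{};
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, nullptr) == 0;
}

// Reap every child that has exited so far. Keep calling waitpid() until it
// returns 0 (children remain, none exited) or -1 (e.g. ECHILD: no children).
int reap_children()
{
    int reaped = 0;
    int status;
    while (waitpid(-1, &status, WNOHANG) > 0)
        ++reaped;
    return reaped;
}

// Sleep for the whole interval, passing the same timespec as request and
// remainder: after an EINTR it already holds the time left to sleep.
void sleep_full(time_t seconds)
{
    struct timespec ts;
    ts.tv_sec = seconds;
    ts.tv_nsec = 0;
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
    {
        // retry with the remaining time
    }
}
```

A main loop would typically check child_exited, reset it to 0, and then
call reap_children(); because the flag is a condition rather than a
count, the waitpid() loop, not the number of signals received,
determines how many children are collected.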
-- Terry
_______________________________________________
Darwin-kernel mailing list (email@hidden)