Re: Possible bug with nanosleep()?
- Subject: Re: Possible bug with nanosleep()?
- From: Chris Wilson <email@hidden>
- Date: Sun, 28 Feb 2010 20:48:05 +0100 (CET)
Hi all,
I'm developing open source software that runs on MacOS among other
platforms. I recently discovered that code which works fine on other
platforms hangs indefinitely on OSX 10.6.2. I'm sure it worked on 10.3
on PPC hardware.
The code in question does this:
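#include <errno.h>
#include <string.h>
#include <time.h>
/* BOX_TRACE and BOX_WARNING are the project's own logging macros (not shown). */
void safe_sleep(int seconds)
{
    struct timespec ts, tr;
    memset(&ts, 0, sizeof(ts));
    ts.tv_sec = seconds;
    ts.tv_nsec = 0;
    /* Retry while nanosleep() is interrupted by a signal; on EINTR,
       tr receives the time that was still left to sleep. */
    while (nanosleep(&ts, &tr) == -1 && errno == EINTR)
    {
        BOX_TRACE("nanosleep interrupted with " <<
            tr.tv_sec << "." << tr.tv_nsec <<
            " secs remaining, sleeping again");
        /* The time remaining should never reach the original request. */
        if (tr.tv_sec >= seconds)
        {
            BOX_WARNING("nanosleep returned with junk in " <<
                "struct: " << tr.tv_sec << "." <<
                tr.tv_nsec);
            return;
        }
        ts = tr;
        /* sleep again */
    }
}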
On OSX 10.6.2, I get the warning:
nanosleep returned with junk in struct: 4294967295.999958431
And before I added a test for this condition, it would loop and sleep
again for 4294967295 seconds, i.e. hang forever.
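For scale, 4294967295 is exactly UINT32_MAX, and sleeping that long would take about 136 years, so "forever" is no exaggeration. A minimal sketch of that arithmetic, assuming a 365.25-day year; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_JULIAN_YEAR 31557600u /* 365.25 days * 86400 s */

/* Hypothetical helper: whole Julian years contained in a number of seconds. */
uint64_t seconds_to_whole_years(uint64_t seconds)
{
    return seconds / SECONDS_PER_JULIAN_YEAR;
}

/* The reported tv_sec is the largest 32-bit unsigned value, i.e. the bit
   pattern of -1 read as a uint32_t. */
void check_wraparound_value(void)
{
    assert((uint32_t)-1 == UINT32_MAX);
    assert(UINT32_MAX == 4294967295u);
}
```

seconds_to_whole_years(4294967295u) yields 136.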
I noticed that 4294967295 looks suspiciously like the int32_t
representation of -1 treated as unsigned (cast to uint32_t). So I added
the following code to check for this:
    int secs = (int32_t) tr.tv_sec;
    if (secs < 0)
    {
        BOX_WARNING("nanosleep interrupted late, " <<
            secs << "." << tr.tv_nsec <<
            " secs remaining");
        return;
    }
I see instead:
nanosleep interrupted late, -1.999971228 secs remaining
and the loop does exit.
Looking at the code for nanosleep() in Libc, which you can find here:
http://opensource.apple.com/source/Libc/Libc-583/gen/nanosleep.c
the first version, for __DARWIN_UNIX03, appears to be the one used in
this case, since gdb shows the hung process inside __semwait_signal().
That version does integer arithmetic on a mach_timespec_t structure and
then assigns the results to the returned struct timespec:
    /* This depends on the layout of a mach_timespec_t and
       timespec_t being equivalent */
    ADD_MACH_TIMESPEC(&current, requested_time);
    SUB_MACH_TIMESPEC(&current, &remain);
    remaining_time->tv_sec = current.tv_sec;
    remaining_time->tv_nsec = current.tv_nsec;
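As far as these lines go, current ends up holding (start + requested) - remain, where remain appears (judging by the test program further down) to be the clock reading taken after the interrupted sleep. If the wake-up lands a few microseconds after the deadline, the true result is negative. Read that way, the "-1.999971228" printed earlier is tv_sec = -1 plus tv_nsec = 999971228, i.e. -28772 ns, an overrun of roughly 29 microseconds. A minimal sketch of that conversion; the helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000LL

/* Hypothetical helper: turn a (seconds, nanoseconds) pair into a signed
   nanosecond count. The seconds part must already be signed, and the
   nanoseconds part is a non-negative offset in [0, 1e9), as produced by
   the borrow in SUB_MACH_TIMESPEC. */
int64_t timespec_pair_to_ns(int64_t sec, int64_t nsec)
{
    assert(nsec >= 0 && nsec < NS_PER_SEC);
    return sec * NS_PER_SEC + nsec;
}
```

timespec_pair_to_ns(-1, 999971228) gives -28772, which is why simply casting tv_sec back to int32_t is enough to detect the overrun.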
The comment raised my suspicions, although the macros access the members
by name, so the two memory layouts do not actually need to be identical.
What does matter is the member types. mach_timespec_t appears to be
defined thus:
struct mach_timespec {
    unsigned int tv_sec;   /* seconds */
    clock_res_t  tv_nsec;  /* nanoseconds */
};
typedef struct mach_timespec mach_timespec_t;
(I cannot find this on the web, I copied it from
/usr/include/mach/clock_types.h on this OSX 10.6.2 box).
The SUB_MACH_TIMESPEC macro does:
/* t1 -= t2 */
#define SUB_MACH_TIMESPEC(t1, t2)                       \
    do {                                                \
        if (((t1)->tv_nsec -= (t2)->tv_nsec) < 0) {     \
            (t1)->tv_nsec += NSEC_PER_SEC;              \
            (t1)->tv_sec -= 1;                          \
        }                                               \
        (t1)->tv_sec -= (t2)->tv_sec;                   \
    } while (0)
struct timespec is defined thus in sys/structs.h:
#define _STRUCT_TIMESPEC struct timespec
_STRUCT_TIMESPEC
{
    __darwin_time_t tv_sec;
    long            tv_nsec;
};
and __darwin_time_t is defined as long in /usr/include/i386/_types.h:
typedef long __darwin_time_t; /* time() */
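The mismatch comes down to two integer conversions. Converting an unsigned 32-bit value to a wider signed type zero-extends it, whereas converting it to int32_t first turns the top bit into a sign, which is then sign-extended. A minimal sketch of both, using int64_t to stand in for the 64-bit long; the function names are hypothetical, and the uint32_t to int32_t conversion of values above INT32_MAX is implementation-defined before C23 (GCC and Clang wrap modulo 2^32):

```c
#include <assert.h>
#include <stdint.h>

/* Models "remaining_time->tv_sec = current.tv_sec": the unsigned 32-bit
   value is zero-extended into a 64-bit signed field. */
int64_t widen_zero_extend(uint32_t tv_sec)
{
    return (int64_t)tv_sec;
}

/* Going through int32_t first makes 0xFFFFFFFF read back as -1,
   which is then sign-extended to 64 bits. */
int64_t widen_via_int32(uint32_t tv_sec)
{
    return (int64_t)(int32_t)tv_sec;
}

/* Self-check against the values reported in this thread. */
void check_conversions(void)
{
    assert(widen_zero_extend(UINT32_MAX) == 4294967295LL);
    assert(widen_via_int32(UINT32_MAX) == -1);
}
```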
So I wrote this test program:
#include <mach/clock_types.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
int main(int argc, char** argv)
{
    mach_timespec_t current, remain;
    current.tv_sec = 1267383159;
    current.tv_nsec = 300;
    remain.tv_sec = 1267383160; // 1.0000012 seconds later
    remain.tv_nsec = 1500;
    struct timespec requested_time, remaining_time;
    requested_time.tv_sec = 1;
    requested_time.tv_nsec = 0;
    /* Same arithmetic as nanosleep() in Libc-583 */
    ADD_MACH_TIMESPEC(&current, &requested_time);
    SUB_MACH_TIMESPEC(&current, &remain);
    remaining_time.tv_sec = current.tv_sec;
    remaining_time.tv_nsec = current.tv_nsec;
    int64_t remain_ns = (remaining_time.tv_sec * 1000000000) +
        remaining_time.tv_nsec;
    printf("remaining time (native tv_sec): %lld\n", (long long) remain_ns);
    remain_ns = ((long) remaining_time.tv_sec) * 1000000000;
    remain_ns += remaining_time.tv_nsec;
    printf("remaining time (long tv_sec): %lld\n", (long long) remain_ns);
    /* widen to 64 bits after the int32_t cast so the multiply cannot overflow */
    remain_ns = ((int64_t) (int32_t) remaining_time.tv_sec) * 1000000000;
    remain_ns += remaining_time.tv_nsec;
    printf("remaining time (int32_t tv_sec): %lld\n", (long long) remain_ns);
    return 0;
}
And it outputs:
chris@jmac(~)$ ./test
remaining time (native tv_sec): 4294967295999998800
remaining time (long tv_sec): 4294967295999998800
remaining time (int32_t tv_sec): -1200
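The value that should be -1 has already wrapped to 4294967295 in the
unsigned 32-bit tv_sec of mach_timespec_t. Since long is a 64-bit type
on this platform, assigning it to struct timespec's tv_sec zero-extends
it instead of sign-extending it, so the result is wrong. Only treating it
as a 32-bit signed type gives the correct value.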
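If it is a bug, one possible fix (a sketch only, not Apple's code) is to do the subtraction in signed 64-bit nanoseconds and clamp a negative result to zero, since overrunning the deadline means no time remains. The function names are hypothetical, and struct timespec is used for all three instants as an assumption so the sketch compiles anywhere; in Libc the start and wake-up times are mach_timespec_t values.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC 1000000000LL

/* Hypothetical helper: normalized timespec -> signed nanoseconds.
   Assumes values fit in int64_t nanoseconds (about +/-292 years). */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NS_PER_SEC + ts->tv_nsec;
}

/* Hypothetical fix: remaining = (start + requested) - now, computed with
   signed arithmetic and clamped at zero, so a late wake-up can never
   produce a wrapped-around tv_sec. */
struct timespec remaining_after_wakeup(const struct timespec *start,
                                       const struct timespec *requested,
                                       const struct timespec *now)
{
    struct timespec out;
    int64_t left;

    assert(requested->tv_nsec >= 0 && requested->tv_nsec < NS_PER_SEC);
    left = ts_to_ns(start) + ts_to_ns(requested) - ts_to_ns(now);
    if (left < 0)
        left = 0; /* overran the deadline: nothing left to sleep */
    out.tv_sec = (time_t)(left / NS_PER_SEC);
    out.tv_nsec = (long)(left % NS_PER_SEC);
    return out;
}
```

With the numbers from the test program above (start 1267383159.000000300, one second requested, wake-up at 1267383160.000001500) this returns 0.000000000 instead of 4294967295.999998800.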
Is this a bug, or did I misunderstand something? Please let me know.
Cheers, Chris.
--
_ ___ __ _
/ __/ / ,__(_)_ | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |
_______________________________________________
Darwin-kernel mailing list (email@hidden)