execve randomly returns ENOMEM?
- Subject: execve randomly returns ENOMEM?
- From: Andrew Gallatin <email@hidden>
- Date: Thu, 3 Feb 2005 16:39:37 -0500 (EST)
I'm trying to debug a problem with a perl script which launches many remote
jobs on a cluster of computers. The script essentially forks and
execs a large number of ssh (or rsh) sessions to launch jobs. This
script works fine on the other UNIX-like OSes (such as Linux, Solaris,
FreeBSD, etc.) that we support.
However, on Mac OS X, occasionally and with no repeatable pattern,
some of the execves will fail. This happens even when the maxproc
limit is well above the number of jobs being launched (i.e., the exec,
not the fork, is failing).
The reason for the failure is given as ENOMEM. From a ktrace
snippet:
9885 perl RET fork 9931/0x26cb
9931 perl CALL getpid
9931 perl RET getpid 9931/0x26cb
9931 perl CALL sigaction(0x8,0xbffff0a0,0xbffff160)
9931 perl RET sigaction 0
9931 perl CALL execve(0xbfffec80,0x101b10,0x1008b0)
9931 perl RET execve -1 errno 12 Cannot allocate memory
9931 perl CALL sigaction(0x8,0xbffff100,0)
9931 perl RET sigaction 0
9931 perl CALL write(0x2,0x101310,0x36)
9931 perl GIO fd 2 wrote 54 bytes
9931 perl RET write 54/0x36
9931 perl CALL exit(0xc)
Here is a test script to replicate the problem, which boils down to
this:
% cat ./ssh_forkbomb.pl
#!/usr/bin/perl
my $max_iters = $ARGV[0];
my $cmd = $ARGV[1];
my $i;
my $pid;
printf("running %d iterations of %s\n", $max_iters, $cmd);
for ($i = 0; $i < $max_iters; $i++) {
$pid = fork;
if ($pid == 0) {
if (!exec($cmd)) {
die "could not exec $cmd";
}
}
}
printf("done\n");
The problem can be more easily seen if you inflate the number of
execve syscalls by using a long, rambling path with the location
of the desired executable near the end.
Here's an example on a dual G5 with 2.5GB of ram running 10.3.5:
% limit
cputime unlimited
filesize unlimited
datasize unlimited
stacksize 65536 kbytes
coredumpsize unlimited
memoryuse unlimited
descriptors 1024
memorylocked unlimited
maxproc 532
% echo $PATH
/home/gallatin/bin:/home/gallatin/bin/powerpc-darwin:/usr/bin/X11:/usr/X11/bin:/usr/X11R6/bin:/usr/local/X/bin:/usr/openwin/bin:/usr/local/etc:/usr/local/bin:/usr/ucb:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/sbin:/usr/etc:/etc:/usr/local/games:/usr/games:/usr/local/requests/bin:/usr/local/mpich-gm/bin:/opt/mpi-gm/bin:/opt/mpich-mx/bin:/home/gallatin/lanaitools/powerpc_darwin/bin:/opt/gm/bin:/opt/gm/sbin:/opt/ibmcmp/vacpp/6.0/bin
% ./ssh_forkbomb.pl 500 "sleep 10" ; ps ax | grep sleep | grep -v grep | wc -l
running 500 iterations of sleep 10
could not exec sleep 10 at ./ssh_forkbomb.pl line 14.
could not exec sleep 10 at ./ssh_forkbomb.pl line 14.
could not exec sleep 10 at ./ssh_forkbomb.pl line 14.
could not exec sleep 10 at ./ssh_forkbomb.pl line 14.
could not exec sleep 10 at ./ssh_forkbomb.pl line 14.
could not exec sleep 10 at ./ssh_forkbomb.pl line 14.
could not exec sleep 10 at ./ssh_forkbomb.pl line 14.
could not exec sleep 10 at ./ssh_forkbomb.pl line 14.
could not exec sleep 10 at ./ssh_forkbomb.pl line 14.
could not exec sleep 10 at ./ssh_forkbomb.pl line 14.
could not exec sleep 10 at ./ssh_forkbomb.pl line 14.
done
489
Things generally work if you use the exact path:
% ./ssh_forkbomb.pl 500 "/bin/sleep 10" ; ps ax | grep sleep | grep -v grep | wc -l
running 500 iterations of /bin/sleep 10
done
500
Is there any way to get an idea of which allocation in the exec path
failed? I don't even know if it's a per-user or a per-system limit
that's killing me.
Thanks,
Drew
_______________________________________________
Darwin-kernel mailing list (email@hidden)