RE: perl 5.8.8, backtick execution and leopard
- Subject: RE: perl 5.8.8, backtick execution and leopard
- From: Nathan Herring <email@hidden>
- Date: Thu, 19 Jun 2008 15:43:13 -0700
- Acceptlanguage: en-US
- Thread-topic: perl 5.8.8, backtick execution and leopard
This probably turned out to be user error (or at least bad design).
There was code that set SIGCHLD’s handler to ‘IGNORE’; perl would
gracefully warn us that it couldn’t do that and change the behavior to “default”,
and then we’d get into this situation. We ultimately swapped in a SIGCHLD handler that
would reap the zombie pid, and this problem went away.
So I’m not sure whether it would be correct behavior
for the kernel call to act this way when an app isn’t reaping dead
child pids; perhaps someone else can comment on that. Fortunately, it’s a moot
question for us.
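For readers wondering what “a SIGCHLD handler that reaps the zombie pid” looks like, here is a minimal sketch. It is written in Python rather than the Perl the server used, and the thread does not show the actual handler, so treat it as an illustration only; the names reap_children, spawn_and_exit and wait_until_reaped are hypothetical. The key idea is to loop on waitpid(-1, WNOHANG) so that every exited child is collected without ever blocking inside the handler.

```python
import os
import signal
import time

# Children reaped by the handler, recorded only so the demo can inspect them.
reaped = []


def reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has exited, never blocking."""
    while True:
        try:
            # -1 means "any child"; WNOHANG makes waitpid return (0, 0)
            # instead of blocking when no child has exited yet.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # ECHILD: this process has no children left
        if pid == 0:
            return  # children exist, but none are zombies right now
        reaped.append((pid, status))


def spawn_and_exit(code):
    """Fork a child that exits immediately with the given status code."""
    pid = os.fork()
    if pid == 0:
        os._exit(code)
    return pid


def wait_until_reaped(pid, timeout=5.0):
    """Sleep until the handler has recorded pid, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if any(p == pid for p, _ in reaped):
            return True
        time.sleep(0.05)
    return False


signal.signal(signal.SIGCHLD, reap_children)

if __name__ == "__main__":
    child = spawn_and_exit(3)
    print("reaped:", wait_until_reaped(child))
```

One design caveat: because the handler waits on any child (-1), it can also reap a child that other code intends to wait for itself (for example, the wait inside a popen()-style call), and that code’s own wait then fails with ECHILD.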
From: Dave Zarzycki
[mailto:email@hidden]
Sent: Thursday, January 24, 2008 9:03 AM
To: Nathan Herring
Cc: email@hidden
Subject: Re: perl 5.8.8, backtick execution and leopard
This sounds like a kernel bug. We'd need to attach a
kernel debugger to investigate further. Also, what is the third argument
to wait4()? If the WNOHANG flag is being passed, then this
is definitely a kernel bug. One more thing: if you can run
dtrace, please probe the following kernel APIs, proc_reparent() and
ptrace(), and let us know whether either is ever called on your box (you'll
need to avoid using gdb during this test).
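The reasoning behind “if WNOHANG is passed, this is definitely a kernel bug” is that WNOHANG makes wait4() return immediately when the child has not exited yet, so it should never hang. A minimal Python sketch of that behavior (using os.wait4, a thin wrapper over the same system call; spawn_sleeper and poll_child are hypothetical helper names):

```python
import os
import time


def spawn_sleeper(seconds, code):
    """Fork a child that sleeps for `seconds`, then exits with `code`."""
    pid = os.fork()
    if pid == 0:
        time.sleep(seconds)
        os._exit(code)
    return pid


def poll_child(pid):
    """Check on one child with wait4(pid, WNOHANG); never blocks.

    Returns None while the child is still running, otherwise its exit code
    (or the negated signal number if it was killed by a signal).
    """
    got_pid, status, _rusage = os.wait4(pid, os.WNOHANG)
    if got_pid == 0:
        return None  # child exists but has not exited; wait4 returned at once
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    child = spawn_sleeper(1.0, 7)
    print("immediately:", poll_child(child))
    while (result := poll_child(child)) is None:
        time.sleep(0.05)
    print("after exit:", result)
```

Note that for a pid that is not (or is no longer) a child of the caller, wait4() fails with ECHILD rather than blocking, which Python surfaces as ChildProcessError.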
On Jan 23, 2008, at 3:42 PM, Nathan Herring wrote:
We have a lightweight perl-based HTTP server running. The main loop calls
accept() with a 10 s timeout, and if that times out, we run an “OnTick” periodic
task. After upgrading from 10.4.10 to 10.5.1, the server eventually stops
responding. We’ve tracked it down to perl hanging in __wait4() underneath a
backtick execution (calling df to determine whether we need to run disk space
cleanup commands). The pid passed to wait4() isn’t in the process list,
but nonetheless the function never returns. This doesn’t seem to happen until
the HTTP server kicks off a local process that uses the machine
heavily (including making major edits in the directory on which df is called).
Because the pid isn’t around, I don’t think df is the culprit, but I cannot
fathom why wait4 would get stuck. (From gdb’s perspective, there’s only one
thread, so...)
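The loop described above can be sketched roughly as follows. This is a Python approximation, not the server’s actual Perl code; serve_once, serve_forever and disk_capacity_percent are hypothetical names, and the df flags (-P for POSIX output, -k for 1 KiB blocks) are an assumed choice, since the thread does not say how df was invoked.

```python
import socket
import subprocess

TICK_SECONDS = 10.0  # accept() timeout before the periodic task runs


def disk_capacity_percent(path):
    """Run `df -P -k path` and return the capacity (percent used) it reports."""
    # Like a Perl backtick, subprocess.run forks df, reads its stdout, and then
    # waits for the child to exit; that final wait is the analogue of the
    # wait4() the Perl server was stuck in.
    out = subprocess.run(["df", "-P", "-k", path], capture_output=True,
                         text=True, check=True).stdout
    last_line = out.strip().splitlines()[-1]
    for field in last_line.split():
        if field.endswith("%"):
            return int(field[:-1])
    raise ValueError("unexpected df output: " + out)


def serve_once(listener, handle, on_tick):
    """One loop iteration: handle a connection, or run on_tick on timeout."""
    try:
        conn, _addr = listener.accept()
    except socket.timeout:
        on_tick()
        return "tick"
    with conn:
        handle(conn)
    return "request"


def serve_forever(port, handle, on_tick):
    """Accept connections forever, running on_tick after each idle timeout."""
    listener = socket.create_server(("127.0.0.1", port))
    listener.settimeout(TICK_SECONDS)
    while True:
        serve_once(listener, handle, on_tick)
```

In this sketch, on_tick would be something like a hypothetical cleanup check that calls disk_capacity_percent() on the directory being monitored.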
Is this a known issue, and/or is there something I can do to track it down
further or work around it? (I used Instruments to trace the user
function Perl_my_popen with argument1 (the cmd), and the wait4 syscall on
both entry and exit, which showed that the exit never happens.)
-nh
_______________________________________________
Darwin-dev mailing list (email@hidden)