Re: Getting last data from child process in Leopard

26 Mar 2008


On Mar 26, 2008, at 8:36 AM, Ingemar Ragnemalm wrote:

That is what select(2) is for.
Steve Checkoway wrote:

Here is a citation (Stevens & Rago, page 680), quoted below:
I suppose poll() can save some processing, at least if you are
polling a whole bunch of fd's, but calling poll() in a tight loop
should be little or no better than read().
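A minimal sketch of the select(2) approach, for illustration only. It is not taken from either program linked in this thread; wait_readable() and its millisecond timeout are hypothetical names. The idea is to sleep in the kernel until the descriptor has something to read, rather than spinning on read() or poll().

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: block until fd is readable or timeout_ms
 * (assumed non-negative) elapses.
 * Returns 1 if readable (data or EOF), 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;
        int r;

        /* select() may modify both arguments, so rebuild them each time. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        r = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (r > 0)
            return 1;
        if (r == 0)
            return 0;
        if (errno != EINTR)
            return -1;
        /* Interrupted by a signal such as SIGCHLD: retry.
         * Note that this simple version restarts the full timeout. */
    }
}
```

A caller would invoke wait_readable(fd, 100) and only call read() when it returns 1, so the process is idle while the child has nothing to say.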

Rather than spam the list with my version of your code, you can get
it here: <http://pahtak.org/~steve/child.c>. I would have just
posted a diff like Jordan did, but I couldn't handle your
inconsistent spacing.

Thanks, but... you do the fflush() trick in the child, which sure
helps if you have control over the child, but that is not the case,
so it is one of the things to avoid.

I did that because stdio functions like fwrite() and printf() do not
flush on their own. Actually, as it turns out, printf() flushes when it
encounters \n (stdout is line buffered when it is a terminal, which
includes a pty), but I don't know if that is behavior you want to
depend on. I could have used write() instead, which wouldn't have been
buffered.
So:

- You can't play the "read as mad" trick that Jordan K. Hubbard
suggested, because it will kill performance. It will usually help
though.

- You can't add either an fflush() or a sleep() in the child. If the
child is a single printf, it should work. And if the child happens
to be GCC (which it will be), it should work without recompiling GCC
with a ton of fflush'es.

If the child is using buffered IO, it's going to be buffered no
matter what you do.
It seems to work correctly for me written that way. The signal
handler in my version does nothing but set a flag and the while
loop ends, not when the handler fires, but when read() returns 0,
i.e., end of file.

It seems to me that the signal doesn't do anything meaningful in
your code?

That's right, it merely notes when the child dies and prints out a
notice to that effect.
I didn't manage to get the signal to interrupt the read(), but
that's probably because of the nonblocking IO. I'm not sure why you
think you need unbuffered IO to avoid deadlocks or to deliver data
quickly.

"In the coprocess example... we couldn't invoke a coprocess...
because when we talked to the coprocess across a pipe, the standard
I/O library fully buffered the standard input and standard output,
leading to a deadlock. If the coprocess is a compiled program for
which we don't have the source, we can't add fflush()... What we
need to do is to place a pseudo terminal between the two processes..."

In my testing (go ahead, remove your \n in the printf statements), it
was being flushed regardless. I actually removed the \n first because
it was screwing up your nicely planned <%s> output format. Then I was
a bit surprised at first to notice that I didn't get any data at all
until it slept.
So I know pretty well why I "think" that I need this. And in Tiger,
this is 100% true. I was helpless until I found the pty's, and then
it worked, smooth and reliable. Until now.
As you probably noticed, printf() on the child side was buffering
anyway.

With a pty, it shouldn't buffer the pipe to a deadlock. That's the
point. But maybe that is the problem, that pty's in Leopard buffer
data when Tiger does not.

Could be, I no longer have Tiger around to try. Then again, Leopard
and the version of Linux I was using before both buffer the
printf() if I remove the fflush(). The difference is that Linux
reports child quit, then it reads 16 bytes "hello!there!BYE!" whereas
Leopard loses the data. This must be the loss you were describing.
Maybe that is a bug, I'm not in a position to say. Jordan or Terry
know far more than I.
But there is one interesting detail here: Your output is indeed
buffered, so it arrives all at one time in the end (in Tiger), while
my code is delivered unbuffered. That is a pretty significant
difference that I can't explain yet.
Why are you waiting for the child with wait()? When the signal works
(which it does if put after the forkpty()) then there is nothing to
wait for. And when you get the EIO, isn't that a safe sign too?

I'm waiting because my child terminated and I wanted to clean it up
rather than leave the process hanging around until the parent process
terminates at which time the child gets cleaned up by something else.
The signal does work for me. I put the signal first because otherwise
there is a race condition. What happens if the child terminates before
you've set up the handler? You'd get no signal.
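For the cleanup side, a small sketch of reaping a child with waitpid(2) and decoding its status. The helper names reap_exit_code() and run_and_reap() are hypothetical. waitpid() works whether or not a SIGCHLD was ever delivered, so it does not suffer from the handler race described above.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for pid and return its exit code (0-255),
 * or -1 if waiting failed or the child was killed by a signal. */
int reap_exit_code(pid_t pid)
{
    int status = 0;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);   /* retry if a signal interrupted us */

    if (r < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical demo: fork a child that exits immediately with `code`,
 * then reap it so no zombie is left behind. */
int run_and_reap(int code)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);
    return reap_exit_code(pid);
}
```

For example, run_and_reap(7) forks a child that calls _exit(7) and hands that 7 back to the caller once the child has been collected.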
As a side note, this runs on (at least this particular flavor of)
Linux as well with the small changes (included in the file) to
include <pty.h> instead of <util.h> and to break out of the loop
when read() returns -1 and errno is EIO.

It is all built from common Unix calls so it should work. Didn't
link in RHEL though, and that's the only Linux I have easily
accessible.

I don't have Red Hat, but it worked for me on both Ubuntu (ppc) and
Fedora (x86). Did you remember to pass -lutil to gcc?
Here is an even shorter and simpler version, no read() in the
signal, same problems:
http://www.ragnemalm.se/stuff/ptytrouble-smaller.c

You're still using buffered IO in the signal handler. You simply
cannot do that. Read the sigaction(2) man page.
I still use signals. I can't really see the checks for EAGAIN, EINTR
and EIO making much of a difference, but since our programs do
behave differently, I will pursue that a bit more, and see what
termination on EIO gives me. Anyway, they don't remove the main
problem, the lost data.

You're treating an error return (-1) the same as end of file (0), and in
both cases, you do nothing but sleep. You're still using the signal handler
as an indication that there is no more data to read. At least with
Linux, that is simply not true as when I remove the fflush(), I get
SIGCHLD before read() returns the data.
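One way to keep those cases apart is to classify every read() result explicitly. This is only a sketch; the enum and classify_read() are hypothetical names. Treating EIO as "the other end of the pty is gone" reflects what was observed on Linux in this thread and is an assumption, not a general rule.

```c
#include <errno.h>
#include <sys/types.h>

enum read_outcome {
    READ_DATA,        /* n > 0: got bytes, keep reading */
    READ_EOF,         /* n == 0: end of file, stop */
    READ_RETRY,       /* nothing available yet or interrupted: try again */
    READ_PTY_CLOSED,  /* EIO: assumed to mean the pty slave side is closed */
    READ_ERROR        /* any other failure */
};

/* Hypothetical helper: n is the return value of read(), err is the
 * value of errno captured immediately after the call. */
enum read_outcome classify_read(ssize_t n, int err)
{
    if (n > 0)
        return READ_DATA;
    if (n == 0)
        return READ_EOF;
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
        return READ_RETRY;
    if (err == EIO)
        return READ_PTY_CLOSED;
    return READ_ERROR;
}
```

A read loop would stop on READ_EOF or READ_PTY_CLOSED, continue on READ_DATA and READ_RETRY, and report READ_ERROR, regardless of whether SIGCHLD has already been seen.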
I tried rewriting this to use pipe(2) and select(2). Everything seems
to be working except that dup2(2) and close(2) are returning strange
values on the child side of things. dup2() is failing with errno set
to 0 and yet seeming to succeed. close() is failing with errno set to
22, but the argument looks correct to me. At the very least, dup2()
shouldn't be returning -1 since I'm printing out the error messages to
stdout and the parent is reading the error messages. The code is here:
<http://pahtak.org/~steve/child2.c>. I'm probably just doing something
wrong and I'm not noticing it.
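For comparison, a minimal sketch of the pipe(2) and dup2(2) pattern with the return values checked explicitly. It is not a diagnosis of child2.c, and capture_child_stdout() is a hypothetical name. Two facts are worth keeping in mind when reading such code: dup2() returns the new descriptor on success, so dup2(fd, STDOUT_FILENO) returns 1 rather than 0, and errno is only meaningful after a call has actually returned -1.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a child whose stdout is a pipe and collect
 * what it writes. Returns the byte count, or -1 on failure. */
ssize_t capture_child_stdout(char *buf, size_t cap)
{
    static const char msg[] = "hello from child\n";
    int fds[2];
    size_t total = 0;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            /* Success is any value other than -1 (here it would be 1). */
            if (dup2(fds[1], STDOUT_FILENO) == -1)
                _exit(127);
            if (close(fds[1]) == -1)   /* original no longer needed */
                _exit(126);
        }
        if (write(STDOUT_FILENO, msg, sizeof msg - 1) < 0)
            _exit(125);
        _exit(0);
    }

    close(fds[1]);                 /* parent keeps only the read end */
    while (total < cap) {
        ssize_t n = read(fds[0], buf + total, cap - total);
        if (n > 0)
            total += (size_t)n;
        else if (n == 0 || errno != EINTR)
            break;                 /* end of file or a real error */
    }
    close(fds[0]);
    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
        ;
    return (ssize_t)total;
}
```

The distinct exit codes (127, 126, 125) are just a convenient way for the child to report which step failed without printing to the descriptor it is in the middle of rearranging.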

Steve Checkoway
