Re: Getting last data from child process in Leopard
- Subject: Re: Getting last data from child process in Leopard
- From: Ingemar Ragnemalm <email@hidden>
- Date: Wed, 26 Mar 2008 16:36:57 +0100
Jordan K. Hubbard wrote:
On Mar 24, 2008, at 12:44 PM, Ingemar Ragnemalm wrote:
Under Tiger, this works very well. Leopard behaves very differently.
This is because your code is very racy. Signals can interrupt system
calls, including read(2), and what's happening is that your SIGCHLD is
causing the read to be interrupted and miss data. Your signal handler
("deadbeef") also should not be attempting to read from the pty since
there are a lot of operations not considered safe from signal handlers
(your results are guaranteed to be "undefined", if not outright
incorrect).
As also pointed out by Steve Checkoway, read() in a signal handler
should be allowed; "read" is listed as reentrant in Stevens&Rago, page
306, figure 10.4. It should be safe as I understand it (as long as I am
careful with where and when I write). In any case, the SIGCHLD handler
never gets any data at all, so I can skip that read. And if I do, the
problem indeed remains.
One of the things I tried was to use SIGIO and only read in the signal
handler, but since SIGIO doesn't seem to work, that was of no use.
The right thing to do is check for EAGAIN in your read() loop and go
back and retry for the data in that case. Given that you're using
O_NONBLOCK, it's even more important to make sure you handle all the
async event cases.
Here is a diff to your code which will make it work less by luck, as
it did on Tiger, and more by design. I'm sure that this is also just
sample code and not something you're trying to really do in production
since looping on a non-blocking fd waiting for input is very
inefficient. That's what poll(2) is for.
I suppose poll() can save some processing, at least if you are polling a
whole bunch of fd's, but calling poll() in a tight loop should be little
or no better than read().
--- forkpty.c.orig 2008-03-25 15:35:16.000000000 -0700
+++ forkpty.c 2008-03-25 15:34:22.000000000 -0700
@@ -57,6 +57,7 @@
// Wait for result
while (!done)
{
+try_again:
i = 255; // buffer size
i = read(pty, s, i);
if (i>0)
@@ -64,8 +65,11 @@
s[i] = 0;
printf("Read %d: <%s>\n", i, s);
}
- else
+ else {
printf("*");
+ if (errno == EAGAIN)
+ goto try_again;
+ }
sleep(1);
}
If I read your code right, you are polling like mad until you get
something, and only then is there any idle time. And if the other
process dies during that sleep, then we miss data all the same.
Polling like mad will indeed help, as long as there is no sleep(), and
will be disastrous for performance. And for that matter, I have a GUI to
attend to, so in my app, I poll using a timer.
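For reference, the poll(2) approach mentioned above could be sketched
roughly like this. The helper name read_when_ready and its return codes
are made up for illustration and appear in neither program. With a
timeout of 0 it is a plain non-blocking check that could be called from
a GUI timer. It has not been checked against ptys on Leopard, and it
does not by itself address the lost data.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper (not from either program in this thread):
 * wait up to timeout_ms for fd to become readable, then read once.
 * Returns bytes read (>0), 0 on end of file, -1 on error (errno set),
 * or -2 if nothing arrived before the timeout. */
static ssize_t read_when_ready(int fd, char *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* A signal such as SIGCHLD can interrupt poll(); just retry. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;   /* poll() itself failed */
    if (rc == 0)
        return -2;   /* timed out, nothing to read yet */

    /* Readable or hung up: read() reports data, EOF, or an error.
     * The caller still has to handle EINTR/EAGAIN/EIO from read(). */
    return read(fd, buf, len);
}
```

The process sleeps inside poll() instead of spinning on read(), so no
sleep() is needed between attempts.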
Steve Checkoway wrote:
Rather than spam the list with my version of your code, you can get it
here: <http://pahtak.org/~steve/child.c>. I would have just posted a
diff like Jordan did, but I couldn't handle your inconsistent spacing.
Thanks, but... you do the fflush() trick in the child, which sure helps
if you have control over the child, but that is not the case, so it is
one of the things to avoid.
So:
- You can't play the "read like mad" trick that Jordan K. Hubbard
suggested, because it will kill performance. It will usually help, though.
- You can't add either an fflush() or a sleep() in the child. If the
child is a single printf, it should work. And if the child happens to be
GCC (which it will be), it should work without recompiling GCC with a
ton of fflush() calls.
It seems to work correctly for me written that way. The signal handler
in my version does nothing but set a flag and the while loop ends, not
when the handler fires, but when read() returns 0, i.e., end of file.
It seems to me that the signal doesn't do anything meaningful in your code?
I didn't manage to get the signal to interrupt the read(), but that's
probably because of the nonblocking IO. I'm not sure why you think you
need unbuffered IO to avoid deadlocks or to deliver data quickly.
Here is a citation (Stevens&Rago page 680):
"In the coprocess example... we couldn't invoke a coprocess... because
when we talked to the coprocess across a pipe, the standard I/O library
fully buffered the standard input and standard output, leading to a
deadlock. If the coprocess is a compiled program for which we don't have
the source, we can't add fflush()... What we need to do is to place a
pseudo terminal between the two processes..."
So I know pretty well why I "think" that I need this. And in Tiger, this
is 100% true. I was helpless until I found the pty's, and then it
worked, smooth and reliable. Until now.
As you
probably noticed, printf() on the child side was buffering anyway.
With a pty, it shouldn't buffer the pipe to a deadlock. That's the
point. But maybe that is the problem, that pty's in Leopard buffer data
when Tiger does not.
But there is one interesting detail here: Your output is indeed
buffered, so it arrives all at one time in the end (in Tiger), while my
code is delivered unbuffered. That is a pretty significant difference
that I can't explain yet.
Why are you waiting for the child with wait()? When the signal works
(which it does if put after the forkpty()) then there is nothing to wait
for. And when you get the EIO, isn't that a safe sign too?
As a side note, this runs on (at least this particular flavor of) Linux
as well with the small changes (included in the file) to include <pty.h>
instead of <util.h> and to break out of the loop when read() returns -1
and errno is EIO.
It is all built from common Unix calls, so it should work. It didn't link
on RHEL, though, and that's the only Linux I have easily accessible.
Here is an even shorter and simpler version, no read() in the signal,
same problems:
http://www.ragnemalm.se/stuff/ptytrouble-smaller.c
I still use signals. I can't really see the checks for EAGAIN, EINTR and
EIO making much of a difference, but since our programs do behave
differently, I will pursue that a bit more, and see what termination on
EIO gives me. Anyway, they don't remove the main problem, the lost data.
Thanks for the replies! Still searching though...
/Ingemar