Re: Getting last data from child process in Leopard
On Mar 24, 2008, at 12:44 PM, Ingemar Ragnemalm wrote:

--- forkpty.c.orig 2008-03-25 15:35:16.000000000 -0700
+++ forkpty.c 2008-03-25 15:34:22.000000000 -0700
@@ -57,6 +57,7 @@
     // Wait for result
     while (!done)
     {
+try_again:
         i = 255; // buffer size
         i = read(pty, s, i);
         if (i>0)
@@ -64,8 +65,11 @@
             s[i] = 0;
             printf("Read %d: <%s>\n", i, s);
         }
-        else
+        else {
             printf("*");
+            if (errno == EAGAIN)
+                goto try_again;
+        }
         sleep(1);
     }

Steve Checkoway wrote:

It seems to me that the signal doesn't do anything meaningful in your code?

Here is a citation (Stevens & Rago, page 680):

http://www.ragnemalm.se/stuff/ptytrouble-smaller.c

Thanks for the replies! Still searching though...

/Ingemar

Jordan K. Hubbard wrote:

Under Tiger, this works very well. Leopard behaves very differently.

This is because your code is very racy. Signals can interrupt system calls, including read(2), and what's happening is that your SIGCHLD is causing the read to be interrupted and miss data. Your signal handler ("deadbeef") also should not be attempting to read from the pty since there are a lot of operations not considered safe from signal handlers (your results are guaranteed to be "undefined", if not outright incorrect).

As also pointed out by Steve Checkoway, read() in a signal handler should be allowed; "read" is listed as reentrant in Stevens & Rago, page 306, figure 10.4. It should be safe as I understand it (as long as I am careful with where and when I write). In any case, the SIGCHLD handler never gets any data at all, so I can skip that read. And if I do, the problem indeed remains.
One of the things I tried was to use SIGIO and only read in the signal handler, but since SIGIO doesn't seem to work, that was of no use.

The right thing to do is check for EAGAIN in your read() loop and go back and retry for the data in that case. Given that you're using O_NONBLOCK, it's even more important to make sure you handle all the async event cases. Here is a diff to your code which will make it work less by luck, as it did on Tiger, and more by design.

I'm sure that this is also just sample code and not something you're trying to really do in production since looping on a non-blocking fd waiting for input is very inefficient. That's what poll(2) is for.

I suppose poll() can save some processing, at least if you are polling a whole bunch of fds, but calling poll() in a tight loop should be little or no better than read(). If I read your code right, you are polling like mad until you get something, and only then is there some idle time. And if the other process dies during that sleep, then we miss data all the same. Polling like mad will indeed help, as long as there is no sleep(), and will be disastrous for performance. And for that matter, I have a GUI to attend to, so in my app, I poll using a timer.

Rather than spam the list with my version of your code, you can get it here: <http://pahtak.org/~steve/child.c>. I would have just posted a diff like Jordan did, but I couldn't handle your inconsistent spacing.

Thanks, but... you do the fflush() trick in the child, which sure helps if you have control over the child, but that is not the case, so it is one of the things to avoid. So:

- You can't play the "read like mad" trick that Jordan K. Hubbard suggested, because it will kill performance. It will usually help though.
- You can't add either an fflush() or a sleep() in the child. If the child is a single printf, it should work. And if the child happens to be GCC (which it will be), it should work without recompiling GCC with a ton of fflush() calls.
It seems to work correctly for me written that way. The signal handler in my version does nothing but set a flag and the while loop ends, not when the handler fires, but when read() returns 0, i.e., end of file.

I didn't manage to get the signal to interrupt the read(), but that's probably because of the nonblocking IO.

I'm not sure why you think you need unbuffered IO to avoid deadlocks or to deliver data quickly.

"In the coprocess example... we couldn't invoke a coprocess... because when we talked to the coprocess across a pipe, the standard I/O library fully buffered the standard input and standard output, leading to a deadlock. If the coprocess is a compiled program for which we don't have the source, we can't add fflush()... What we need to do is to place a pseudo terminal between the two processes..."

So I know pretty well why I "think" that I need this. And in Tiger, this is 100% true. I was helpless until I found the ptys, and then it worked, smooth and reliable. Until now.

As you probably noticed, printf() on the child side was buffering anyway.

With a pty, it shouldn't buffer the pipe to a deadlock. That's the point. But maybe that is the problem, that ptys in Leopard buffer data when Tiger does not. But there is one interesting detail here: Your output is indeed buffered, so it arrives all at one time in the end (in Tiger), while my code is delivered unbuffered. That is a pretty significant difference that I can't explain yet.

Why are you waiting for the child with wait()? When the signal works (which it does if put after the forkpty()) then there is nothing to wait for. And when you get the EIO, isn't that a safe sign too?

As a side note, this runs on (at least this particular flavor of) Linux as well with the small changes (included in the file) to include <pty.h> instead of <util.h> and to break out of the loop when read() returns -1 and errno is EIO.

It is all built from common Unix calls so it should work.
Didn't link in RHEL though, and that's the only Linux I have easily accessible.

Here is an even shorter and simpler version, no read() in the signal, same problems:

I still use signals. I can't really see the checks for EAGAIN, EINTR and EIO making much of a difference, but since our programs do behave differently, I will pursue that a bit more, and see what termination on EIO gives me. Anyway, they don't remove the main problem, the lost data.
participants (1)
-
Ingemar Ragnemalm