RE: dup2() Problem
- Subject: RE: dup2() Problem
- From: Norm Green <email@hidden>
- Date: Mon, 9 Jul 2007 11:29:11 -0700
I've found the problem and it is in my code.
During program initialization, we call getrlimit(RLIMIT_NOFILE, &foo) and setrlimit() to increase the number of file descriptors. It turns out that on Darwin the type rlim_t is always a 64-bit integer, and the fields of the rlimit struct are set to very large values. Our code was assuming that rlim_t is a 32-bit int in 32-bit builds (which is true for most other UNIX flavors). When we assigned the large 64-bit value to a 32-bit int, we ended up with a value of -1 due to truncation. So we ended up calling setrlimit(RLIMIT_NOFILE) with both limits set to -1. This seems to prevent later calls to dup2() from succeeding.
Fixing my code to do the appropriate casts fixes the problem.
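For anyone hitting the same thing, here is a minimal sketch of raising the limit without ever narrowing rlim_t. It is not the actual GemStone code: the helper name raise_nofile_limit and its "wanted" parameter are made up for illustration. The clamp to OPEN_MAX reflects my reading of the Darwin setrlimit(2) man page, which says a soft RLIMIT_NOFILE above OPEN_MAX is rejected; on systems that do not define OPEN_MAX the clamp compiles away.

```c
#include <limits.h>        /* OPEN_MAX on Darwin/BSD, if defined */
#include <sys/resource.h>  /* getrlimit, setrlimit, rlim_t, RLIM_INFINITY */

/*
 * Hypothetical helper: raise the soft RLIMIT_NOFILE toward `wanted`.
 * All arithmetic stays in rlim_t, so nothing is narrowed to a 32-bit int.
 * Returns 0 on success, -1 on failure with errno set by the failing call.
 */
static int raise_nofile_limit(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    rlim_t target = wanted;

    /* Never ask for more than the hard limit, unless it is unlimited. */
    if (rl.rlim_max != RLIM_INFINITY && target > rl.rlim_max)
        target = rl.rlim_max;

#ifdef OPEN_MAX
    /* Assumption: Darwin refuses a soft RLIMIT_NOFILE above OPEN_MAX. */
    if (target > (rlim_t)OPEN_MAX)
        target = (rlim_t)OPEN_MAX;
#endif

    /* Only ever raise the soft limit; leave it alone if it is high enough. */
    if (target <= rl.rlim_cur)
        return 0;

    rl.rlim_cur = target;  /* rlim_max is handed back unchanged */
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

A caller would do something like raise_nofile_limit(4096) once at startup and check the return value. If the limits need to be logged, casting them to unsigned long long for printing avoids the same truncation.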
Thanks to everyone for your help with this problem.
Norm Green
GemStone Systems Inc.
-----Original Message-----
From: Mark Rhoads [mailto:email@hidden]
Sent: Saturday, July 07, 2007 1:13 AM
To: Norm Green
Cc: email@hidden
Subject: Re: dup2() Problem
Hi Norm,
Wait a second, are we talking about this code snippet:
> int logfd = open(logName, O_WRONLY|O_APPEND|O_CREAT, 0644);
> dup2(logfd, STDOUT_FILENO);
Or are we talking about some shell code passed to system()?
What string is being passed to system()?
Whether this is a fork/open/dup as explicit code or via a call to
system() doesn't matter much. If this is about system(), then I'd even
more strongly suspect that memory corruption in the parent process is
indirectly causing the dup2 to fail. That is, the parent process memory
is corrupted before the 'fork', so the child process has a 'copy' of the
parent process memory, warts and all. That dup2() gives an EBADF
instead of crashing or succeeding is just a side effect of how things
are corrupted (e.g. Justin's "good garbage"). Stop looking at system()
and dup2(). Suspect all code in all threads that leads up to the moment
of the failed system() or dup2() call. Use a memory debugger.
...
So, it looks like you have a good fd to pass to dup2 (0x5), and I know
that 0x1 (stdout) is normally a valid fd to use as the 2nd parameter to
dup2(). Yet dup2 fails with EBADF. This smells like memory corruption
to me.
A small test case works fine.
There's gotta be some side-effect mucking things up for you before the
dup2 call.
Does the program test that the 'fork' actually succeeded?
Does the problem happen at the first run-thru of this bit of code, or
does it work for one or more iterations before dup2 fails?
Does this program end up using fork() or vfork()? In this snippet or
elsewhere?
What is happening between the 'fork' and the dup2()?
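To make the questions above concrete, here is a rough sketch of the fork/open/dup2 sequence with every call checked. It is only an illustration, not the program under discussion: the helper name run_child_logged, the msg parameter, and the exit codes 2, 3 and 4 are all invented here. The child reports which step failed through its exit status so that it only makes simple system calls after the fork.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child whose stdout is appended to logName.
 * Returns the child's exit status (0 = ok, 2 = open failed, 3 = dup2
 * failed, 4 = write failed), or -1 if fork or waitpid itself failed.
 */
static int run_child_logged(const char *logName, const char *msg)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");           /* the fork itself can fail; check it */
        return -1;
    }

    if (pid == 0) {
        /* Child: redirect stdout to the log, checking each step. */
        int logfd = open(logName, O_WRONLY | O_APPEND | O_CREAT, 0644);
        if (logfd < 0)
            _exit(2);
        if (dup2(logfd, STDOUT_FILENO) < 0)
            _exit(3);
        if (logfd != STDOUT_FILENO)
            close(logfd);         /* the duplicate on fd 1 stays open */

        size_t len = strlen(msg);
        if (write(STDOUT_FILENO, msg, len) != (ssize_t)len)
            _exit(4);
        _exit(0);
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);
    } while (w < 0 && errno == EINTR);
    if (w < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A return value of 3 from this helper would correspond to the dup2() failure seen in the trace, and it also shows that the fork and the open() had succeeded first.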
Try "xx = dup( logfd )". If it fails, then you've either hit some odd
limit or there are corrupted in-memory structures in the program's
process space (I'm favoring the latter). If it succeeds, well, you'll
at least know that your problem is not an fd limit -- still could be
process memory corruption. The program's memory could have become
corrupted at some 'distance' from where you are actually seeing a
symptom, and even long before the call to 'fork'. That dup2() happens
to fail is probably something of a red herring, though EBADF for no
other good reason can be an indication of corrupted memory.
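The dup() experiment can be wrapped in a small probe like the one below. This is a sketch under my own assumptions: the name probe_fd and the label argument are hypothetical, and printing RLIMIT_NOFILE next to the result is an addition of mine to help tell "odd limit" apart from "bad descriptor". The limits are widened to unsigned long long before printing, since rlim_t may be 64-bit even in a 32-bit build.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical probe: report the fd limits, check that fd is open, then
 * try dup(fd). Returns 0 if the descriptor is usable, -1 otherwise.
 */
static int probe_fd(const char *label, int fd)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
        /* Widen explicitly; never squeeze rlim_t through an int. */
        fprintf(stderr, "%s: RLIMIT_NOFILE soft=%llu hard=%llu\n", label,
                (unsigned long long)rl.rlim_cur,
                (unsigned long long)rl.rlim_max);
    }

    /* F_GETFD fails with EBADF when fd is not an open descriptor. */
    if (fcntl(fd, F_GETFD) < 0) {
        fprintf(stderr, "%s: fd %d is not open: %s\n",
                label, fd, strerror(errno));
        return -1;
    }

    int xx = dup(fd);
    if (xx < 0) {
        fprintf(stderr, "%s: dup(%d) failed: %s\n",
                label, fd, strerror(errno));
        return -1;
    }

    fprintf(stderr, "%s: dup(%d) -> %d\n", label, fd, xx);
    close(xx);
    return 0;
}
```

Calling probe_fd("before dup2", logfd) and probe_fd("before dup2", STDOUT_FILENO) just ahead of the failing call would show whether either descriptor or the limits look wrong at that point.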
I'd also try a memory debugger. Probably a good thing to do anyway when
porting (or developing, or testing, ...)
Again, I'm favoring that something in the parent process is corrupting
memory in such a way that most things seem to work, but dup2() just
happens to fail -- It's probably not a problem with system(), dup2() or
their semantics, but more likely that some bit of code has ever so
subtly corrupted its own process memory, perhaps quite distant (in time
and space) from where the system() or dup2() actually occurs/fails.
Best Regards,
--Mark Rhoads
Norm Green wrote:
> Thanks Mark. Yes, the code below is a short snippet.
>
> I ran the program under ktrace and see that the system() call causes a
> call to dup2(), which is failing with errno EBADF:
>
> 5738 sh CALL open(0x303f50,0x601,0x1b6)
> 5738 sh NAMI "testfile"
> 5738 sh RET open 5
> 5738 sh CALL dup2(0x5,0x1)
> 5738 sh RET dup2 -1 errno 9 Bad file descriptor
> 5738 sh CALL write(0x2,0x1800400,0x28)
> 5738 sh GIO fd 2 wrote 40 bytes
> "/bin/sh: line 1: 1: Bad file descriptor
> "
> 5738 sh RET write 40/0x28
> 5738 sh CALL exit(0x1)
>
> This seems to indicate that all is well with the new file I opened since
> it returned fd 5. So does that mean something's wrong with stdout? I
> have put traces in to check the value of fileno(stdout) and it's always
> 1. Is it worth calling:
>
> fstat(fileno(stdout), &sb)
>
Darwin-dev mailing list (email@hidden)