Re: hangs in flockfile() during fread() or fclose()
- Subject: Re: hangs in flockfile() during fread() or fclose()
- From: Paul Smith <email@hidden>
- Date: Thu, 10 Jul 2014 08:53:40 -0400
On Wed, 2014-07-09 at 21:46 -0700, Greg Parker wrote:
> On Jul 9, 2014, at 9:26 PM, Paul Smith <email@hidden> wrote:
> > On Wed, 2014-07-09 at 19:55 -0500, Stephen J. Butler wrote:
> >> Can you distill this down to self contained test case?
> >
> > I'm wondering if someone has pointers on what we might investigate (and
> > how) when we get a process in this state, that might help us narrow down
> > where to look or what to concentrate on.
>
> Some possibilities include:
> * That thread is deadlocked against itself because it's trying to call
> fread() from a signal handler and the signal handler interrupted
> another flockfile-ing call. What is the rest of that stack trace?
Nope, we don't use signal handlers. Signals are either ignored or cause
the process to crash; we don't register any handlers. The rest of the
stack trace is unremarkable: a bunch of our internal functions and, at the
top (bottom?) of the stack:
#17 0x00007fff88cd4899 in _pthread_body ()
#18 0x00007fff88cd472a in _pthread_start ()
#19 0x00007fff88cd8fc9 in thread_start ()
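To rule out a handler installed behind our backs by some library, the
process could query its own signal dispositions at startup. This is only
a rough sketch: count_installed_handlers and the probe limit of 64 are
names and values I made up for illustration, not anything from our code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Assumption: 64 is a generous upper bound on signal numbers to probe.
   sigaction() fails for numbers the platform does not support, and
   those are simply skipped. */
#define MAX_PROBED_SIGNAL 64

/* Hypothetical helper: returns how many signals currently have a handler
   function installed (anything other than SIG_DFL or SIG_IGN). */
int count_installed_handlers(void)
{
    int count = 0;
    for (int sig = 1; sig <= MAX_PROBED_SIGNAL; sig++) {
        struct sigaction current;
        /* Passing NULL as the new action only queries the disposition. */
        if (sigaction(sig, NULL, &current) != 0)
            continue;
        if (current.sa_handler != SIG_DFL && current.sa_handler != SIG_IGN)
            count++;
    }
    return count;
}
```

A nonzero result would mean something did register a handler, which
would put the self-deadlock theory back on the table.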
> * The process is deadlocked because some other thread owns the lock
> and won't let go for some reason. What are the other threads' stack
> traces?
They're all waiting in recv() or sleep(). But unless I'm badly
misunderstanding something, another thread owning the lock shouldn't be
possible: the FILE* is opened, used, and closed within this one function,
and the lock belongs to that FILE object and is never shared between
different FILE objects.
The one thing I was thinking is this: maybe I have an fopen() then an
fclose(), but then something uses the FILE* again after the fclose(),
which corrupts the stdio structure somehow. Then that same FILE object
is handed out again by a later fopen() and this causes the problem.
I can't find anywhere that this happens, but I will look harder.
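One cheap way to make that kind of reuse fail fast would be to close
files through a wrapper that also clears the caller's pointer. A minimal
sketch, where close_and_clear is a made-up name rather than anything we
have today:

```c
#include <stdio.h>

/* Hypothetical wrapper: closes *fpp and sets the caller's pointer to NULL,
   so a later accidental use through that variable sees NULL instead of a
   stale FILE pointer. Returns fclose()'s result, or 0 if there was
   nothing to close. */
int close_and_clear(FILE **fpp)
{
    int rc = 0;
    if (fpp != NULL && *fpp != NULL) {
        rc = fclose(*fpp);
        *fpp = NULL;
    }
    return rc;
}
```

This only catches reuse through the same variable; a copy of the pointer
stashed elsewhere would still be stale, so it narrows the search rather
than proving anything.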
> * The lock is broken because a memory error smashed it. What does the
> memory contents of the lock look like? (I don't know what the
> internals of the current pthread mutex looks like, but the first four
> bytes should be something similar to 'MUTX'.)
OK, memory errors are always a possibility. I'll see if I can dig into
the lock contents.
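If it turns out to be easier to dump the bytes from inside the process
than from the debugger, something like the following would do. It is a
sketch only: format_bytes_hex is a hypothetical helper, and it assumes I
have already found the address of the lock some other way, since where
the lock lives inside a FILE is not part of the public stdio interface.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the first n bytes at p as space-separated
   lowercase hex into out (always NUL-terminated when outsz > 0).
   Returns the number of bytes formatted, which is less than n if out
   is too small to hold them all. */
size_t format_bytes_hex(const void *p, size_t n, char *out, size_t outsz)
{
    const unsigned char *bytes = p;
    size_t done = 0;   /* input bytes formatted so far */
    size_t used = 0;   /* characters written to out so far */

    if (out == NULL || outsz == 0)
        return 0;
    out[0] = '\0';

    /* Each byte needs at most 3 characters (separator plus two hex
       digits) and we must leave room for the terminating NUL. */
    while (done < n && outsz - used >= 4) {
        int wrote = snprintf(out + used, outsz - used, "%s%02x",
                             done ? " " : "", bytes[done]);
        if (wrote < 0)
            break;
        used += (size_t)wrote;
        done++;
    }
    return done;
}
```

For the four ASCII bytes 'M' 'U' 'T' 'X' this produces "4d 55 54 58".
If the signature is stored as an integer, the bytes could appear in the
reverse order in a raw dump, depending on byte order.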
On Thu, 10 Jul 2014 at 00:04 -0500, Stephen J. Butler wrote:
> Incredible claims -- like stdio being broken -- require incredible
> evidence.
I definitely agree, that's why I didn't say stdio was broken :-) (or at
least, I didn't mean to do so). I'm just reporting what I see and
asking for tips on where to go from here.
> From our perspective, it's much more likely that your code has a
> memory corruption, double free, or stack smashing bug somewhere than
> stdio not working correctly. You might want to try Malloc Debug if you
> haven't already:
>
> https://developer.apple.com/library/mac/documentation/performance/conceptual/managingmemory/articles/MallocDebug.html
> valgrind, although a pain to setup, has also helped me find memory
> related bugs in the past.
We do run valgrind on GNU/Linux, but not on MacOS. 99% of the code is
identical between them, and the few differences are unrelated to file
I/O. In fact, portability is why we use stdio instead of
open/read/close. I've considered avoiding this problem by switching to
system calls, since stdio gives us no real benefit beyond portability
the way we use it.
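For what it's worth, the replacement for fread() would be small. Here is
a rough sketch, assuming a plain POSIX read() loop is enough for us;
read_fully is a made-up name, not existing code:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads up to len bytes from fd into buf, retrying
   on short reads and on EINTR. Returns the number of bytes read (less
   than len only at end of file), or -1 on error with errno set. */
ssize_t read_fully(int fd, void *buf, size_t len)
{
    char *dst = buf;
    size_t total = 0;

    while (total < len) {
        ssize_t n = read(fd, dst + total, len - total);
        if (n == 0)
            break;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

That would sidestep flockfile() entirely, though of course it would
hide the underlying bug rather than explain it.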
The problem of course is that even with an optimized build the full
tests take 6+ hours and the failure doesn't always happen, and running
that under valgrind or even a debugging malloc will take significantly
longer (and could potentially change the timing so it never happens).
However, I guess it's up to me now. I'll follow up if I figure out
anything.
Cheers!
_______________________________________________
Darwin-dev mailing list (email@hidden)