Re: hangs in flockfile() during fread() or fclose()
- Subject: Re: hangs in flockfile() during fread() or fclose()
- From: Greg Parker <email@hidden>
- Date: Thu, 10 Jul 2014 17:43:16 -0700
On Jul 10, 2014, at 7:30 AM, Paul Smith <email@hidden> wrote:
> On Thu, 2014-07-10 at 08:53 -0400, Paul Smith wrote:
>> However, I guess it's up to me now. I'll follow up if I figure out
>> anything.
>
> Hm. OK, I think I have a smoking gun.
>
> I looked at the core again and it turns out there's ANOTHER thread,
> which is ALSO hung in flockfile(). I didn't notice it the first time.
> So, a real deadlock situation rather than memory stomp.
>
> Examining the other thread, I see a bug in our code: we are writing
> logging and we check to see if the log FILE* is NULL and if so we write
> to stdout instead. We do this properly everywhere except for the
> fflush() operation at the end; here we just pass the log FILE* without
> checking. And of course, if you call fflush(NULL) then it will try to
> flush all open output streams, and this is where it hangs:
>
> Thread 7 (core thread 6):
> #0 0x00007fff8a997746 in __psynch_mutexwait ()
> #1 0x00007fff88cd7779 in _pthread_mutex_lock ()
> #2 0x00007fff856c0edd in flockfile ()
> #3 0x00007fff856c156f in sflush_locked ()
> #4 0x00007fff856c3e82 in _fwalk ()
> #5 0x0000000101d9112f in Engine::logger (this=<unavailable>, message=<unavailable>) at /Users/build/src/Logger.cpp:887
>
> Looking at the code in libc I _think_ I see the problem:
>
>   * In fopen() if we need to get new FILE objects it appears to me
>     that they are added into the list _before_ they are completely
>     initialized; in particular before the INITEXTRA() macro, which
>     is what initializes the _fl_lock mutex, is run.
>   * In _fwalk() we walk the list of open FILE objects without taking
>     a lock; the comment says:
>
>         * It should be safe to walk the list without locking it;
>         * new nodes are only added to the end and none are ever
>         * removed.
>
> Then _fwalk() passes the FILE object to sflush_locked() which calls
> flockfile() on it.
>
> So I believe that it's possible for one thread to be calling flockfile()
> on a FILE object with an uninitialized _fl_lock mutex, leading to
> corruption.
>
> I will change our code, since we definitely don't want to be calling
> fflush(NULL), but it seems also to be a problem in libc since this kind
> of thing is exactly why flockfile() is there, IIUC.
>
> Am I understanding the code correctly, or did I miss something? Should
> I file a bug? If so where's the best place?
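
The application-side fix described in the quoted message (never letting a NULL log FILE* reach fflush()) might look roughly like the following. This is only a sketch: log_stream() and log_message() are hypothetical names chosen for illustration, not functions from the poster's Logger.cpp.

```c
#include <stdio.h>

/* Hypothetical helper: resolve the destination once, so that every
 * stdio call, including fflush(), sees a non-NULL FILE*. */
static FILE *log_stream(FILE *log)
{
    return log != NULL ? log : stdout;
}

/* Hypothetical logger: writes one line and flushes only that stream.
 * Returns the number of bytes written, or a negative value on error. */
static int log_message(FILE *log, const char *message)
{
    FILE *out = log_stream(log);
    int written = fprintf(out, "%s\n", message);

    /* fflush(out) touches a single stream; fflush(NULL) would instead
     * walk and lock every open output stream in the process. */
    if (fflush(out) != 0)
        return -1;
    return written;
}
```

Resolving the stream in one place means the NULL check cannot be forgotten at an individual call site, which is the mistake described above.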
Your diagnosis sounds plausible to me. That code needs appropriate memory barriers if it wants to play games with lock-free algorithms, and I don't see any. Please file a bug report at http://bugreport.apple.com.
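
For illustration only, here is one way an append-only list can be walked without a list lock while still guaranteeing that readers never see a node whose mutex is uninitialized. It is a sketch using C11 atomics, not the libc code: struct stream, stream_add(), stream_walk() and stream_id() are hypothetical names. The node is fully initialized first and only then published with a release store; the walker uses acquire loads.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a FILE object carrying its own lock. */
struct stream {
    pthread_mutex_t lock;            /* plays the role of _fl_lock */
    int id;
    _Atomic(struct stream *) next;   /* published with release semantics */
};

static struct stream sentinel = { PTHREAD_MUTEX_INITIALIZER, 0, NULL };
static struct stream *tail = &sentinel;                 /* writers only */
static pthread_mutex_t append_lock = PTHREAD_MUTEX_INITIALIZER;

/* Initialize the node completely, then link it in. The release store
 * orders the initialization before the node becomes reachable. */
static struct stream *stream_add(int id)
{
    struct stream *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    if (pthread_mutex_init(&s->lock, NULL) != 0) {
        free(s);
        return NULL;
    }
    s->id = id;
    atomic_init(&s->next, NULL);

    pthread_mutex_lock(&append_lock);   /* serializes writers only */
    atomic_store_explicit(&tail->next, s, memory_order_release);
    tail = s;
    pthread_mutex_unlock(&append_lock);
    return s;
}

/* Walk without a list lock. Each acquire load pairs with the release
 * store above, so the per-node mutex is initialized before it is locked. */
static int stream_walk(int (*fn)(struct stream *))
{
    int total = 0;
    struct stream *s =
        atomic_load_explicit(&sentinel.next, memory_order_acquire);
    while (s != NULL) {
        pthread_mutex_lock(&s->lock);
        total += fn(s);
        pthread_mutex_unlock(&s->lock);
        s = atomic_load_explicit(&s->next, memory_order_acquire);
    }
    return total;
}

/* Example callback: report the node's id. */
static int stream_id(struct stream *s)
{
    return s->id;
}
```

The sketch relies on the same assumption as the comment quoted above: nodes are only ever appended and never removed.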
--
Greg Parker email@hidden Runtime Wrangler