Re: All threads in app periodically blocked
Re: All threads in app periodically blocked
- Subject: Re: All threads in app periodically blocked
- From: Seth Willits <email@hidden>
- Date: Mon, 23 May 2016 11:37:09 -0700
Holy cow!!
I figured out why process was halting every 150 ms.
I happened to find the critical clue by chance today. I was trying Ken's suggestion of explicitly using NSActivity to avoid any potential app napping (which wasn't what was happening), and for only then did the application suddenly stop half way through the data processing and claim it couldn't allocate anything.... Weird. 🤔
Because this CVPixelBufferPool wouldn't allocate anymore pixel buffers, I figured there must be a leak somewhere, so I profiled with Instruments for the 20th time, but yet the Allocations were pretty constant and low, and there were no leaks... Weird. 😣
So then out of habit, I printed out the CVPixelBufferPool in the debugger and *luckily* (since there's no public API for this), the pool's description showed "inUse=3768". 😳 There should be like 20 max. And if there are 3768 PBs, then surely I should see that in Instruments! … Weird.
It quickly occurred to me that the CVPixelBuffers are actually IOSurface-backed, not normal malloced-memory backed, so then I Snapshotted the VM address space a bazillion times and sure enough that was every increasing up to like 20 GB 😱, and all of that growth was in IOKit. So yep, it's these IOSurfaces which are allocated by the system, not by my app, which is why they didn't show up in the normal Allocations track, and because they're in a pool, there actually is no leak. But how does changing my operation from sync to "async" (yet, using the exact same "work" code) make these IOSurfaces stack up? ... Weird. 🤔
My thought at this point was maybe it was some autorelease pool nonsense. These pixel buffers do get inserted into autorelease pools at various points, so I figured maybe there was a pool draining difference because I've been bitten by that before with GCD in past OS versions. So I plastered @autorelease{...} all over the place to see if it had any effect. It didn't. 😕
So then I figured, the ONLY other logical explanation is that this NSOperation *itself* isn't being deallocated, since it keeps a reference (chain) to this CVPixelBuffer, so if the NSOp doesn't deallocate, then the PB doesn't get returned to the pool. So I stuck I breakpoint in -dealloc and sure enough, the operation doesn't deallocated until *waaaaaay* later. Multiple seconds after it and several thousand other operations are done too. *Weird*
Interestingly, the stack frame that triggered the deallocation of this NSOperation was KVO related. That seemed a bit odd to me, but got me thinking that the only real difference between a synchronous NSOp and async NSOp is the KVO notifications that need to be fired, so I opened up the code again to look at that and BOOM. I saw a bug. The very first thing I did was:
- (void)start
{
[self willChangeValueForKey:@"isExecuting"];
...
[self willChangeValueForKey:@"isExecuting"];
I had mistakenly (thanks to copy & paste 😔) called willChange twice and not called didChange. As soon as I fixed that, everything was working fine. The operation was properly deallocated, the PBs were properly returned to the pool, and the IOSurfaces properly released.
🙌
I'm not sure of the details, but as far as I can tell, the system periodically halted my entire process while it scrambled to keep up with my insane demands of constantly allocating gigabytes of IOSurfaces.
I'm telling you guys I used every skill I could think of, staring at zillions of system calls in traces etc, and in the end all I needed was a little luck and it was all due to a simple copy & paste error.
Sigh.
Thanks,
--
Seth Willits
> On May 22, 2016, at 10:48 AM, Seth Willits <email@hidden> wrote:
>
> I'm thoroughly confused and increasingly desperate. *All* of the threads in my application periodically are blocked for 30-50 ms pretty regularly, about every 150 ms.
>
> I'm processing a bunch of data via NSOperations running in a couple of "serial" (maxConcurrentOperationCount = 1) NSOperationQueues. Each of the operations is itself synchronous (overriding -main). Everything works as expected. For performance reasons, I want to move one of the operation types to being asynchronous and eventually allow two concurrent operations.
>
> So to start with, I took the synchronous code in -main, moved it into -start, and added the appropriate KVO notifications for isExecuting, isFinished, and implemented the right accessors etc. So effectively the operation says it's asynchronous, but all of its work is actually done in -start, followed by notifying it's finished. The work is all done correctly, but I noticed it took significantly longer to complete.
>
> Thankfully I have a nifty bit of debugging code which tracks the start and ends of each of these operations and visualizes them, and this is what I noticed:
> http://www.sethwillits.com/temp/upshot/upshot_yJutoKqY.png
>
> At very regular intervals everything takes longer.
>
> First I thought it maybe didn't like my semi-async op, so I made it truly async, and it behaved exactly the same. To make a long story short, I looked at this a few different ways in Instruments, and have confirmed that what's actually happening is *every thread is blocked* for those 30-50ms, and then it just wakes up and everything continues. The threads are literally all in "Blocked" states.
>
>
> I can't figure out *why* everything is blocked. I can't figure out why simply making the NSOperation asynchronous would have this crazy effect. It's 100% reproducible.
>
> For what it's worth, this one particular operation is doing some Metal rendering, and that operation is blocked during a call to -waitUntilCompleted. Which yes, sound suspicious, but is exactly what's happening in the very-explicitly-synchronous version of the operation, and it takes 10x less time. I am pretty certain this has nothing to do with the Metal rendering itself, and has more to do with something confusing the dispatch manager and every thread is being blocked.
>
>
> Can anyone think of what I should look for to figure out *why* they're blocked? Looking at the system calls in the Instruments trace timeline just isn't telling *me* anything.
>
>
> --
> Seth Willits
>
>
>
>
> _______________________________________________
>
> Cocoa-dev mailing list (email@hidden)
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
>
> This email sent to email@hidden
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden