Re: Zero Fill VM Faults and poor Multi-CPU performance...

As a simple fix, you could drop in dlmalloc as a replacement for Apple's malloc. It has extremely good performance and doesn't fall off a cliff at 16K like Apple's malloc does. Google also wrote an open-source allocator designed for multithreaded apps (apparently it is awesome if your app uses threads everywhere), but I haven't personally tried it.

We've seen similar problems as well; e.g., FMOD 4 has a nasty habit of allocating 17K and 33K buffers. It's really a bummer that Apple's cutoff for malloc falling back to VM is so tiny. Its small- and medium-size malloc algorithm is like a rocket, but the large allocations suffer big time unless you implement your own recycler (tough to do in a fully thread-safe way) or just replace malloc entirely.
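
To make the "recycler" idea concrete, here is a minimal sketch in C. It assumes all large buffers share one size and that a single pthread mutex is acceptable; the names `recycler_t`, `recycler_alloc`, `recycler_free` and the cache depth `RECYCLE_SLOTS` are hypothetical, not taken from any Apple or FMOD API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define RECYCLE_SLOTS 16            /* hypothetical cache depth */

/* A small cache of freed blocks that all share one size. */
typedef struct {
    pthread_mutex_t lock;
    size_t block_size;              /* every cached block is this large */
    void *slots[RECYCLE_SLOTS];
    int count;                      /* number of blocks currently cached */
} recycler_t;

void recycler_init(recycler_t *r, size_t block_size)
{
    assert(r != NULL && block_size > 0);
    pthread_mutex_init(&r->lock, NULL);
    r->block_size = block_size;
    r->count = 0;
}

/* Hand out a cached block if one is available, else fall back to malloc. */
void *recycler_alloc(recycler_t *r)
{
    void *p = NULL;
    pthread_mutex_lock(&r->lock);
    if (r->count > 0)
        p = r->slots[--r->count];
    pthread_mutex_unlock(&r->lock);
    return p ? p : malloc(r->block_size);
}

/* Keep the block for reuse; release it to malloc only when the cache is full. */
void recycler_free(recycler_t *r, void *p)
{
    int cached = 0;
    if (p == NULL)
        return;
    pthread_mutex_lock(&r->lock);
    if (r->count < RECYCLE_SLOTS) {
        r->slots[r->count++] = p;
        cached = 1;
    }
    pthread_mutex_unlock(&r->lock);
    if (!cached)
        free(p);
}

/* Return every cached block to malloc. Call once no thread uses the recycler. */
void recycler_destroy(recycler_t *r)
{
    for (int i = 0; i < r->count; i++)
        free(r->slots[i]);
    r->count = 0;
    pthread_mutex_destroy(&r->lock);
}
```

The single lock is itself a point of contention, which is part of why doing this well is hard. A real recycler would also need to handle multiple block sizes and bound its total memory.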


Dave Thorup wrote:
We've been investigating some performance problems with an application that we're working on and have found that memory performance (allocations and deallocations above 15 KB) is extremely poor in a multi-threaded environment with one thread per CPU. The problem appears to be that, when doing work that requires a lot of memory allocation and deallocation, threads become blocked as a result of Zero Fill VM Faults.

We use a pool of threads (one thread per CPU) where each thread is given a load of work to do. This unit of work is independent from any other thread and can be completed without blocking other threads - i.e. there should be no resource contention. There is, however, a large amount of memory usage required to complete each unit of work. This is where the problem arises. Using Shark we found that a large amount of time was being spent blocking because of Zero Fill VM Faults. An All Thread States Time Profile showed an unusually large amount of time being spent in memcpy while another System Trace Profile showed that most of the time attributed to memcpy was a result of Zero Fill VM Faults. This results in a 15-20% performance hit for our program on the Mac.

On other platforms (Windows & Linux) the same hardware (8 Core Mac Pro) running our program can fully saturate each of the 8 CPUs - top usage of around 785-790% and almost 0% idle. On the Mac (10.5.2) however, we can only get about 680% CPU usage with about 14% idle. The problem appears to be a direct result of blocking caused by Zero Fill VM Faults.

I've written a small test program that illustrates the problem and shows that it can actually be far worse than what we are seeing in our application. Basically all it does is launch a thread for each CPU that does the following:

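char * pData;

while ( true )
{
    // nSize is the allocation size in bytes, e.g. 1024 * 15
    pData = (char *)malloc( nSize );
    if ( pData == NULL )
        break;                      // allocation failed; stop this thread
    memset( pData, 5, nSize );      // touch every page so it is really faulted in
    free( pData );
}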

It's basically the worst-case scenario that you could give to the VM system. All it does is repeatedly allocate a block of memory, memset it to make sure it is actually touched, and then free it. If nSize is 14 KB (1024 * 14) or less then this little test app will get full CPU utilization, near 800%. This is because memory allocations within this size range are done using a heap, so the memory is only Zero Filled once. When memory is returned to the heap and then reallocated it is not Zero Filled again since this has already been done. The problem arises when nSize is set to 15 KB or greater. When using larger memory sizes CPU utilization drops from near 800% to around 230%. Yes, barely over 2 CPUs can be kept busy when doing lots of memory allocations of 15 KB or more. This seems to be very bad when trying to scale performance on machines with lots of CPUs. On a 4 CPU Mac Pro this app will only get to about 180% in top. So going from 4 to 8 CPUs only takes you from 180% to 230%. That's clearly not scaling well and not even coming close to reaching each machine's potential.
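
For anyone who wants to reproduce this, the loop above can be wrapped in a small pthread harness along the following lines. This is a sketch, not the original test program: the iteration bound, `MAX_THREADS`, `worker_arg_t`, and `run_alloc_test` are assumptions added so that the run terminates and reports a count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREADS 64              /* hypothetical upper bound on workers */

typedef struct {
    size_t nSize;                   /* bytes per allocation, as in the loop above */
    long iterations;                /* bound added so the run terminates */
    long completed;                 /* allocations this thread finished */
} worker_arg_t;

/* One worker: allocate, touch, free, repeat.
   Build without optimization, or the compiler may remove the loop body. */
static void *alloc_worker(void *p)
{
    worker_arg_t *arg = (worker_arg_t *)p;
    for (long i = 0; i < arg->iterations; i++) {
        char *pData = (char *)malloc(arg->nSize);
        if (pData == NULL)
            break;
        memset(pData, 5, arg->nSize);   /* touch every page */
        free(pData);
        arg->completed++;
    }
    return NULL;
}

/* Run nThreads workers and return the total number of completed allocations. */
long run_alloc_test(int nThreads, size_t nSize, long iterations)
{
    pthread_t threads[MAX_THREADS];
    worker_arg_t args[MAX_THREADS];
    long total = 0;
    int started = 0;

    assert(nThreads > 0 && nThreads <= MAX_THREADS);
    for (int i = 0; i < nThreads; i++) {
        args[i].nSize = nSize;
        args[i].iterations = iterations;
        args[i].completed = 0;
        if (pthread_create(&threads[i], NULL, alloc_worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += args[i].completed;
    }
    return total;
}
```

Calling `run_alloc_test` with one thread per CPU (for example, the value of `sysconf(_SC_NPROCESSORS_ONLN)`) and comparing wall-clock time for `1024 * 14` against `1024 * 15` should show whether the cliff described here is present on a given system.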

The problem according to what I've read is that large memory allocations (15 KB & up) are done using vm_allocate which will always Zero Fill the new memory. This seems very bad to me. Why always Zero Fill? Why not only Zero Fill when the actual memory page was last used by another process? If the pages were last used by the requesting process then there's no reason to Zero Fill them.

Another interesting observation that I've found is that there also appears to be some throttling going on - at least that's the best way I can explain it. If I modify my test app to only create one thread and then run 8 instances of the application then CPU utilization increases from 230% to 620%. This shows that the VM system is clearly capable of working faster, but is limiting itself when it is stressed by a single process. It's still not maxing out all 8 CPUs but it is clearly capable of performing better when more applications are using memory than if one application with multiple threads is accessing memory.
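
The multi-process variant can also be approximated in one program by forking single-threaded children instead of launching eight copies by hand. This is a sketch under that assumption; `MAX_PROCS`, `alloc_loop`, and `run_alloc_processes` are hypothetical names, and the iteration bound is added so the children exit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_PROCS 64                /* hypothetical upper bound on children */

/* The same allocate/touch/free loop, bounded. Returns 0 on success. */
static int alloc_loop(size_t nSize, long iterations)
{
    for (long i = 0; i < iterations; i++) {
        char *pData = (char *)malloc(nSize);
        if (pData == NULL)
            return 1;
        memset(pData, 5, nSize);
        free(pData);
    }
    return 0;
}

/* Fork nProcs single-threaded children that each run the loop.
   Returns how many children exited with status 0. */
int run_alloc_processes(int nProcs, size_t nSize, long iterations)
{
    pid_t pids[MAX_PROCS];
    int forked = 0;
    int ok = 0;

    assert(nProcs > 0 && nProcs <= MAX_PROCS);
    for (int i = 0; i < nProcs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                              /* could not fork more */
        if (pid == 0)
            _exit(alloc_loop(nSize, iterations)); /* child never returns */
        pids[forked++] = pid;
    }
    for (int i = 0; i < forked; i++) {
        int status = 0;
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

Each child has its own address space, so timing this against a threaded run of the same total work gives a rough comparison of per-process versus shared-address-space behavior.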

The only way that I can see to work around this is for us to create our own memory allocator so that when we free memory up we don't return it to the VM system. That really seems like overkill. Does anyone else have any suggestions? I'll certainly be filing this as a bug report but in the interim it certainly appears that applications designed to scale well across large numbers of CPUs will not scale well on the Mac as long as they require frequent memory allocation and deallocation.
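
Short of a full allocator, one lighter option is for each worker thread to own a scratch buffer that it keeps between units of work, so large blocks never go back to the VM system while the thread is busy. The sketch below assumes each thread has exactly one such buffer and never shares it, which avoids locking entirely; `scratch_t`, `scratch_reserve`, and `scratch_release` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* A grow-only buffer owned by exactly one worker thread. */
typedef struct {
    char *data;
    size_t capacity;                /* bytes currently allocated */
} scratch_t;

/* Return a buffer of at least nSize bytes, reusing the existing one
   whenever it is already large enough. Contents are not preserved
   when the buffer grows. Returns NULL if a needed allocation fails. */
char *scratch_reserve(scratch_t *s, size_t nSize)
{
    assert(s != NULL);
    if (nSize > s->capacity) {
        char *bigger = (char *)malloc(nSize);
        if (bigger == NULL)
            return NULL;            /* the old buffer stays valid */
        free(s->data);
        s->data = bigger;
        s->capacity = nSize;
    }
    return s->data;
}

/* Give the memory back, for example when the worker thread exits. */
void scratch_release(scratch_t *s)
{
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->capacity = 0;
}
```

A worker would initialize `scratch_t s = { NULL, 0 };` once, call `scratch_reserve(&s, nSize)` at the start of each unit of work, and call `scratch_release(&s)` only on shutdown. This only helps when the large allocations have a simple per-thread lifetime; code that hands buffers between threads still needs something like a shared pool.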

Thanks!
--
Dave Thorup
Software Engineer
http://bibblelabs.com

_______________________________________________
Do not post admin requests to the list. They will be ignored.
PerfOptimization-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:


This email sent to email@hidden


