memory management question
Delivered-To: darwin-dev@lists.apple.com

Hi all,

I have a question about Darwin's memory manager that I hope someone can help with. I'm working on Open MPI, an implementation of the Message Passing Interface for high performance computing: basically, a nice, portable interface for writing high performance parallel codes. We support a number of high speed networks, including InfiniBand (IB) and Myrinet, both of which use OS-bypass remote memory access for better performance.

The problem I'm running into is with "registering" memory with the communication library. Essentially, both IB and the Myrinet software (also known as gm) have a usage model similar to:

1) register the region of memory to send from / receive to
2) perform the send/recv
3) unregister the region of memory

The registration serves two purposes: it lets the communication library (IB or GM) work out all the virtual-to-physical address translations, and it pins (wires) the pages so that they don't move during the registration period. Registration costs around 10us and deregistration around 200us for a single page; the cost grows sub-linearly, but is not constant, as the registration size increases. For a network with sub-5us end-to-end communication cost, this is too high for most uses.

Due to the high registration/deregistration costs, and given that most HPC applications have high locality, it would be really nice to cache the registrations so that we only unregister memory when either the application is done or some threshold on the number of registered pages is reached. The problem with this is that the user is free to do something like:

1) malloc a chunk of memory
2) call MPI_Send from that buffer (MPI registers it and sends, but doesn't unpin it because we're caching the registration)
3) call free() on the chunk of memory

At that point the user buffer is being returned to the OS while the communication library thinks the memory is still pinned. From then on, bad things usually happen; exactly what depends on the OS. I believe Darwin will block the thread, waiting for the memory to be unpinned before free() returns, which basically means free() is never going to return.

On Linux, we provide overrides for the weak symbols for munmap() and friends, and use either the glibc malloc hooks or an LD_PRELOAD library to intercept free() and realloc(), so that we can unregister any pinned memory before we let the OS actually free it. Ideally, rather than intercepting free() and realloc(), we'd like some callback or notification that memory is actually being given back to the OS. That way, we wouldn't have to go through our registration LRU lists and all that.

So the actual question for the list: what is the best way to do this on Darwin?

Currently, we do something really evil, similar to what MPICH-gm (the Myricom-distributed MPI for Myrinet) does. We copied scalable_malloc.c out of the Darwin libSystem sources and added a hook to deregister our memory wherever vm_deallocate() is called. MPI applications are generally compiled with "wrapper compilers" that supply all the right CFLAGS, LDFLAGS, LIBS, etc. for that particular MPI implementation, so we can add the right linker flags to make sure that our version of scalable_malloc.o is used instead of the one in libSystem. This, of course, is pretty fragile: if Apple changes scalable_malloc.c, we have to adjust to that, and we have to force a flat namespace for the user application, etc.

So, is there a better way to get notified that memory is about to leave the process, giving us an opportunity to deregister it (if needed)?

Thanks,

Brian

--
Brian Barrett
Graduate Student, Open Systems Lab, Indiana University
http://www.osl.iu.edu/~brbarret/
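The registration cache described in the message can be sketched in C to show where the missing hook fits. This is a minimal illustration, not Open MPI's code: `nic_register()` and `nic_deregister()` are hypothetical stand-ins for the IB/GM registration calls (they only count calls), and the cache is an assumed fixed eight-slot table with LRU eviction. The piece that matters for the question is `reg_cache_invalidate()`, which has to run before any covered memory leaves the process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the NIC library's register/deregister
 * calls (the IB or GM equivalents). They only count calls here. */
static int nic_registrations = 0;
static int nic_deregistrations = 0;

static void nic_register(uintptr_t base, size_t len)
{
    (void)base; (void)len;
    nic_registrations++;
}

static void nic_deregister(uintptr_t base, size_t len)
{
    (void)base; (void)len;
    nic_deregistrations++;
}

#define REG_CACHE_SLOTS 8   /* assumed cache size; a real one would be tunable */

struct reg_entry {
    uintptr_t base;           /* start of the registered (pinned) range */
    size_t len;               /* length in bytes; 0 marks an empty slot */
    unsigned long last_use;   /* logical clock value for LRU eviction */
};

static struct reg_entry reg_cache[REG_CACHE_SLOTS];
static unsigned long reg_clock = 0;

/* Return a registration covering [addr, addr + len), reusing a cached
 * one when possible so the register/deregister cost is paid only on a
 * miss. On a miss, an empty slot is used, or else the least recently
 * used entry is deregistered and replaced. */
static struct reg_entry *reg_cache_get(const void *addr, size_t len)
{
    uintptr_t a = (uintptr_t)addr;
    struct reg_entry *victim = &reg_cache[0];

    assert(len > 0);  /* len == 0 is reserved to mean "empty slot" */
    for (int i = 0; i < REG_CACHE_SLOTS; i++) {
        struct reg_entry *e = &reg_cache[i];
        if (e->len != 0 && a >= e->base && a + len <= e->base + e->len) {
            e->last_use = ++reg_clock;   /* hit: no NIC call needed */
            return e;
        }
        /* Prefer an empty slot; otherwise remember the least recently used. */
        if (victim->len != 0 && (e->len == 0 || e->last_use < victim->last_use))
            victim = e;
    }

    if (victim->len != 0)
        nic_deregister(victim->base, victim->len);
    nic_register(a, len);
    victim->base = a;
    victim->len = len;
    victim->last_use = ++reg_clock;
    return victim;
}

/* Must run before memory overlapping [addr, addr + len) goes back to the
 * OS (free, munmap, vm_deallocate); otherwise the cache keeps a stale,
 * still-pinned entry for pages the process no longer owns. */
static void reg_cache_invalidate(const void *addr, size_t len)
{
    uintptr_t a = (uintptr_t)addr;

    for (int i = 0; i < REG_CACHE_SLOTS; i++) {
        struct reg_entry *e = &reg_cache[i];
        if (e->len != 0 && e->base < a + len && a < e->base + e->len) {
            nic_deregister(e->base, e->len);
            e->len = 0;
        }
    }
}
```

Sending twice from the same buffer pays for one registration; the open question in the thread is only where to call `reg_cache_invalidate()` from on Darwin. The sketch deliberately ignores page rounding and thread safety, both of which a real cache would need.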