memory management question
- Subject: memory management question
- From: Brian Barrett <email@hidden>
- Date: Wed, 17 Aug 2005 11:31:36 -0500
Hi all -
I have a question about Darwin's memory manager I hope someone can
help with. I'm working on Open MPI, an implementation of the Message
Passing Interface for high performance computing. Basically, a nice,
portable interface for writing high performance parallel codes. We
support a number of high speed networks including InfiniBand (IB) and
Myrinet, both of which use OS bypass remote memory access for better
performance.
The problem I'm running into is with "registering" memory with the
communication library. Essentially, both IB and the Myrinet software
(also known as GM) have a usage pattern similar to:
1) register region of memory to send from / receive to
2) perform send/recv
3) unregister region of memory
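As a rough illustration of the three steps above, here is a minimal C
sketch. The net_register, net_send, and net_unregister functions are
hypothetical stand-ins, not the actual IB or GM calls; they only count
invocations so the per-message pattern is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a network library's API. These are NOT
 * the real InfiniBand or GM calls; they only count invocations. */
static int n_register, n_send, n_unregister;

static int net_register(void *buf, size_t len)
{
    (void)buf; (void)len;
    n_register++;
    return 0;
}

static int net_send(const void *buf, size_t len)
{
    (void)buf; (void)len;
    n_send++;
    return 0;
}

static int net_unregister(void *buf, size_t len)
{
    (void)buf; (void)len;
    n_unregister++;
    return 0;
}

/* One message with no caching: every send pays for a full
 * register / unregister pair around the actual transfer. */
static int send_uncached(void *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (net_register(buf, len) != 0)     /* 1) pin pages, resolve translations */
        return -1;
    int rc = net_send(buf, len);         /* 2) perform the send */
    if (net_unregister(buf, len) != 0)   /* 3) unpin the pages again */
        return -1;
    return rc;
}
```

With this shape, sending N messages from the same buffer costs N
registrations and N deregistrations, which is the overhead the rest of
this message is trying to avoid.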
The registration serves two purposes: it lets the communication
library (IB or GM) work out all the virtual-to-physical address
translations, and it pins/wires the pages so that they don't move
during the registration period. For a single page, registration costs
around 10us and deregistration around 200us; the cost grows
sub-linearly, but is not constant, as the registration size increases.
For a network with sub-5us end-to-end communication cost, this is too
high for most uses.
Due to the high registration/deregistration costs, and given that most
HPC applications exhibit high locality, it would be really nice to
cache the registrations so that we only unregister memory when either
the application is done or some threshold on the number of registered
pages is reached. The problem with this is that the user is free to do
something like:
1) malloc a chunk of memory
2) call MPI_Send out of said buffer (which means MPI registers, sends,
   but doesn't unpin because we're caching)
3) call free on the chunk of memory
which means the user buffer is being returned to the OS while the
communication library thinks the memory is still pinned. From this
point, bad things usually happen - exactly what depends on the OS. I
believe Darwin will block the thread, waiting for the memory to be
unpinned before free() returns, which basically means free() isn't
going to return.
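To make the hazard concrete, here is a minimal C sketch of a
registration cache. It is an illustration under assumptions, not Open
MPI's actual code: the fixed-size table, ensure_registered,
cache_invalidate, and the net_* stubs are all hypothetical names, and
eviction is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 8

/* Hypothetical stand-ins for the network library; they only count calls. */
static int n_register, n_unregister;
static void net_register(uintptr_t addr, size_t len)   { (void)addr; (void)len; n_register++; }
static void net_unregister(uintptr_t addr, size_t len) { (void)addr; (void)len; n_unregister++; }

struct reg { uintptr_t addr; size_t len; int valid; };
static struct reg cache[CACHE_SLOTS];

/* Return the cached registration covering [addr, addr+len), or NULL. */
static struct reg *cache_lookup(uintptr_t addr, size_t len)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && addr >= cache[i].addr &&
            addr + len <= cache[i].addr + cache[i].len)
            return &cache[i];
    return NULL;
}

/* Send-path helper: register on a miss, and never unregister here. */
static int ensure_registered(void *buf, size_t len)
{
    uintptr_t addr = (uintptr_t)buf;
    assert(buf != NULL && len > 0);

    if (cache_lookup(addr, len))
        return 0;                        /* hit: registration cost skipped */
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].valid) {
            net_register(addr, len);
            cache[i] = (struct reg){ addr, len, 1 };
            return 0;
        }
    }
    return -1;                           /* cache full; eviction omitted */
}

/* Must run before the buffer is freed; otherwise the entry goes stale
 * and the pages stay pinned behind the allocator's back. */
static void cache_invalidate(void *buf)
{
    uintptr_t addr = (uintptr_t)buf;
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && cache[i].addr == addr) {
            net_unregister(cache[i].addr, cache[i].len);
            cache[i].valid = 0;
        }
}
```

The trouble is that nothing forces the application to call anything
like cache_invalidate() before free(); the MPI library only finds out
if it can observe the free itself.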
On Linux, we provide an override for the weak symbols for munmap and
friends, and use either the glibc malloc hooks or an LD_PRELOAD library to
intercept free() and realloc() so that we can unregister any pinned
memory before we let the OS actually free it. Ideally, rather than
intercepting free() and realloc(), we'd like some callback /
notification / etc. that memory was actually being given back to the
OS. That way, we don't have to go through our registration LRU lists
and all that.
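The kind of notification described above could look roughly like the
following C sketch: a hook that receives the address range about to be
released and deregisters whatever overlaps it. Everything here
(on_memory_release, track_registration, the fixed-size table, the
net_unregister stub) is a hypothetical illustration, not an existing
Darwin, glibc, or Open MPI interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGS 8

struct reg { uintptr_t addr; size_t len; int valid; };
static struct reg regs[MAX_REGS];
static int n_unregister;

/* Hypothetical stand-in for the network library's deregistration call. */
static void net_unregister(uintptr_t addr, size_t len)
{
    (void)addr; (void)len;
    n_unregister++;
}

/* Record a cached registration (hypothetical helper); 0 on success. */
static int track_registration(uintptr_t addr, size_t len)
{
    for (int i = 0; i < MAX_REGS; i++)
        if (!regs[i].valid) {
            regs[i] = (struct reg){ addr, len, 1 };
            return 0;
        }
    return -1;
}

/* Hook to call just before [addr, addr+len) goes back to the OS, for
 * example from an intercepted munmap(). Deregisters every cached
 * registration that overlaps the range and returns how many it dropped. */
static int on_memory_release(uintptr_t addr, size_t len)
{
    int dropped = 0;
    assert(len > 0);

    for (int i = 0; i < MAX_REGS; i++) {
        if (!regs[i].valid)
            continue;
        /* Half-open intervals overlap when each starts before the other ends. */
        if (regs[i].addr < addr + len && addr < regs[i].addr + regs[i].len) {
            net_unregister(regs[i].addr, regs[i].len);
            regs[i].valid = 0;
            dropped++;
        }
    }
    return dropped;
}
```

A hook at this level only runs when pages really leave the process, so
buffers that free() merely returns to the allocator's own free lists
would keep their cached registrations.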
So the actual question to the list - what is the best way to do this
on Darwin? Currently, we do something really evil, similar to what
MPICH-gm (the Myricom-distributed MPI for Myrinet) does. We copied
scalable_malloc.c out of the Darwin libSystem sources and added a
hook to deregister our memory wherever vm_deallocate() is called.
MPI applications are generally compiled with "wrapper compilers" that
get all the right CFLAGS, LDFLAGS, LIBS, etc for that particular MPI
implementation, so we can add the right linker flags to make sure
that our version of scalable_malloc.o is used instead of the one in
libSystem. This, of course, is pretty fragile - if Apple changes
scalable_malloc.c, we have to adjust to that. We have to force a
flat namespace for the user application, etc. So, is there a better
way to get notification that memory is about to leave the process and
give us an opportunity to deregister the memory (if needed)?
Thanks,
Brian
--
Brian Barrett
Graduate Student, Open Systems Lab, Indiana University
http://www.osl.iu.edu/~brbarret/
Darwin-dev mailing list (email@hidden)