Re: Getting mmap's to read in faster
- Subject: Re: Getting mmap's to read in faster
- From: Terry Lambert <email@hidden>
- Date: Thu, 3 Nov 2005 14:54:24 -0800
On Nov 3, 2005, at 2:10 PM, Dave MacLachlan wrote:
> I hope this question is appropriate for this list:
>
> I have a file that I am mmap'ing and jumping around in, in sort of a
> random walk, but I do manage to hit most of the file. Is there any
> way to encourage the OS to page it in for me, knowing that I'm going
> to be using it? The performance I'm getting reading in a couple of
> blocks at a time is rather slow, and I see by examining the Darwin
> sources that calling madvise with MADV_WILLNEED is currently useless.
>
> I tried doing an mmap and then immediately calling read on the same
> file descriptor that I mmap'd to read in all the blocks, but this
> locked up various processes on my system.
If you plan on using memory access to access file data, then it's
going to take running the fault handler to cause the data to be paged
in.
If you want this to happen before you get to a page, and you expect
to know which pages you are going to need before you actually need
them, the easiest method is to start a "page-in thread" in a separate
context, so that the page-ins occur asynchronously to your program
needing to access the data.
Otherwise you end up "convoying" your accesses, effectively
serializing the fault operation behind the work you do with the
faulted data, before triggering the next fault.
Personally, I'd queue work items to the page-in thread, consisting of
the pages I'm going to need in the near future, and then the page-in
thread just processes the work queue in request order by
dereferencing a single byte immediately following the page boundary.
If you don't want to keep track of page boundaries in your main
application, and are willing to accept some additional overhead
because of that, you could just queue the starting offset; if the
access doesn't end up triggering a fault because the page is already
there, all you've wasted is some extra CPU time in queueing the work
item and waking the page-in thread because it had "work" to do.
Be aware that you can work against yourself here. For example, if
you attempt to cause the whole file to be paged in as fast as
possible, and this runs you out of buffer cache, pages you *will* be
using soon might get evicted by pages you *say* you'll be using
soon-but-not-that-soon, so don't try to page in your whole address
space all at once this way.
If you don't actually need to mmap() the data, and accessing it via
read buffers will work for you, you'd be better off doing a series of
aio_read() operations instead, since explicit control will get you
better performance than hysteresis between two or more threads.
-- Terry
_______________________________________________
Darwin-kernel mailing list (email@hidden)