Re: Getting mmap's to read in faster
- Subject: Re: Getting mmap's to read in faster
- From: Terry Lambert <email@hidden>
- Date: Fri, 4 Nov 2005 16:22:14 -0800
On Nov 4, 2005, at 10:11 AM, Mike Smith wrote:
On Nov 3, 2005, at 2:54 PM, Terry Lambert wrote:
On Nov 3, 2005, at 2:10 PM, Dave MacLachlan wrote:
I hope this question is appropriate for this list:
I have a file that I'm mmap'ing and jumping around in, in sort of
a random walk, but I do manage to hit most of the file. Is there
any way to encourage the OS to page it in for me, knowing that I'm
going to be using it? The performance I'm getting reading in a
couple of blocks at a time is rather slow, and I see by examining
the Darwin sources that calling madvise() with MADV_WILLNEED is
currently a no-op.
I tried doing an mmap and then immediately calling read() on the
same file descriptor I mmap'd, to pull in all the blocks, but
this locked up various processes on my system.
If you plan on using memory access to get at the file data, then
paging the data in requires running the fault handler.
If you want that to happen before you actually touch a page, and
you know which pages you are going to need before you actually
need them, the easiest method is to start a "page-in thread" in a
separate context, so that the page-ins occur asynchronously to
your program needing to access the data.
This is a bit baroque.
Dave,
If you're going to jump around randomly in the whole file, and you
want the whole thing paged in, just allocate a chunk of memory and
use read() to pull the whole thing in. It will have the same net
effect on the system, and the single read() will give the system
enough of a hint to bring the file in efficiently.
If the file is too big to bring in wholesale, then you don't want
to be prefetching it in the mmap case.
PS up front: The simplifications here are directed to the list, not
toward Mike; I just want to make sure that readers understand the
whys of these things...
-
Mike: he already compared the read vs. mmap in the initial posting,
and still wants mmap (maybe to avoid changing a lot of already
written code).
-
I think the point is to have the data available when you go to use
it, rather than waiting for it to come in from a read before you can
start working on it.
This is clearly a performance issue.
There's not a lot of difference in total latency when you interleave
read/work requests, vs. reading everything up front; e.g., it's:
read.0 work.0 read.1 work.1 read.2 work.2 ...
vs:
read.0 read.1 read.2 work.0 work.1 work.2 ...
Even if he read everything in (assuming no other resource constraints
got in the way of doing that), it's going to be worthwhile to
increase the concurrency of the work by using aio_read() instead of
read() so the operations occur in parallel. Read requests take
little CPU; most of their life is spent in I/O wait, which leaves
the CPU free to do work while waiting on the next I/O, e.g. the
optimal solution is:
read.0 read.1 read.2 ...
work.0 work.1 work.2 ...
...in the interest of amortizing a single latency across all work
items, instead of incurring one latency per work item and taking the
performance hit, with the read and work items being small enough that
the latency from a single read is in the noise.
A lot of people these days are uncomfortable with async programming/
finite state automatons, so it's a lot easier to split the work on
thread boundaries. This adds a communications latency, but if you
queue the next read request before you start the work for the
previously satisfied read request, you amortize one of those
latencies across the whole list, e.g.:
read.0 read.1 read.2 ...
work.0 work.1 work.2 ...
vs.
comm.0 comm.1 comm.2 ...
read.0 read.1 read.2 ...
work.0 work.1 work.2 ...
...with the comm overhead being not to scale (it's much much
smaller). Eliminating even this small an added latency was why I
suggested async I/O as an alternative solution, if he can live
without it being mmap'ed into his address space, but it's probably
"good enough" for the simplification of the programming model that
threads give you (i.e. program everything procedurally as linear run-
to-completion operations, instead of separating out your state
information from your code and using an automaton).
-- Terry
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (email@hidden)