Re: Getting mmap's to read in faster
- Subject: Re: Getting mmap's to read in faster
- From: Terry Lambert <email@hidden>
- Date: Fri, 4 Nov 2005 16:22:14 -0800
On Nov 4, 2005, at 10:11 AM, Mike Smith wrote:
On Nov 3, 2005, at 2:54 PM, Terry Lambert wrote:
On Nov 3, 2005, at 2:10 PM, Dave MacLachlan wrote:
I hope this question is appropriate for this list:
I have a file that I'm mmap'ing and jumping around in, in sort of
a random walk, but I do manage to hit most of the file. Is there
any way to encourage the OS to page it in for me, knowing that I'm
going to be using it? The performance I'm getting reading in a
couple of blocks at a time is rather slow, and I see by examining
the Darwin sources that calling madvise() with MADV_WILLNEED is
currently a no-op.
I tried doing an mmap and then immediately calling read() on the
same file descriptor I mmap'd, to pull in all the blocks, but
this locked up various processes on my system.
If you plan on using memory access to get at the file data, then
paging the data in requires running the fault handler.
If you want that to happen before you actually touch a page, and
you know which pages you are going to need before you actually
need them, the easiest method is to start a "page-in thread" in a
separate context, so that the page-ins occur asynchronously to
your program needing to access the data.
This is a bit baroque.
Dave,
If you're going to jump around randomly in the whole file, and you
want the whole thing paged in, just allocate a chunk of memory and
use read() to pull the whole thing in. It will have the same net
effect on the system, and the single read() will give the system
enough of a hint to bring the file in efficiently.
If the file is too big to bring in wholesale, then you don't want
to be prefetching it in the mmap case.
PS up front: The simplifications here are directed to the list, not
toward Mike; I just want to make sure that readers understand the
whys of these things...
-
Mike: he already compared the read vs. mmap in the initial posting,
and still wants mmap (maybe to avoid changing a lot of already
written code).
-
I think the point is to have the data available when you go to use
it, rather than waiting for it to come in from a read before you can
start working on it.
This is clearly a performance issue.
There's not a lot of difference in total latency when you interleave
read/work requests, vs. reading everything up front; e.g., it's:
read.0 work.0 read.1 work.1 read.2 work.2 ...
vs:
read.0 read.1 read.2 work.0 work.1 work.2 ...
Even if he read everything in (assuming no other resource constraints
got in the way of doing that), it's going to be worthwhile to
increase the concurrency of the work by using aio_read() instead of
read() so the operations occur in parallel. Read requests take
little CPU; most of their life is spent in I/O wait, which leaves
the CPU free to do work while waiting on the next I/O, e.g. the
optimal solution is:
read.0 read.1 read.2 ...
work.0 work.1 work.2 ...
...in the interest of amortizing a single latency across all work
items, instead of incurring one latency per work item and taking the
performance hit, with the read and work items being small enough that
the latency from a single read is in the noise.
A lot of people these days are uncomfortable with async programming/
finite state automatons, so it's a lot easier to split the work on
thread boundaries. This adds a communications latency, but if you
queue the next read request before you start the work for the
previously satisfied read request, you amortize one of those
latencies across the whole list, e.g.:
read.0 read.1 read.2 ...
work.0 work.1 work.2 ...
vs.
comm.0 comm.1 comm.2 ...
read.0 read.1 read.2 ...
work.0 work.1 work.2 ...
...with the comm overhead being not to scale (it's much much
smaller). Eliminating even this small an added latency was why I
suggested async I/O as an alternative solution, if he can live
without it being mmap'ed into his address space, but it's probably
"good enough" for the simplification of the programming model that
threads give you (i.e. program everything procedurally as linear run-
to-completion operations, instead of separating out your state
information from your code and using an automaton).
-- Terry
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (email@hidden)