Re: Getting mmap's to read in faster
- Subject: Re: Getting mmap's to read in faster
- From: Terry Lambert <email@hidden>
- Date: Thu, 3 Nov 2005 14:54:24 -0800
On Nov 3, 2005, at 2:10 PM, Dave MacLachlan wrote:
> I hope this question is appropriate for this list:
>
> I have a file that I am mmap'ing and jumping around in, in sort of a
> random walk, but I do manage to hit most of the file. Is there any
> way to encourage the OS to page it in for me, knowing that I'm going
> to be using it? The performance I'm getting reading in a couple of
> blocks at a time is rather slow, and I see by examining the Darwin
> sources that calling madvise with MADV_WILLNEED is currently useless.
>
> I tried doing an mmap and then immediately calling read on the same
> file descriptor that I mmap'd to read in all the blocks, but this
> locked up various processes on my system.
If you plan on using memory access to access file data, then it's
going to take running the fault handler to cause the data to be paged
in.
If you want this to happen before you get to a page, and you expect
to know which pages you are going to need before you actually need
them, the easiest method is to start a "page-in thread" in a separate
context, so that the page-ins occur asynchronously to your program
needing to access the data.
Otherwise you end up "convoying" your accesses, effectively
serializing the fault operation behind the work you do with the
faulted data, before triggering the next fault.
Personally, I'd queue work items to the page-in thread, consisting of
the pages I'm going to need in the near future, and then the page-in
thread just processes the work queue in request order by
dereferencing a single byte immediately following the page boundary.
If you don't want to keep track of page boundaries in your main
application, and are willing to accept some additional overhead
because of that, you could just queue the starting offset; if the
access doesn't end up triggering a fault because the page is already
there, all you've wasted is some extra CPU time in queueing the work
item and waking the page-in thread because it had "work" to do.
Be aware that you can work against yourself here. For example, if
you attempt to cause the whole file to be paged in as fast as
possible, and this runs you out of buffer cache, pages you *will* be
using soon might get evicted by pages you *say* you'll be using
soon-but-not-that-soon, so don't try to page in your whole address
space all at once this way.
If you don't actually need to mmap() the data, and accessing it via
read buffers will work for you, you'd be better off doing a series of
aio_read() operations instead, since explicit control will get you
better performance than hysteresis between two or more threads.
-- Terry
_______________________________________________
Darwin-kernel mailing list (email@hidden)