Re: Speed-optimized file copy
- Subject: Re: Speed-optimized file copy
- From: "Quinn \"The Eskimo!\"" <email@hidden>
- Date: Wed, 20 Jul 2011 11:55:24 +0100
On 19 Jul 2011, at 18:38, Wim Lewis wrote:
> I haven't tested on Darwin, but on some systems you can get a significant efficiency improvement using the aio_* interface (POSIX asynchronous IO).
AIO is unlikely to help you here; if you look at the Darwin implementation, you'll find that it just uses a bunch of kernel threads to run the I/O via the normal mechanism.
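For reference, here's a minimal sketch of the aio_* interface in question, issuing a single asynchronous read and blocking until it completes; the single-request structure and 1 MB buffer are purely illustrative:

/*
 * A sketch of POSIX AIO; on Darwin these requests end up being
 * serviced by a pool of kernel threads running the normal I/O path,
 * so they buy you little over your own reader threads.
 */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return EXIT_FAILURE;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    static char buffer[1024 * 1024];        /* 1 MB, matching the discussion */

    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buffer;
    cb.aio_nbytes = sizeof(buffer);
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) { perror("aio_read"); return EXIT_FAILURE; }

    /* Block until this one request is done. */
    const struct aiocb *const list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS) {
        (void) aio_suspend(list, 1, NULL);
    }

    ssize_t got = aio_return(&cb);
    printf("read %zd bytes\n", got);

    close(fd);
    return EXIT_SUCCESS;
}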
> Not only does this let the kernel see multiple outstanding requests and retire them out-of-order [...]
The notion of passing lots of requests to the kernel and having it deal with optimising them is an interesting one. This works really well in some circumstances, but it's probably not the best choice here. Specifically, this approach works well when you're doing lots of scattered I/O, or where the I/O is coming from multiple unrelated processes. So, if you're doing an Xcode build and running Spotlight and running Time Machine, the kernel's ability to get a global view of the world is a good thing. However, in this case Stevo just wants to grind through data as quickly as possible, and I'm presuming that these files are going to be relatively large (they'd have to be, given that they "total many gigabytes"). At that point you know more about the problem space than the kernel does, and can optimise accordingly.
On 19 Jul 2011, at 22:25, Stevo Brock wrote:
> -I had heard recently on one of the developer lists (I think that's where it was) that valloc eventually calls down to vm_allocate, so I was just taking a shortcut. I've gone back to using valloc now.
Cool. The valloc vs vm_allocate trade-off is another interesting one, but in your case the trick is to avoid playing in this space at all. If you allocate your buffers once and keep them around, you're only doing a few allocations for the entire copy, at which point the relative performance characteristics of these routines are irrelevant (and you might as well use the higher-level one).
> -The buffer size and read size (and write size) is currently 1MB. Obviously testing can illuminate, but in your experience, is this a good ballpark number?
Yes (assuming the two-spindle case that I described).
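To make the allocate-once point concrete, here's a sketch that sets up a small pool of page-aligned 1 MB buffers up front and tears them down at the end; the buffer count and names are illustrative, not a recommendation:

/*
 * A sketch of allocate-once buffer handling: grab a few page-aligned
 * 1 MB buffers at startup and reuse them for the life of the copy, so
 * the valloc-vs-vm_allocate question never arises per-I/O.
 */
#include <stdio.h>
#include <stdlib.h>

#define kBufferSize  (1024 * 1024)   /* 1 MB, per the discussion above */
#define kBufferCount 4               /* illustrative; tune by measurement */

static void *gBuffers[kBufferCount];

static int SetUpBuffers(void)
{
    for (int i = 0; i < kBufferCount; i++) {
        gBuffers[i] = valloc(kBufferSize);   /* page-aligned allocation */
        if (gBuffers[i] == NULL) {
            return -1;
        }
    }
    return 0;
}

static void TearDownBuffers(void)
{
    for (int i = 0; i < kBufferCount; i++) {
        free(gBuffers[i]);
        gBuffers[i] = NULL;
    }
}

int main(void)
{
    if (SetUpBuffers() != 0) {
        fprintf(stderr, "could not allocate copy buffers\n");
        return EXIT_FAILURE;
    }
    /* ... run the copy, cycling through gBuffers ... */
    TearDownBuffers();
    return EXIT_SUCCESS;
}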
> -Optimizing for seeks. Yes, this is the tricky one. I can tell you that the environment that this application will be used in is always going from a source drive to a number of (usually 1 or 2) destination drives. Is it even possible to know what spindle a file is being written to?
Mapping files to spindles is tricky in the general case. A good first cut is to look at the dev node name: on most systems disk0s1 is on the same spindle as disk0s2, but on a different spindle from disk1s1. However, this doesn't always work. Things that can confuse it include:
o RAID (where the very notion of spindles breaks down)
o disk images
o logical volume managers
If you'd like to see the OS take care of these nitty gritty details, please file an enhancement request.
<http://developer.apple.com/bugreporter/>
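To illustrate that first cut, here's a sketch that maps a path to its volume's BSD device name via statfs and strips the slice suffix, so "/dev/disk0s2" and "/dev/disk0s3" both yield "disk0"; as noted above, RAID, disk images, and logical volume managers can all make the answer misleading:

#include <sys/param.h>
#include <sys/mount.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int CopySpindleName(const char *path, char *out, size_t outSize)
{
    struct statfs fs;

    if (statfs(path, &fs) != 0) {
        return -1;
    }
    /* f_mntfromname is typically "/dev/diskNsM" for local volumes. */
    const char *dev = fs.f_mntfromname;
    if (strncmp(dev, "/dev/", 5) == 0) {
        dev += 5;
    }
    /* Copy up to (but not including) the slice suffix, e.g. "s2". */
    size_t len = 0;
    while (dev[len] != '\0' && len < outSize - 1) {
        if (dev[len] == 's' && len > 4) {   /* past "disk" and the unit number */
            break;
        }
        out[len] = dev[len];
        len++;
    }
    out[len] = '\0';
    return 0;
}

int main(int argc, char **argv)
{
    char spindle[MAXPATHLEN];
    if (argc < 2 || CopySpindleName(argv[1], spindle, sizeof(spindle)) != 0) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return EXIT_FAILURE;
    }
    printf("%s -> %s\n", argv[1], spindle);
    return EXIT_SUCCESS;
}

Two paths that map to the same string are probably on the same spindle; anything fancier deserves the caveats above.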
On 19 Jul 2011, at 22:32, Stevo Brock wrote:
> On OS X 10.6 (and let's throw in 10.7), if the kernel sees a number of read() and write() calls coming in from different threads, does it automatically reorder them in any way, or does it always service them in order?
All versions of Mac OS X can complete I/O out of order (for example, the data for the first read might already be in the cache), but recent versions also do more sophisticated I/O reordering for optimisation purposes. Alas, I can't remember which version that got added in )-:
On 20 Jul 2011, at 01:53, Stevo Brock wrote:
> And here's another interesting question - everything we've been talking about so far has assumed spinning media - how do things change if the source and/or destination is Flash-based?
You're correct that everything changes with flash. Most notably, seek times disappear, so seek-based optimisations become irrelevant. My understanding is that the critical issue with flash is write performance, and the key optimisation is to avoid read/modify/write cycles (flash uses a large block size, so writing a single sector triggers such a cycle). Fortunately, large writes are also good for hard disks, so that shouldn't be a problem.
In summary, I think that most of the optimisations you make for hard disks are either benign or beneficial on flash, so as long as you don't spend too much time applying them you should be OK.
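To tie this together, here's a sketch of a copy loop that fills a 1 MB buffer completely before each write, so every write except possibly the last is large; the chunk size is just the figure discussed above, and the error handling is deliberately minimal:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define kChunkSize (1024 * 1024)    /* 1 MB, per the discussion above */

static int CopyFile(const char *srcPath, const char *dstPath)
{
    int result = -1;
    char *buffer = valloc(kChunkSize);          /* page-aligned, reused */
    int srcFD = open(srcPath, O_RDONLY);
    int dstFD = open(dstPath, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (buffer != NULL && srcFD >= 0 && dstFD >= 0) {
        for (;;) {
            /* Fill the buffer completely so writes stay large. */
            ssize_t fill = 0;
            while (fill < kChunkSize) {
                ssize_t got = read(srcFD, buffer + fill, kChunkSize - fill);
                if (got <= 0) { break; }    /* EOF; a real copy would
                                               distinguish read errors */
                fill += got;
            }
            if (fill == 0) { result = 0; break; }       /* clean EOF */
            if (write(dstFD, buffer, fill) != fill) { break; }
            if (fill < kChunkSize) { result = 0; break; }   /* final partial chunk */
        }
    }

    if (srcFD >= 0) { close(srcFD); }
    if (dstFD >= 0) { close(dstFD); }
    free(buffer);
    return result;
}

int main(int argc, char **argv)
{
    if (argc != 3 || CopyFile(argv[1], argv[2]) != 0) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

Only the final write can fall below the 1 MB mark, so a flash device sees whole-block writes for almost the entire file, and a hard disk sees nice big sequential transfers.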
S+E
--
Quinn "The Eskimo!" <http://www.apple.com/developer/>
Apple Developer Relations, Developer Technical Support, Core OS/Hardware