Re: How to read files from disk directly?
- Subject: Re: How to read files from disk directly?
- From: Gregg Wonderly <email@hidden>
- Date: Mon, 04 Jul 2011 10:55:23 -0500
On Jul 4, 2011, at 7:16 AM, Eric Gorr wrote:
>
> On Jul 3, 2011, at 11:51 PM, Shantonu Sen wrote:
>
>> Defragmenting applications use the volume format described at <http://developer.apple.com/library/mac/#technotes/tn/tn1150.html>
>
> Thank you.
>
>> This is not the same thing as the logical filesystem that is exposed to userspace. For instance, filesystem compression (introduced in Mac OS X 10.6 Snow Leopard) can prevent you from being able to reconstruct a file's contents by accessing the raw device.
>
> Interesting. Is the filesystem compression optional or is it always there for everyone?
>
> If it was optional, then I assume the situation could be detected and the slow route of using the standard APIs to open and read the files could be the fallback.
>
>> What are you really trying to do?
>
> It's fairly easy to describe. For each individual file:
>
> 1. Open the file
> 2. Read the data from the file
> 3. Do some simple processing on that data.
> 4. Close the file
>
> The issue is that there are more than 1 million files and the vast majority of them are small (only a few KB). Based on my tests, the overhead of opening and reading the data for every single file is significant, and I figured that, if it were practical, it would be nice to read a few hundred of them in one shot and send them off to a worker thread for processing.
>
> But, it sounds like what you are saying is that this is impossible or at least so impractical that it should not be attempted.
One thing I'd be tempted to investigate is disk head scheduling under high load. Does HFS use an I/O scheduler that could mitigate some of the "seek" cost, so that with enough outstanding requests the kernel can optimize head motion and read buffering for you?
If so, try a work queue with 1, 5, 10, 25, 50, 100, and 200 threads, and see whether a simple opendir/readdir traversal that feeds work items to that queue gives the kernel a request stream it can optimize.
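A minimal sketch of that experiment, assuming Python rather than C for brevity (os.scandir is built on opendir/readdir). The names walk_files, process_file, run_with_threads and compare_thread_counts are hypothetical, and the "processing" step is just a byte count standing in for whatever the real work is:

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def walk_files(root):
    """Yield the path of every regular file under root, without following symlinks."""
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    yield entry.path


def process_file(path):
    """Open, read, and close one file; the byte count is a placeholder for real processing."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data)


def run_with_threads(root, num_threads):
    """Read every file under root with num_threads workers.

    Returns (file_count, total_bytes, elapsed_seconds).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() queues one work item per file; the workers keep up to
        # num_threads reads outstanding at any moment.
        sizes = list(pool.map(process_file, walk_files(root)))
    elapsed = time.perf_counter() - start
    return len(sizes), sum(sizes), elapsed


def compare_thread_counts(root, counts=(1, 5, 10, 25, 50, 100, 200)):
    """Time the same traversal once per thread count and return the results."""
    results = []
    for n in counts:
        file_count, total_bytes, seconds = run_with_threads(root, n)
        results.append((n, file_count, total_bytes, seconds))
    return results
```

Calling compare_thread_counts on the directory in question gives one timing per thread count. Runs after the first will mostly be served from the buffer cache, so a fair comparison probably needs a cold cache (or a different subtree) for each thread count.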
Gregg