Re: How to read files from disk directly?
- Subject: Re: How to read files from disk directly?
- From: Eric Gorr <email@hidden>
- Date: Mon, 04 Jul 2011 09:10:58 -0400
On Jul 4, 2011, at 8:38 AM, Toby Thain wrote:
> On 04/07/11 8:16 AM, Eric Gorr wrote:
>>
>> On Jul 3, 2011, at 11:51 PM, Shantonu Sen wrote:
>>
>>> Defragmenting applications use the volume format described at <http://developer.apple.com/library/mac/#technotes/tn/tn1150.html>
>>
>> Thank you.
>>
>>> This is not the same thing as the logical filesystem that is exposed to userspace. For instance, filesystem compression (introduced in Mac OS X 10.6 Snow Leopard) can prevent you from being able to reconstruct a file's contents by accessing the raw device.
>>
>> Interesting. Is the filesystem compression optional or is it always there for everyone?
>>
>> If it was optional, then I assume the situation could be detected and the slow route of using the standard APIs to open and read the files could be the fallback.
>>
>>> What are you really trying to do?
>>
>> It's fairly easy to describe. For each individual file:
>>
>> 1. Open the file
>> 2. Read the data from the file
>> 3. Do some simple processing on that data.
>> 4. Close the file
>>
>> The issue is that there are 1 million + files and the vast majority of them are small (only a few kb). Based on my tests, the overhead of opening and reading the data for every single file is significant and I figured, if it were practical, that it would be nice to be able to read a few hundred of them in one shot and send them off to a worker thread for processing.
>>
>
> Have you considered putting the files on an SSD or RAM disk?
I am not sure that is practical for every single file on a device. There could easily be millions of them and hundreds of gigabytes or even terabytes worth of data. This is going to be slow no matter what. I am simply looking for ways to make it go faster, and I believe there is a lot of room for improvements that would not necessarily be horribly difficult to implement. But even if something were horribly difficult to implement and maintain, it would not necessarily be out of the question.
> Assuming the files have to live on spinning storage, most of the
> difference between streaming and random access is in seek latencies.
> Your idea of streaming the raw filesystem (to RAM, presumably) THEN
> making random-like access to the individual files helps, but the devil
> being in the details, etc, may involve a lot of work. What if you change
> the underlying medium in the first place?
>
> I'm not sure you have told us enough about the bigger picture yet.
I understand the request sounds odd, but the picture really isn't all that big. I need to read and process every file on disk. This is slow. I would like to see if I can figure out a way to make it go faster.
If I could simply determine the order in which the starting blocks of the files appear on the disk, and then open and read the files in that order, that might improve performance by minimizing seek latencies.
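
A minimal sketch of that idea, in Python for brevity: sort the paths by some estimate of where each file starts on disk, read them in that order, and hand batches of a few hundred files to worker threads. Everything named here (inode_key, read_in_disk_order, batches, process_batch, process_all) is hypothetical. The default sort key is the inode number, which is only a stand-in and is not guaranteed to track physical position on any given filesystem. On Mac OS X, fcntl(2) documents an F_LOG2PHYS command that reports the device offset for a file position, which might make a better key; the sketch does not call it.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def inode_key(path):
    # ASSUMPTION: the inode number is used as a rough proxy for where the
    # file starts on disk. Swap in a real physical-offset lookup if one exists.
    return os.stat(path).st_ino


def read_in_disk_order(paths, position_key=inode_key):
    """Yield (path, data), opening files in ascending position_key order."""
    for path in sorted(paths, key=position_key):
        with open(path, "rb") as f:
            yield path, f.read()


def batches(items, size):
    """Group an iterable into lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def process_batch(batch):
    # Placeholder for the "simple processing" step: count the bytes read.
    return sum(len(data) for _path, data in batch)


def process_all(paths, batch_size=256, workers=4, position_key=inode_key):
    """Read on the calling thread in sorted order; process on worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        ordered = read_in_disk_order(paths, position_key)
        futures = [pool.submit(process_batch, b)
                   for b in batches(ordered, batch_size)]
        return sum(f.result() for f in futures)
```

All reads happen on one thread so the disk sees a single ordered stream of requests. The sketch queues every batch without limit, so a version meant for terabytes of data would need to cap the number of batches in flight.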
_______________________________________________
Filesystem-dev mailing list (email@hidden)