One can, of course, use fopen, etc. to read a file, but what I need to
do is open and read every file on the disk, and to do so as efficiently
as possible. The overhead of fopen, etc. across a million or more files appears to be rather high. If I could read a few hundred small files (where they are contiguous on disk) as a single block for a worker thread to process, I believe I could come closer to sustaining a good transfer rate from the disk.
So, my thought was to use /dev/rdisk* (?) or /dev/(?) to start with
the files at the beginning of the device. I would do my best to read
the files in order as they appear on the disk, minimize the amount of
seeking across the device since files may be fragmented, and read in
large blocks of data into RAM where it can be processed very quickly.
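To make the plan concrete, here is a minimal sketch in C of the scheduling step I have in mind. It assumes the device offset and length of every file fragment have already been obtained somehow, which is exactly the part I do not know how to do. The names extent_t, read_batch_t and plan_reads are hypothetical, made up for illustration, and the gap and batch-size limits are arbitrary tuning knobs rather than measured values.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: one contiguous fragment of one file on the device. */
typedef struct {
    uint64_t dev_offset; /* byte offset of the fragment on the raw device */
    uint64_t length;     /* fragment length in bytes */
    uint32_t file_id;    /* which file the fragment belongs to */
} extent_t;

/* Hypothetical record: one large sequential read covering several extents. */
typedef struct {
    uint64_t dev_offset;   /* where the read starts on the device */
    uint64_t length;       /* how many bytes to read */
    size_t   first_extent; /* index of the first covered extent (after sorting) */
    size_t   extent_count; /* number of extents covered by this read */
} read_batch_t;

/* Order extents by their position on the device. */
static int cmp_extent(const void *a, const void *b)
{
    const extent_t *x = a;
    const extent_t *y = b;
    if (x->dev_offset < y->dev_offset) return -1;
    if (x->dev_offset > y->dev_offset) return 1;
    return 0;
}

/*
 * Sort the extents in place by device offset, then merge neighbours into
 * one read when the gap between them is at most max_gap bytes and the
 * merged read stays within max_batch bytes. Overlapping extents (which
 * should not occur on a healthy volume) simply start a new batch.
 * Returns the number of batches written to out (at most out_cap).
 */
size_t plan_reads(extent_t *ext, size_t n,
                  uint64_t max_gap, uint64_t max_batch,
                  read_batch_t *out, size_t out_cap)
{
    size_t nb = 0;

    qsort(ext, n, sizeof *ext, cmp_extent);

    for (size_t i = 0; i < n; i++) {
        uint64_t end = ext[i].dev_offset + ext[i].length;

        if (nb > 0) {
            read_batch_t *b = &out[nb - 1];
            uint64_t b_end = b->dev_offset + b->length;

            if (ext[i].dev_offset >= b_end &&
                ext[i].dev_offset - b_end <= max_gap &&
                end - b->dev_offset <= max_batch) {
                b->length = end - b->dev_offset; /* extend the current read */
                b->extent_count++;
                continue;
            }
        }

        if (nb == out_cap)
            break; /* no room left for another batch */

        out[nb].dev_offset   = ext[i].dev_offset;
        out[nb].length       = ext[i].length;
        out[nb].first_extent = i;
        out[nb].extent_count = 1;
        nb++;
    }
    return nb;
}
```

Each resulting batch would be read from the device in one call and handed to a worker thread, which would use the covered extents to split the buffer back into per-file pieces.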
So, the primary question I have is when reading the data from my
device directly, how can I determine exactly what data belongs with what
file?
I assume I could start by reading a catalog of the files and that
there would be a way to determine the start and stop locations of each
file or file fragment on the disk, but I am not sure where to find
documentation on how to obtain this. Is there an API that would tell me where on disk a file or its fragments physically sit? Considering that a defragmentation application needs to do something similar, I am sure what I need to do is possible.
I am running Mac OS X 10.6.x and one can assume a standard setup for
the drive. I assume the same information would apply to a
standard, read-only, uncompressed .dmg created by Disk Utility as well.
Any information on this topic or articles to read would be of interest.
Thank you.