On Jul 3, 2011, at 12:44 PM, Eric Gorr wrote:
> One can, of course, use fopen, etc. to read a file
I missed the "f" in "fopen" there the first time I read your message. Someone else just pointed that out to me.
Using stdio (fopen, fread, fclose, etc.) can improve performance if you're doing lots of small reads or writes for some reason, but it does so by making extra copies of the file's data. If you're reading the entire file anyway, stdio will only slow you down compared to open/read/close. Even for a large file (one too large to read into memory all at once), you're still better off using open/read/close and avoiding stdio's overhead.

Just remember to make the reads large enough (but not too large), and keep them aligned. Doing that will save at least two copies of the data: once from the buffer cache into stdio's buffer, and again from stdio's buffer into your program's buffer. And if your program's buffer is page aligned and you use F_NOCACHE, the file system can read directly from the disk into your buffer and bypass the buffer cache entirely.
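A minimal sketch of what I mean, reading a whole file with open/read/close in large, page-aligned chunks. The chunk size and the 4096-byte alignment are my own illustrative choices, and F_NOCACHE is Darwin-specific, so it's guarded with an #ifdef:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (1 << 20)  /* 1 MiB reads: large enough, but not too large */

/* Read an entire file with open/read/close, returning the byte count
   (or -1 on error).  No stdio buffer, so no extra copies. */
long long read_whole_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

#ifdef F_NOCACHE
    /* Darwin only: combined with a page-aligned buffer, this lets the
       file system read from disk straight into our buffer, bypassing
       the buffer cache. */
    fcntl(fd, F_NOCACHE, 1);
#endif

    void *buf;
    /* Page-aligned buffer (4096 here) so the reads stay aligned. */
    if (posix_memalign(&buf, 4096, CHUNK) != 0) {
        close(fd);
        return -1;
    }

    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, CHUNK)) > 0)
        total += n;

    free(buf);
    close(fd);
    return (n < 0) ? -1 : total;
}
```

The error handling is deliberately minimal; a real program would also retry short reads interrupted by signals (EINTR).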
I bet that will give you a performance boost.
Also, how are you iterating through the file system hierarchy? (That is, how are you getting the list of file and directory names, and how are you deciding when to recurse?) Some of the performance problem you're seeing may be in iterating the files rather than in reading from them. Instruments, fs_usage, sc_usage, and DTrace can all help you figure out where you're spending your time (and therefore what you should focus on optimizing).
-Mark