Get the underlying cache block size of a filesystem?
- Subject: Get the underlying cache block size of a filesystem?
- From: James Bucanek <email@hidden>
- Date: Thu, 11 Feb 2010 09:20:28 -0700
Greetings,
Disclaimer: I'm a big proponent of "don't fight the OS" and
"don't reinvent the wheel", but sometimes you just have to. My
application presents a pathological case that is at odds with
the filesystem's natural caching behavior, and I'm trying to
find a more efficient solution.
Scenario: I have a very large index file (GBs) that has millions
of very small records (12 bytes each). Records are read
stochastically while parsing an incoming stream of data. You can
guess what happens: (1) an entire block of data is read from
the disk and cached, (2) 12 bytes are extracted, (3) the block
eventually gets pushed out of the cache by other file activity,
process repeats.
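For concreteness, the access pattern is roughly the sketch below (the record layout and surrounding bookkeeping are simplified, and the names are placeholders):

    /* Simplified sketch of the current access pattern: each lookup pulls
     * 12 bytes straight out of the index file, so the OS reads and caches
     * an entire filesystem block just to deliver one record. */
    #include <stdint.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define RECORD_SIZE 12

    /* Read record number 'n' directly from the index file. */
    static ssize_t read_record(int fd, uint64_t n, uint8_t out[RECORD_SIZE])
    {
        return pread(fd, out, RECORD_SIZE, (off_t)(n * RECORD_SIZE));
    }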
The result is that almost every record read incurs the overhead
of reading the 4K/8K/32K/whatever block of data from the
physical media. This effectively "overreads" hundreds of
potentially interesting records in the same block, discards
them, and reads them again next time.
So the standard wisdom of "let the OS cache the data" isn't
working here.
I've *significantly* improved performance by creating my own
in-memory cache of records. When I read a record, I calculate the
4K region of the file that the record resides in, read that 4K
span, and then copy all "interesting" records (typically 10-20%
of the block) into a compact cache. The next time a record in that
range is requested, I can satisfy the request from the
cache--until the cache fills up, but that's rare.
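A rough sketch of the idea, assuming a simple flat record layout (cache_lookup, cache_store, and record_is_interesting stand in for my real cache and record logic):

    /* Sketch of the block-caching scheme described above: align the record
     * down to the ~4K run of whole records it falls in, read that run once,
     * and keep the interesting records for later requests. */
    #include <stdint.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define RECORD_SIZE  12
    #define SPAN_RECORDS (4096 / RECORD_SIZE)   /* ~4K worth of whole records */

    typedef struct {
        uint64_t offset;                 /* file offset of the record */
        uint8_t  data[RECORD_SIZE];
    } cached_record;

    /* Hypothetical helpers -- the real code uses a compact in-memory
     * structure keyed by record offset. */
    extern int  cache_lookup(uint64_t offset, uint8_t out[RECORD_SIZE]);
    extern void cache_store(const cached_record *rec);
    extern int  record_is_interesting(const uint8_t *rec);

    static int get_record(int fd, uint64_t record_no, uint8_t out[RECORD_SIZE])
    {
        uint64_t file_off = record_no * RECORD_SIZE;

        /* Satisfy the request from the cache if we've already seen this span. */
        if (cache_lookup(file_off, out))
            return 0;

        /* Otherwise read the whole span the record falls in... */
        uint64_t first_rec  = (record_no / SPAN_RECORDS) * SPAN_RECORDS;
        uint64_t span_start = first_rec * RECORD_SIZE;
        uint8_t  span[SPAN_RECORDS * RECORD_SIZE];
        ssize_t  got = pread(fd, span, sizeof span, (off_t)span_start);
        if (got < (ssize_t)(file_off - span_start) + RECORD_SIZE)
            return -1;   /* read error or short read past EOF */

        /* ...and keep every interesting record it contains. */
        for (ssize_t i = 0; i + RECORD_SIZE <= got; i += RECORD_SIZE) {
            if (record_is_interesting(span + i)) {
                cached_record rec;
                rec.offset = span_start + (uint64_t)i;
                memcpy(rec.data, span + i, RECORD_SIZE);
                cache_store(&rec);
            }
        }

        memcpy(out, span + (file_off - span_start), RECORD_SIZE);
        return 0;
    }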
This new technique is almost 100 times faster than the old one,
but I'd like to make it as efficient as possible. I realize that
my app could still be repeatedly reading the same block of data
if the OS always reads 16K or 32K at a time and I only cache 4K
of that. On the other hand, I don't want to arbitrarily increase
this value; reading too much at a time slows down individual
requests and causes the cache to fill too quickly. And I realize
that on a networked file system, the buffering size might be
considerably smaller and reading more just wastes bandwidth.
So I come to beg the gurus of filesystem architecture for
advice. Is there an API that I can use to discover the
actual/typical block read size employed by a filesystem or
filesystem cache? I've looked at the various Carbon functions
and things like fcntl(), but can't find anything (obvious).
Alternatively, is there a constant that I could reasonably
assume to be close to the actual read-block size in most
situations? I'm targeting OS X 10.4-10.6, although optimal
support for 10.4 isn't critical.
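By way of illustration, here's the shape of call I'm hoping for. One candidate I can see is statfs(), whose f_iosize field is documented as the "optimal transfer block size", but I have no idea whether that bears any relation to what the cache actually reads from disk:

    /* Query the volume a file lives on for its reported block sizes.
     * Whether f_iosize reflects the cache's real read size is exactly
     * what I'm unsure about. */
    #include <stdio.h>
    #include <sys/param.h>
    #include <sys/mount.h>

    int main(int argc, char *argv[])
    {
        struct statfs fs;
        const char *path = (argc > 1) ? argv[1] : ".";

        if (statfs(path, &fs) != 0) {
            perror("statfs");
            return 1;
        }
        printf("f_bsize  = %u\n", (unsigned)fs.f_bsize);
        printf("f_iosize = %d\n", (int)fs.f_iosize);
        return 0;
    }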
Thanks!
--
James Bucanek