Re: Get the underlying cache block size of a filesystem?
- Subject: Re: Get the underlying cache block size of a filesystem?
- From: James Bucanek <email@hidden>
- Date: Thu, 11 Feb 2010 12:55:59 -0700
James Peach <mailto:email@hidden> wrote (Thursday, February
11, 2010 11:13 AM -0800):
Probably your best bet is to use f_iosize (or f_bsize) from struct statfs, see statfs(2).
Dominic Giampaolo <mailto:email@hidden> wrote (Thursday,
February 11, 2010 10:44 AM -0800):
Programmatically you can use statfs() to find out info
about the block size of the file system. The f_bsize
field is the one to look at.
Dominic,
James,
<slapping forehead>Duh!</slapping forehead>
If it was a snake, it would have bitten me. :)
I did remember looking at f_bsize, but assumed that this was the
size of sectors on the physical media and not necessarily the
amount of data the OS would read each time. Regardless, these
are exactly the numbers I'm looking for.
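For the archives, the call itself is trivial (a quick sketch;
pass whatever file or volume you're interested in):

    #include <sys/param.h>
    #include <sys/mount.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        const char *path = (argc > 1) ? argv[1] : ".";
        struct statfs fs;

        if (statfs(path, &fs) != 0) {
            perror("statfs");
            return 1;
        }
        /* f_bsize:  fundamental file system block size   */
        /* f_iosize: optimal transfer block size for i/o  */
        printf("f_bsize  = %lu\n", (unsigned long)fs.f_bsize);
        printf("f_iosize = %ld\n", (long)fs.f_iosize);
        return 0;
    }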
You can observe exactly what the OS is doing by running
"sudo fs_usage -w" and examining the output. That will
show you exactly which i/o's get issued to the disk.
That was my fall-back/hacker solution if I couldn't get a
definitive answer from the OS.
Unless someone explicitly formatted it otherwise, the
default HFS+ block size is 4k. If your app is doing
random i/o the cluster layer will detect this and it
will not issue any larger i/o's (in comparison to when
it detects you're doing sequential i/o which causes it
to issue read-ahead). So if you examine the i/o's generated
while running your app, I would expect to see a series of
effectively random 4k i/o's spread all
over the disk blocks contained by your db file. I
believe this is mostly what you want. What you can do
to be even more effective though is to use the F_NOCACHE
fcntl() to tell the OS to not bother caching the data
since you have your own cache. This will reduce the
impact your app has on the rest of the system and in
effect give you more memory for your own cache (which
is going to be more effective).
I'm using the Carbon API to read the file; I am passing
noCacheMask to FSReadFork().
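(For anyone reading along who's using plain POSIX i/o instead
of Carbon, my understanding is the equivalent is the F_NOCACHE
fcntl Dominic mentioned. A rough sketch, not my actual code:)

    #include <fcntl.h>
    #include <stdio.h>

    /* Open 'path' for reading and ask the OS not to cache the
       data, since the application keeps its own cache.
       Returns the file descriptor, or -1 on failure. */
    static int open_uncached(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd >= 0 && fcntl(fd, F_NOCACHE, 1) == -1) {
            perror("fcntl(F_NOCACHE)");
            /* not fatal: reads still work, just go through the cache */
        }
        return fd;
    }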
Your explanation of the caching behavior is very informative,
and might explain some of the performance problems I was
having, which I'll expand on after I describe the application...
BTW, your app sounds kind of interesting: what sort of
data are you manipulating that has 12 byte records and
hundreds of millions of items? (if you can talk about
it of course - I certainly don't want to know any kind
of confidential info).
It's public knowledge: <http://www.qrecall.com/>
QRecall is a backup solution that takes a database-like approach
to incremental backups. QRecall breaks every file up into
individual data blocks and adds those to a massive database
(archive). The key advantage is that identical blocks are never
duplicated in the archive. Each block added is first compared to
the millions of blocks already in the archive. If a duplicate is
found, only a reference to that block is added. A
rolling-checksum algorithm, modeled on the one used by rsync,
can even find duplicate blocks whose relative offsets are
different in the two files.
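(For the curious, the weak rolling checksum works roughly like
this. This is an illustration of the general technique, not my
actual implementation:)

    #include <stdint.h>
    #include <stddef.h>

    /* rsync-style weak rolling checksum (illustration only).
       a = sum of the bytes in the window,
       b = sum of the running values of a. */
    typedef struct { uint32_t a, b; } RollSum;

    static RollSum RollSumInit(const uint8_t *buf, size_t len)
    {
        RollSum s = {0, 0};
        for (size_t i = 0; i < len; i++) {
            s.a += buf[i];
            s.b += s.a;
        }
        return s;
    }

    /* Slide the window one byte forward: drop 'out', append 'in'. */
    static void RollSumRoll(RollSum *s, uint8_t out, uint8_t in, size_t len)
    {
        s->a = s->a - out + in;
        s->b = s->b - (uint32_t)(len * out) + s->a;
    }

    /* The 32-bit weak checksum is ((b & 0xffff) << 16) | (a & 0xffff). */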
For QRecall to capture at my goal of 2GB/min, it has to
efficiently perform about 200,000 lookups per second to
determine whether each block or fragment is already
contained in the database. Needless to say, the efficiency of my
lookup and caching algorithms is paramount!
The file I've been struggling with is the hash file. It's
essentially an array of 12 byte records. Each record contains
either nothing, or a 64-bit checksum and a 32-bit record number.
Looking for a matching block consists of calculating its
checksum, using the checksum as an index into the hash file,
then reading the records at that position for a possible match.
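(To make that concrete, a sketch of the general shape; the
field names and layout here are hypothetical, not my actual
record format:)

    #include <stdint.h>

    /* Illustrative 12-byte hash record: a 64-bit checksum plus
       a 32-bit record number. */
    #pragma pack(push, 1)
    typedef struct {
        uint64_t checksum;      /* 0 here stands for an empty slot */
        uint32_t recordNumber;  /* where the block lives in the archive */
    } HashRecord;
    #pragma pack(pop)

    /* Map a block's checksum to a slot; the lookup then reads the
       12-byte record(s) at offset slot * sizeof(HashRecord). */
    static uint64_t SlotForChecksum(uint64_t checksum, uint64_t slotCount)
    {
        return checksum % slotCount;
    }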
The current version of the application uses a "pre-fetch" thread
to read-ahead in the hash file and cache records in memory. This
works great for smaller archives. Once the archive reaches 1TB
or so, however, the read-ahead thread starts to thrash and
compete with the regular lookup logic. This was driving my
performance into the toilet and limiting the useful size of
archives.
I think Dominic's explanation of the OS's read-ahead behavior
might explain some of it. I was sending some seriously mixed
signals to the OS, having one thread read linearly while other
threads would jump in and read randomly.
Anyway, the "cache the neighboring records as you go" algorithm
is much better and exhibits near linear performance as the
archive size increases. I'm hopeful again of achieving my goal
of capturing at 2GB/min to archives up to 10TB in size.
Thanks again to everyone for the help,
James
--
James Bucanek