Re: Get the underlying cache block size of a filesystem?
- Subject: Re: Get the underlying cache block size of a filesystem?
- From: James Bucanek <email@hidden>
- Date: Thu, 11 Feb 2010 12:55:59 -0700
James Peach <mailto:email@hidden> wrote (Thursday, February
11, 2010 11:13 AM -0800):
Probably your best bet is to use f_iosize (or f_bsize) from struct statfs, see statfs(2).
Dominic Giampaolo <mailto:email@hidden> wrote (Thursday,
February 11, 2010 10:44 AM -0800):
Programmatically you can use statfs() to find out info
about the block size of the file system. The f_bsize
field is the one to look at.
Dominic,
James,
<slapping forehead>Duh!</slapping forehead>
If it was a snake, it would have bitten me. :)
I did remember looking at f_bsize, but assumed that this was the
size of sectors on the physical media and not necessarily the
amount of data the OS would read each time. Regardless, these
are exactly the numbers I'm looking for.
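For the archives, the call itself is trivial (a quick sketch;
pass whatever file or volume you're interested in):

    #include <sys/param.h>
    #include <sys/mount.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        const char *path = (argc > 1) ? argv[1] : ".";
        struct statfs fs;

        if (statfs(path, &fs) != 0) {
            perror("statfs");
            return 1;
        }
        /* f_bsize:  fundamental file system block size   */
        /* f_iosize: optimal transfer block size for i/o  */
        printf("f_bsize  = %lu\n", (unsigned long)fs.f_bsize);
        printf("f_iosize = %ld\n", (long)fs.f_iosize);
        return 0;
    }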
You can observe exactly what the OS is doing by running
"sudo fs_usage -w" and examining the output. That will
show you exactly which i/o's get issued to the disk.
That was my fall-back/hacker solution if I couldn't get a
definitive answer from the OS.
Unless someone explicitly formatted it otherwise, the
default HFS+ block size is 4k. If your app is doing
random i/o the cluster layer will detect this and it
will not issue any larger i/o's (in comparison to when
it detects you're doing sequential i/o which causes it
to issue read-ahead). So if you examine the i/o's generated
while running your app, I would expect to see a series of
effectively random 4k i/o's spread all
over the disk blocks contained by your db file. I
believe this is mostly what you want. What you can do
to be even more effective though is to use the F_NOCACHE
fcntl() to tell the OS to not bother caching the data
since you have your own cache. This will reduce the
impact your app has on the rest of the system and in
effect give you more memory for your own cache (which
is going to be more effective).
I'm using the Carbon API to read the file; I am passing
noCacheMask to FSReadFork().
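(For anyone reading along who's using plain POSIX i/o instead
of Carbon, my understanding is the equivalent is the F_NOCACHE
fcntl Dominic mentioned. A rough sketch, not my actual code:)

    #include <fcntl.h>
    #include <stdio.h>

    /* Open 'path' for reading and ask the OS not to cache the
       data, since the application keeps its own cache.
       Returns the file descriptor, or -1 on failure. */
    static int open_uncached(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd >= 0 && fcntl(fd, F_NOCACHE, 1) == -1) {
            perror("fcntl(F_NOCACHE)");
            /* not fatal: reads still work, just go through the cache */
        }
        return fd;
    }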
Your explanation of the caching behavior is very informative,
and might explain some of the performance problems I was
having, which I'll expand on after I describe the application...
BTW, your app sounds kind of interesting: what sort of
data are you manipulating that has 12 byte records and
hundreds of millions of items? (if you can talk about
it of course - I certainly don't want to know any kind
of confidential info).
It's public knowledge: <http://www.qrecall.com/>
QRecall is a backup solution that takes a database-like approach
to incremental backups. QRecall breaks every file up into
individual data blocks and adds those to a massive database
(archive). The key advantage is that identical blocks are never
duplicated in the archive. Each block added is first compared to
the millions of blocks already in the archive. If a duplicate is
found, only a reference to that block is added. A
rolling-checksum algorithm, modeled on the one used by rsync,
can even find duplicate blocks whose relative offsets are
different in the two files.
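(For the curious, the weak rolling checksum works roughly like
this. This is an illustration of the general technique, not my
actual implementation:)

    #include <stdint.h>
    #include <stddef.h>

    /* rsync-style weak rolling checksum (illustration only).
       a = sum of the bytes in the window,
       b = sum of the running values of a. */
    typedef struct { uint32_t a, b; } RollSum;

    static RollSum RollSumInit(const uint8_t *buf, size_t len)
    {
        RollSum s = {0, 0};
        for (size_t i = 0; i < len; i++) {
            s.a += buf[i];
            s.b += s.a;
        }
        return s;
    }

    /* Slide the window one byte forward: drop 'out', append 'in'. */
    static void RollSumRoll(RollSum *s, uint8_t out, uint8_t in, size_t len)
    {
        s->a = s->a - out + in;
        s->b = s->b - (uint32_t)(len * out) + s->a;
    }

    /* The 32-bit weak checksum is ((b & 0xffff) << 16) | (a & 0xffff). */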
For QRecall to capture at my goal of 2GB/min, it has to
efficiently perform about 200,000 lookups per second to
determine whether each block or fragment is already
contained in the database. Needless to say, the efficiency of my
lookup and caching algorithms is paramount!
The file I've been struggling with is the hash file. It's
essentially an array of 12 byte records. Each record contains
either nothing, or a 64-bit checksum and a 32-bit record number.
Looking for a matching block consists of calculating its
checksum, using the checksum as an index into the hash file,
then reading the records at that position for a possible match.
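(To make that concrete, a sketch of the general shape; the
field names and layout here are hypothetical, not my actual
record format:)

    #include <stdint.h>

    /* Illustrative 12-byte hash record: a 64-bit checksum plus
       a 32-bit record number. */
    #pragma pack(push, 1)
    typedef struct {
        uint64_t checksum;      /* 0 here stands for an empty slot */
        uint32_t recordNumber;  /* where the block lives in the archive */
    } HashRecord;
    #pragma pack(pop)

    /* Map a block's checksum to a slot; the lookup then reads the
       12-byte record(s) at offset slot * sizeof(HashRecord). */
    static uint64_t SlotForChecksum(uint64_t checksum, uint64_t slotCount)
    {
        return checksum % slotCount;
    }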
The current version of the application uses a "pre-fetch" thread
to read-ahead in the hash file and cache records in memory. This
works great for smaller archives. Once the archive reaches 1TB
or so, however, the read-ahead thread starts to thrash and
compete with the regular lookup logic. This was driving my
performance into the toilet and limiting the useful size of
archives.
I think Dominic's explanation of the OS's read-ahead behavior
might explain some of it. I was sending some seriously mixed
signals to the OS, having one thread read linearly while other
threads would jump in and read randomly.
Anyway, the "cache the neighboring records as you go" algorithm
is much better and exhibits near linear performance as the
archive size increases. I'm hopeful again of achieving my goal
of capturing at 2GB/min to archives up to 10TB in size.
Thanks again to everyone for the help,
James
--
James Bucanek