Re: Shared mmap and data consistency upon a crash

12 May 2006


On May 11, 2006, at 12:02 PM, darwin-dev-request@lists.apple.com wrote:
Date: Wed, 10 May 2006 23:38:20 +0200
From: Felix Schwarz <felix.schwarz@iospirit-gmbh.de>
Subject: Re: Shared mmap and data consistency upon a crash
To: darwin-dev@lists.apple.com
Cc: Michael Smith <msmith@freebsd.org>
On May 10, 2006, at 21:28, Michael Smith wrote:
I should perhaps have mentioned what I'm writing: a specialized,
server-less database link library (similar in concept to SQLite,
though my link library has only *very* basic functionality).
The idea behind using mmap() is to move indexed records into and out
of the file as fast as possible and to take advantage of any caching
by the VMM, especially for the table inside the file that keeps
track of all the records it stores.
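A minimal sketch of the mmap() approach described above, in C, assuming a hypothetical fixed-size `struct record` layout (not Felix's actual file format). The file is mapped MAP_SHARED, so updating a record is a plain memory store that the VM system later writes back to the file.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size record layout, for illustration only. */
struct record {
    uint32_t id;
    char payload[60];
};

/* Size the file to hold exactly `nrecords` records (nrecords > 0) and map
   it shared and read-write. Returns NULL on failure. */
static struct record *map_records(int fd, size_t nrecords)
{
    size_t len = nrecords * sizeof(struct record);
    if (ftruncate(fd, (off_t)len) != 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : (struct record *)p;
}

/* Update a record in place: a plain memory store, no system call. The
   kernel writes the dirty page back to the file at a time of its choosing. */
static void put_record(struct record *table, size_t index,
                       uint32_t id, const char *text)
{
    struct record *r = &table[index];
    r->id = id;
    strncpy(r->payload, text, sizeof r->payload - 1);
    r->payload[sizeof r->payload - 1] = '\0';
}

static int unmap_records(struct record *table, size_t nrecords)
{
    return munmap(table, nrecords * sizeof(struct record));
}
```

Because the mapping is shared, a later read() or pread() on the same file sees the stored bytes even before the dirty page has reached the disk.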

Using the VM for caching in this case does have some advantages
if you're trying to keep your code simple.  Bear in mind though that
you'll typically get LRU behaviour on cached data, so your indexes
will only stay resident to the extent that you keep them hot.
Since I've read elsewhere that you can't mmap() files located on a
network volume, I've also written a fallback using read()/write()/
lseek().

You can, but there is no graceful recovery from I/O errors.  You are
better off using pread/pwrite, since that will allow you to
multi-thread portions of your I/O code as well as saving you a system
call per op.
mmap() will also give you trouble once your database starts to get
large, while file I/O tends to scale better.
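A sketch of the pread()/pwrite() approach suggested above, assuming hypothetical 64-byte records stored back to back (the `RECORD_SIZE`, `open_table`, `write_record` and `read_record` names are illustrative). Each call carries its own offset, so one call replaces an lseek()/read() pair, and threads can share one descriptor without racing on the file position.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 64  /* hypothetical fixed record size */

static int open_table(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0644);
}

/* pwrite() takes the offset explicitly: no lseek(), and no shared file
   position for threads to race on. Retries on short writes and EINTR. */
static int write_record(int fd, uint64_t index, const void *buf)
{
    const char *p = buf;
    size_t left = RECORD_SIZE;
    off_t off = (off_t)(index * RECORD_SIZE);
    while (left > 0) {
        ssize_t n = pwrite(fd, p, left, off);
        if (n <= 0) {
            if (n < 0 && errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        left -= (size_t)n;
        off += n;
    }
    return 0;
}

/* Same for reads; reaching end of file means the record does not exist. */
static int read_record(int fd, uint64_t index, void *buf)
{
    char *p = buf;
    size_t left = RECORD_SIZE;
    off_t off = (off_t)(index * RECORD_SIZE);
    while (left > 0) {
        ssize_t n = pread(fd, p, left, off);
        if (n <= 0) {
            if (n < 0 && errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        left -= (size_t)n;
        off += n;
    }
    return 0;
}
```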
It appears that, in both cases, the data in the file stays in sync
when the app crashes. With either method, after an immediate restart
of the little test app I observed a short delay before any changes
could be made to the file, and the data that was written last could
be read back correctly.
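The observation above can be reproduced with a small test along these lines (C, POSIX; the file path and message are hypothetical). A forked child maps the file, stores a string, and kills itself with SIGKILL without calling msync() or munmap(); the parent then reads the file back. This only exercises an application crash. A kernel panic or power loss before the dirty page is written back is a different case.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child: map the file, store `msg` at offset 0, then die abruptly
   without msync() or munmap(). Never returns. */
static void crash_after_store(const char *path, const char *msg)
{
    size_t len = strlen(msg) + 1;
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0 || ftruncate(fd, (off_t)len) != 0)
        _exit(2);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        _exit(2);
    memcpy(p, msg, len);
    raise(SIGKILL);               /* simulated application crash */
    _exit(3);                     /* not reached */
}

/* Parent: returns 1 if the store survived the child's crash, 0 if it
   did not, -1 on error. */
static int store_survives_crash(const char *path, const char *msg)
{
    unlink(path);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        crash_after_store(path, msg);
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFSIGNALED(status))
        return -1;
    char buf[256] = {0};
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, buf, sizeof buf - 1, 0);
    close(fd);
    return n > 0 && strcmp(buf, msg) == 0;
}
```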

Assuming the crash happens while the data in the object (written
either via memory stores or via the write system call) is consistent,
you should be OK.
Note that if the app crashes in, e.g., another *thread* while you are
in the middle of an update to your data structures, all bets are off.
You can be pre-empted or terminated at any time, and the object
will be flushed to disk as-is.
This is one argument for using a journal and system calls; as a
general rule, writes to a disk file tend to be atomic - either the
write completes, or it doesn't.
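A minimal redo-journal sketch along these lines, in C, assuming hypothetical 64-byte records and a separate journal file holding at most one pending entry. The checksum (FNV-1a here, an arbitrary choice) lets recovery tell a complete entry from a torn one. On Mac OS X, fsync() does not ask the drive to flush its own write cache, while fcntl(F_FULLFSYNC) does, so the sketch uses the latter where it is defined.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 64  /* hypothetical fixed record size */

/* One journal entry: target slot, new record image, and a checksum. */
struct journal_entry {
    uint64_t index;
    unsigned char data[RECORD_SIZE];
    uint32_t checksum;
};

/* FNV-1a over the record image and index; illustrative, not cryptographic. */
static uint32_t checksum32(const unsigned char *p, size_t n, uint64_t index)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 16777619u; }
    for (int i = 0; i < 8; i++) { h ^= (unsigned char)(index >> (8 * i)); h *= 16777619u; }
    return h;
}

/* Darwin: F_FULLFSYNC also flushes the drive cache; fsync() elsewhere. */
static int durable_sync(int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
#endif
    return fsync(fd);
}

/* Write exactly len bytes at off; a short write is treated as an error. */
static int put(int fd, const void *buf, size_t len, off_t off)
{
    return pwrite(fd, buf, len, off) == (ssize_t)len ? 0 : -1;
}

/* Log the new image and sync, write in place and sync, then clear the log.
   A crash before the log entry is complete leaves the old record plus a torn
   entry that recover() discards; a later crash leaves a complete entry that
   recover() replays. */
static int update_record(int jfd, int dfd, uint64_t index, const unsigned char *data)
{
    struct journal_entry e;
    memset(&e, 0, sizeof e);            /* zero the padding bytes too */
    e.index = index;
    memcpy(e.data, data, RECORD_SIZE);
    e.checksum = checksum32(e.data, RECORD_SIZE, e.index);

    if (put(jfd, &e, sizeof e, 0) != 0 || durable_sync(jfd) != 0)
        return -1;
    if (put(dfd, data, RECORD_SIZE, (off_t)(index * RECORD_SIZE)) != 0 || durable_sync(dfd) != 0)
        return -1;
    if (ftruncate(jfd, 0) != 0 || durable_sync(jfd) != 0)
        return -1;
    return 0;
}

/* At open time: replay a complete, valid journal entry, else discard it. */
static int recover(int jfd, int dfd)
{
    struct journal_entry e;
    if (pread(jfd, &e, sizeof e, 0) == (ssize_t)sizeof e &&
        e.checksum == checksum32(e.data, RECORD_SIZE, e.index)) {
        if (put(dfd, e.data, RECORD_SIZE, (off_t)(e.index * RECORD_SIZE)) != 0 ||
            durable_sync(dfd) != 0)
            return -1;
    }
    if (ftruncate(jfd, 0) != 0)
        return -1;
    return durable_sync(jfd);
}
```

In this sketch recover() would be called once when the database is opened, before any reads. Replaying an entry twice is harmless, since it rewrites the same bytes.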
If your app crashes, all bets are off - assuming you don't know
where it crashed, you have no guarantee as to the state of anything
it was doing, regardless of whether it had files mapped or not.

The way the data is organized in the file, and the order in which it
is modified, ensure it's always in a consistent state - the

Note that you don't get any guarantees about what order things
are written to disk unless you explicitly sync the file.
This is mostly an issue when considering system crashes, though
the scenario above w.r.t. crashes on another thread needs to be
borne in mind as well.
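For a mapped file, this ordering can be enforced with msync(MS_SYNC) between dependent updates. A sketch, assuming a hypothetical layout with a uint64_t record count at offset 0 and records further in (all names here are illustrative): the record is written and synced before the count that makes it visible is updated. msync() requires a page-aligned address, so the helper rounds the start of the range down to a page boundary.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Open (creating if needed), size, and map a file shared and read-write. */
static void *map_file(const char *path, size_t len, int *fd_out)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return p;
}

/* msync() needs a page-aligned address: round the start of the range down
   to a page boundary (base itself is page-aligned, as returned by mmap). */
static int sync_range(void *base, size_t off, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t start = off & ~(page - 1);
    return msync((char *)base + start, off + len - start, MS_SYNC);
}

/* Write and sync the record first; only then update the count that makes
   it visible, so the new count is not written back before the record. */
static int append_record(void *base, size_t rec_off, const void *rec,
                         size_t len, uint64_t new_count)
{
    memcpy((char *)base + rec_off, rec, len);
    if (sync_range(base, rec_off, len) != 0)
        return -1;
    memcpy(base, &new_count, sizeof new_count);
    return sync_range(base, 0, sizeof new_count);
}
```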
However, I'd still love an official confirmation from a kernel
developer (or someone familiar with that part of the kernel), since

- I don't want to base data security on assumptions or lucky
  observations that may turn out to be wrong or may change with the
  next release of OS X
- I'd love to use the mmap() solution
- ... and then, I'm also just plain curious ;-)

I'm not going to give you an "official" anything, but I think I've
outlined the tradeoffs fairly.  8)

HTH, and good luck!
= Mike

Michael Smith
