Re: Shared mmap and data consistency upon a crash
On May 11, 2006, at 12:02 PM, darwin-dev-request@lists.apple.com wrote:

> Date: Wed, 10 May 2006 23:38:20 +0200
> From: Felix Schwarz <felix.schwarz@iospirit-gmbh.de>
> Subject: Re: Shared mmap and data consistency upon a crash
> To: darwin-dev@lists.apple.com
> Cc: Michael Smith <msmith@freebsd.org>
>
> On 10.05.2006 at 21:28, Michael Smith wrote:
>
> I maybe should have added what it is that I'm writing: I'm writing a specialized, server-less database link library (similar to SQLite by concept, my link library has only *very* basic functionality, though). The idea behind using mmap() is to pull indexed records into and out of a file as fast as possible and take advantage of any caching by the VMM, especially regarding the table inside the file that keeps record of all the records it stores.

Using the VM for caching in this case does have some advantages if you're trying to keep your code simple. Bear in mind though that you'll typically get LRU behaviour on cached data, so your indexes will only stay resident to the extent that you keep them hot.

> Since I've read elsewhere you can't mmap() files that are located on a network-volume, I've also written a fallback using read()/write()/lseek().

You can, but there is no graceful recovery. You are better off using pread/pwrite, since that will allow you to multi-thread portions of your I/O code as well as saving you a system call per op. Mmap will also give you issues once your database starts to get large, while file I/O will tend to scale better.

> It appears that, in both cases, the data in the file seems to be in sync when crashing. I could observe both a delay before any changes could be made to the file after an instant restart of the little test app plus the data that was written last could be reread correctly. With either method.
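
To illustrate the pread/pwrite suggestion above, here is a minimal C sketch of positioned record I/O. The fixed 64-byte record layout and the names store_open, record_write and record_read are assumptions made for this example; nothing in the thread describes the actual file format. Because the offset is passed with each call, there is no separate lseek() and no shared file position for threads to race on.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size record layout; not taken from the thread. */
#define RECORD_SIZE 64

/* Open (or create) the store file for reading and writing. */
int store_open(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/* Write one record into its slot with a single positioned write.
 * Returns 0 on success, -1 on error or short write. */
int record_write(int fd, uint32_t index, const void *rec)
{
    assert(rec != NULL);
    off_t off = (off_t)index * RECORD_SIZE;
    ssize_t n = pwrite(fd, rec, RECORD_SIZE, off);
    return n == RECORD_SIZE ? 0 : -1;
}

/* Read one record from its slot with a single positioned read.
 * Returns 0 on success, -1 on error or if the slot lies past EOF. */
int record_read(int fd, uint32_t index, void *rec)
{
    assert(rec != NULL);
    off_t off = (off_t)index * RECORD_SIZE;
    ssize_t n = pread(fd, rec, RECORD_SIZE, off);
    return n == RECORD_SIZE ? 0 : -1;
}
```

Several threads can call record_read and record_write on the same descriptor without coordinating a file position, although updates to the same record still need their own locking.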
Assuming the crash happens when the data in the object (written either via memory write or the write system call) is consistent, you should be OK. Note that if the app crashes in e.g. another *thread* while you are in the middle of an update to your data structures, all bets are off. You can be pre-empted or terminated at any time, and the object will be flushed to disk as-is.

This is one argument for using a journal and system calls; as a general rule, writes to a disk file tend to be atomic - either the write completes, or it doesn't.

>> If your app crashes, all bets are off - assuming you don't know where it crashed, you have no guarantee as to the state of anything it was doing, regardless of whether it had files mapped or not.
>
> The way the data is organized in the file and the order in which its data is modified ensures it's always in a consistent state - the

Note that you don't get any guarantees about what order things are written to disk unless you explicitly sync the file. This is mostly an issue when considering system crashes, though the scenario above w.r.t. crashes on another thread needs to be borne in mind as well.

> However, I'd still love an official confirmation from a kernel developer (or someone who is literate in that part of the kernel) since
> - I don't want to base data security on assumptions or lucky observations that may turn out wrong or may change with the next release of OS X
> - I'd love to use the mmap() solution
> - .. and then, I'm also plain curious ;-)

I'm not going to give you an "official" anything, but I think I've outlined the tradeoffs fairly. 8)

HTH, and good luck!

 = Mike
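
The point about write ordering and explicit syncs can be sketched in C as follows. This is only an illustration under assumed names and an assumed layout (an 8-byte commit counter at offset 0 followed by fixed-size records); it is not the scheme either poster used. The record is written and synced first, and only then is the counter that publishes it written and synced, so a crash between the two steps leaves the old counter in place. F_FULLFSYNC is a Darwin fcntl() command that also asks the drive to flush its cache; the sketch falls back to fsync() where it is unavailable or fails.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout: 8-byte commit counter, then fixed-size records. */
#define HEADER_SIZE 8
#define RECORD_SIZE 64

/* Flush the file to stable storage. */
int sync_file(int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
    /* Not every file system supports F_FULLFSYNC; fall back to fsync(). */
#endif
    return fsync(fd);
}

/* Write a record, sync, then publish it by updating the commit counter
 * and syncing again. The first sync is the ordering barrier: without it
 * the counter could reach the disk before the record it refers to. */
int commit_record(int fd, uint32_t index, const void *rec, uint64_t new_count)
{
    assert(rec != NULL);
    off_t off = HEADER_SIZE + (off_t)index * RECORD_SIZE;
    if (pwrite(fd, rec, RECORD_SIZE, off) != RECORD_SIZE)
        return -1;
    if (sync_file(fd) != 0)
        return -1;
    if (pwrite(fd, &new_count, sizeof new_count, 0) != (ssize_t)sizeof new_count)
        return -1;
    return sync_file(fd);
}

/* Read the commit counter back, e.g. during recovery after a crash. */
int read_commit_count(int fd, uint64_t *count)
{
    assert(count != NULL);
    ssize_t n = pread(fd, count, sizeof *count, 0);
    return n == (ssize_t)sizeof *count ? 0 : -1;
}
```

With a shared mapping instead of write calls, msync() with MS_SYNC on the modified range would play the role of the first barrier. Neither approach helps if another thread crashes the process halfway through an update to a single record.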
participants (1)
- Michael Smith