Re: Shared mmap and data consistency upon a crash
On 10.05.2006, at 21:28, Michael Smith wrote:

> On May 10, 2006, at 12:03 PM, darwin-dev-request@lists.apple.com wrote:
>
>> Date: Wed, 10 May 2006 11:02:13 +0200
>> From: Felix Schwarz <felix.schwarz@iospirit-gmbh.de>
>> Subject: Shared mmap and data consistency upon a crash
>> To: darwin-dev@lists.apple.com
>> Message-ID: <4E6C2A59-FEEF-4E6C-88F6-27ECEAE2BD92@iospirit-gmbh.de>
>>
>> Hello,
>>
>> I am currently exploring the possibilities of using mmap to speed up
>> some of my most critical I/O,
>
> Stop right here. What makes you think that mmap will "speed up" your
> I/O?
>
> mmap() is a convenience interface, not a performance interface.
>
> If you want to avoid blocking during I/O, consider the POSIX
> asynchronous I/O system calls.
>
> If you want to avoid the copy in/out from kernel space, consider using
> the F_NOCACHE fcntl(2) option. If you control your caching behaviour
> and don't expect another process will access your file data, this is
> typically the most "performant" way to go.

Thanks for the quick response! Some benchmarking gives me:

Adding 30000 sample records of 8100 bytes each
  using mmap():                 0.72958 seconds
  using read()/write()/lseek(): 1.24004 seconds

>> but am wondering about the reliability of mmap and the durability of
>> changes made to a file this way for one particular edge case. If
>>
>> 1) I mmap() a file using MAP_SHARED, PROT_READ and PROT_WRITE
>> 2) I write into the returned address space
>> 3) my app crashes prior to reaching munmap() and close()
>
> To me, it feels like you are trying to solve the wrong problems.

I should perhaps have mentioned what it is that I'm writing: a
specialized, server-less database link library (similar in concept to
SQLite, though my link library has only *very* basic functionality).
The idea behind using mmap() is to move indexed records into and out of
a file as fast as possible and to take advantage of any caching by the
VMM, especially for the table inside the file that keeps track of all
the records it stores.

Since I've read elsewhere that you can't mmap() files located on a
network volume, I've also written a fallback using read()/write()/lseek().

It appears that, in both cases, the data in the file stays in sync when
the app crashes. After an immediate restart of the little test app, I
could observe both a delay before any changes could be made to the file
and that the data written last could be reread correctly. This held
with either method.

However, I'd still love an official confirmation from a kernel developer
(or someone who is literate in that part of the kernel), since

- I don't want to base data integrity on assumptions or lucky
  observations that may turn out wrong or may change with the next
  release of OS X
- I'd love to use the mmap() solution
- ... and then, I'm also plain curious ;-)

> If your app crashes, all bets are off - assuming you don't know where
> it crashed, you have no guarantee as to the state of anything it was
> doing, regardless of whether it had files mapped or not.

The way the data is organized in the file, and the order in which it is
modified, ensure that it is always in a consistent state: the actual
modification to the table that keeps track of all records is written
last, with one single memcpy() (or write() in the alternative version).

> Typically, speaking as a developer, I find it's better to ship
> products that don't crash, rather than ones that are resistant to
> internal damage caused by crashes.
>
> = Mike

In general, I do share your view. The focus in software development
should always be on producing clean code in the first place, not on
trying to fix bad quality afterwards with other, new code. Rest assured,
I spend considerable time in my projects on keeping the code clean and
elegant and the builds stable.

However, a database link library is a special case (again, sorry for not
specifying the nature of the project more clearly earlier). You can make
sure your own code works rock-solid, but you don't know who will end up
linking it into what kind of application, or which other APIs and link
libraries play a role in that application.

Take the OS X graphics APIs, for example. There are many ways a user
can crash your application if it uses, say, QuickTime or Quartz ImageIO
to load images and one of them chokes on a file. You can neither prevent
nor fix this, but your users' data should stay in a consistent state
nonetheless.

Hope all that explains it a bit better.

Felix
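Steps 1) to 3) from the question can be written out in C roughly as follows. This is a minimal sketch, not code from the thread: mapped_write() is a hypothetical helper, the file is assumed to be long enough already (a mapping cannot grow a file), and the msync(MS_SYNC) call is an addition. On typical Unix kernels, stores into a MAP_SHARED file mapping modify the same cached file pages that write() would, which fits the observation that the data survived a crash of the app itself. Explicit syncing (msync() for a mapping, much as fsync() for a descriptor) matters mainly for surviving a system crash or power loss, and it is what gives the program some control over the order in which mapped changes reach the disk.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes from `src` at offset `off` of the
 * file at `path` through a MAP_SHARED mapping, then flush with msync().
 * The file must already be at least off + len bytes long.
 * Returns 0 on success, -1 on error. */
int mapped_write(const char *path, off_t off, const void *src, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    off_t base = off - (off % page);          /* mmap offsets must be page-aligned */
    size_t span = (size_t)(off - base) + len; /* bytes from base through the end of the write */

    /* Step 1: map the file shared and writable. */
    void *map = mmap(NULL, span, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Step 2: store into the returned address space. */
    memcpy((char *)map + (off - base), src, len);

    /* Not in the question: wait until the dirty pages are written back. */
    int rc = msync(map, span, MS_SYNC);

    munmap(map, span);
    close(fd);
    return rc;
}
```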
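Mike's F_NOCACHE suggestion might look like the minimal sketch below; set_nocache() and write_aligned() are hypothetical names. F_NOCACHE exists only on Darwin, so on other systems the sketch reports ENOTSUP instead. Staging the data in a page-aligned buffer that is a whole number of pages long is an assumption about what uncached I/O handles best, not something stated in the thread; one side effect is that write_aligned() pads the final page with zeros, so the file grows in whole pages.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel not to keep this file's data in its cache.
 * F_NOCACHE is Darwin-specific; elsewhere this fails with ENOTSUP. */
int set_nocache(int fd)
{
#ifdef F_NOCACHE
    return fcntl(fd, F_NOCACHE, 1); /* a non-zero argument turns caching off */
#else
    (void)fd;
    errno = ENOTSUP;
    return -1;
#endif
}

/* Hypothetical helper: copy `len` bytes into a zero-padded, page-aligned
 * buffer that is a whole number of pages long and write it at `off`
 * (assumed page-aligned). Returns the number of bytes written, or -1. */
ssize_t write_aligned(int fd, off_t off, const void *data, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t padded = (len + page - 1) / page * page; /* round up to full pages */
    void *buf = NULL;

    if (posix_memalign(&buf, page, padded) != 0)
        return -1;
    memset(buf, 0, padded);  /* zero the padding after the payload */
    memcpy(buf, data, len);

    ssize_t n = pwrite(fd, buf, padded, off);
    free(buf);
    return n;
}
```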
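The ordering argument in the thread (write the record data first, then publish it with one final, small update of the record table) can be sketched with pread()/pwrite(), in the spirit of the read()/write() fallback. Everything here is hypothetical: the record table is reduced to a single 64-bit record count at the start of a 4096-byte header, RECORD_SIZE reuses the 8100-byte records from the benchmark, and the sketch assumes that the 8-byte header write is never torn, the same assumption the single memcpy()/write() relies on. The flush between the two writes is what orders them on disk; without it, the kernel is free to write back the header before the record. On Mac OS X, fsync() does not force the drive to flush its own write cache, which is why the sketch tries fcntl(F_FULLFSYNC) first.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout: a 4096-byte header whose first 8 bytes hold the
 * number of committed records (standing in for the record table),
 * followed by fixed-size records. Records beyond that count are ignored. */
#define HEADER_SIZE 4096
#define RECORD_SIZE 8100

/* Flush file data to stable storage. On Darwin, F_FULLFSYNC also asks the
 * drive to flush its write cache; fall back to fsync() if it is missing
 * or fails. */
static int flush_file(int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
#endif
    return fsync(fd);
}

static int read_count(int fd, uint64_t *count)
{
    return pread(fd, count, sizeof *count, 0) == (ssize_t)sizeof *count ? 0 : -1;
}

/* Create an empty database file; returns an open descriptor or -1. */
int create_db(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    uint64_t zero = 0;
    if (pwrite(fd, &zero, sizeof zero, 0) != (ssize_t)sizeof zero || flush_file(fd) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Append one record without ever exposing a half-written one:
 * 1) write the record into space the header does not cover yet,
 * 2) flush it, 3) publish it with a single small header write,
 * 4) flush again. A crash before step 3 leaves an orphan record
 * that the old count simply does not include. */
int append_record(int fd, const void *rec)
{
    uint64_t count;
    if (read_count(fd, &count) != 0)
        return -1;

    off_t slot = (off_t)HEADER_SIZE + (off_t)count * RECORD_SIZE;
    if (pwrite(fd, rec, RECORD_SIZE, slot) != RECORD_SIZE)
        return -1;
    if (flush_file(fd) != 0)  /* record data reaches the disk first */
        return -1;

    count += 1;               /* the single "commit" write */
    if (pwrite(fd, &count, sizeof count, 0) != (ssize_t)sizeof count)
        return -1;
    return flush_file(fd);
}
```

Neither flush is needed to survive a crash of the application alone, only of the whole system; and no amount of flushing helps if a crashing application has already scribbled over the header, which is Mike's point.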
Participants (1): Felix Schwarz