Re: Read/Write call interaction w/ UBC
Hard reality time: Therefore, no vendor implements memory change notification like this.

-- Terry

On Jul 11, 2008, at 3:56 PM, shailesh jain wrote:

1) If the user mmaps a file, makes some changes to it, and then calls the read() system call (without msync'ing), is it the responsibility of the filesystem to flush the changes that could have been made through the mapping before actually reading the file?

There is no such thing as an uncached mmap(). This is because you write to the mapped version of the page in the buffer cache, and the page is then marked dirty, so that multiple writes to the same page can be "gathered" and written simultaneously.

The only way to implement that completely uncached would be to map the page read-only and set a page attribute flag to indicate that "this is really an uncached write page". The net result of this would be a page fault each time you wrote a byte in the page, and then, in the page-fault fixup handler in the VM, you'd see the flag and "fix the fault" with a write of the entire page to the underlying device. This is because processors only provide page protections on a page boundary and not on a byte boundary, for (I hope) obvious reasons.

This would be incredibly, incredibly expensive, and if this were, for example, a flash device, you would exceed the spec'ed number of available write cycles very quickly, after which no more writes would work because the device was "used up".
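To put rough numbers on the cost described above, here is a minimal sketch of the arithmetic. It assumes a 4096-byte page and a simplified model in which every one-byte store to an "uncached" mapping faults and pushes the whole page to the device, while a cached mapping gathers all stores into a single page write. The function name and the page size are illustrative assumptions, not anything from the xnu sources.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real systems vary


def device_bytes_written(byte_stores: int, gathered: bool) -> int:
    """Bytes sent to the device for `byte_stores` one-byte stores into one page.

    gathered=True models the cached case: the page is dirtied in memory and
    written out once. gathered=False models the hypothetical uncached mapping:
    each store faults and writes the entire page.
    """
    if byte_stores == 0:
        return 0
    page_writes = 1 if gathered else byte_stores
    return page_writes * PAGE_SIZE


stores = 1000
cached = device_bytes_written(stores, gathered=True)
uncached = device_bytes_written(stores, gathered=False)
print("cached:", cached, "uncached:", uncached, "ratio:", uncached // cached)
```

Under this model the device traffic grows linearly with the number of stores, which is the wear problem described for flash.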
Even something like a disk would end up "used up" pretty quickly by this kind of behaviour, which is one reason there are disk caches.

Typically, if someone does something insane like trying to implement a totally uncached device, then rather than permitting mmap at all, they just disable it and say "sorry, no mmap for this device", or they modify their hardware design to include something like static column DRAM, so that writes to the device are cached by the device (and they lose their non-cached implementation that way, instead).

E.g. treat as incoherent VM and buffer cache.

IF you are asking about cache coherency in general, instead, and have given up on trying to cut the ubc out of the picture, THEN the answer is that ubc is an implementation of a coherent VM and buffer cache.

The msync() system call was invented to support the concept of explicit application-based coherency, when the application chooses to mix read/write and mmap based access. The msync is intended as a barrier following memory writes, which causes a sync from the VM to the buffer cache in systems where the VM and buffer cache are decoherent (systems such as SVR3, BSD 4.3, Solaris pre-Solaris 5 and post Solaris 9, etc.).

You will generally only see it used in applications that mix I/O styles, usually because they were written and modified over long periods of time by different people, and grew organically instead of being designed. An example is the classic "netnews", which actually gets its barriers and msync's right.

2) Also, while referring to the source for smbfs, I came across a comment that said "we shall maintain synchronization between mmap and read/write by using UPL". Isn't just flushing mmap'ed pages enough?

Not unless there are serialization barriers between the calls to write and modifications via mmap. An msync will only sync out the range specified, and that will turn into writes if the dirty bits are set on the page(s) in that range.
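As an illustration of msync used as a barrier over a specific range, here is a minimal sketch using Python's mmap module, whose flush() method calls msync() on Unix. The helper name modify_then_sync and the temporary file are assumptions made for the example. On a system with a unified cache the read would see the change even without the flush; the flush is the explicit barrier that an application mixing I/O styles would issue.

```python
import mmap
import os
import tempfile


def modify_then_sync(path: str, offset: int, payload: bytes) -> bytes:
    """Modify a file through a shared mapping, msync only the touched
    pages, then read the same bytes back through the file descriptor."""
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        with mmap.mmap(fd, size) as m:  # shared, read/write mapping on Unix
            m[offset:offset + len(payload)] = payload  # memory modification

            # msync works on whole pages, so round the range out to page
            # boundaries and clamp it to the length of the mapping.
            page = mmap.PAGESIZE
            start = (offset // page) * page
            end = min(size, -(-(offset + len(payload)) // page) * page)
            m.flush(start, end - start)  # barrier: sync only this range

        # File-based access issued after the barrier.
        return os.pread(fd, len(payload), offset)
    finally:
        os.close(fd)


# Hypothetical two-page scratch file, used only for the demonstration.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * (2 * mmap.PAGESIZE))
    scratch_path = f.name

result = modify_then_sync(scratch_path, mmap.PAGESIZE + 10, b"hello")
os.unlink(scratch_path)
print(result)
```

Only the second page of the scratch file is passed to flush(); a dirty page outside that range would not be covered by this call, which is the point made above about msync syncing only the range specified.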
There is no guarantee that there will not be a memory modification in a page, followed by a write, followed by another memory modification, followed by an msync, other than the application author knowing what they are doing and instituting order of operation guarantees in their code (this is how "netnews" did it).

Failure to order your memory and write operations will result in a stale version of data in the cached page that is later msync'ed, which means that any writes you did between memory modification and msync end up being lost, either as the VM page is copied over the buffer cache page (in the case of a decoherent VM and buffer cache), or as the cached page contents overwrite the uncached write (in the unified VM and buffer cache case, for systems which support application-forced uncached writes, like Mac OS X).

The moral of this story is to not mix memory and file based I/O in decoherent systems, and to not mix cached and uncached I/O even in coherent systems, unless you know what you are doing, and barrier appropriately.

Since you can't actually control what an application author does on your system, typically the best strategy is to unify your VM and buffer cache by default and always do cached I/O (implicit coherency). Then give the people who know what they are doing an "escape hatch" from the default behaviour.

On Jul 8, 2008, at 2:25 AM, Terry Lambert wrote:

On Jul 7, 2008, at 7:21 PM, shailesh jain <coolworldofshail@gmail.com> wrote:

I am writing a file system that does not support caching. I wanted to know whether, just by passing the NO_CACHE and CANT_CACHE options, I can disable caching entirely. As I understand it, those options only disable name lookup caching and the like, and cannot disable file content caching, i.e. where UBC comes into play. Can I disable file content caching, i.e. UBC (set it to UBC_INFO_NULL)? Also, the Apple docs say that UBC on Apple is different from FreeBSD. I wanted to know in what respect?

Read the sources.
I put massive block comments in the UBC code for just such an emergency. 8-). Also, look at the sources for older versions of smbfs, which had similar issues.
participants (1)
-
Terry Lambert