I did a little more thinking about the implementation, and it would be very hard for the kernel to offer a service that fully facilitates what Yogesh is trying to do without special-casing a lot of things.

Does OS X allow layered file systems? It seems like it could be done by an intermediate file system.

Short answer:
--------
Forget it, it's unworkable.

Long answer:
--------
They are allowed, but they are not supported.

-- Terry

Darwin-kernel mailing list (Darwin-kernel@lists.apple.com)

On Jun 15, 2007, at 12:12 PM, Allen Briggs wrote:
On Fri, Jun 15, 2007 at 10:59:07AM -0700, Michael Smith wrote:

In particular, there is no way to have a hook into a transaction such that there are no changes until the commit, and you get commit notification (and can do a backup) before any actual change takes place. I think the only way he'll get what he wants, frankly, is to replace HFS completely with an HFS that has these hooks built into it, and is built on a transaction model.

The alternative is to back up at every authorization transaction requesting write access in scope VNODE, and then deal with it in scope FILEOP when the modified bit is present (and discard the change if not).

You are still potentially shot in the foot, though, for memory mapped files, since the vnode reference is noted at close time rather than 1->0 reference time, so a reference held by the VM system is not going to notify you of pages dirtied as a result of memory mapped operations, if the file is closed after the mapping is established.
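To make the memory mapped case concrete, here is a minimal user-space sketch using only POSIX calls. It is not kernel code and says nothing about how the close-time notification is implemented; it only shows that a store made through a shared mapping after the descriptor is closed still reaches the file. The function name and the scratch path passed to it are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post).
 * Returns 1 if a store made through a shared mapping AFTER close(2)
 * reached the file, 0 if it did not, -1 on any error. */
int write_after_close_reaches_file(const char *path)
{
    const char before[] = "before";
    const char after[]  = "after!";
    const size_t len = sizeof(before) - 1;     /* both strings are 6 bytes */
    char buf[sizeof(before)] = {0};

    assert(sizeof(before) == sizeof(after));

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, before, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* The descriptor goes away here; the mapping keeps the file referenced. */
    close(fd);

    /* Dirty the page after the close, then force it out to the file. */
    memcpy(map, after, len);
    if (msync(map, len, MS_SYNC) != 0) {
        munmap(map, len);
        return -1;
    }
    munmap(map, len);

    /* Re-open and check what the file now contains. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len);
    close(fd);
    unlink(path);
    if (n != (ssize_t)len)
        return -1;
    return memcmp(buf, after, len) == 0;
}
```

Calling this with any writable scratch path should show the file changing after the close, which is exactly the modification a close-time hook never gets to see.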
Again, the only way around this would be to hook the paging path in the FS itself, below the VNOP layer, to get page write notifications, and you're back to replacing HFS.

I.e. you can write them, but we will not help you fix them, and if they break, it's on you, and there are known cases where they would break.

The primary reason for this is that there is no upcall mechanism in the Heidemann stacking architecture for object aliases.

Consider an FS, FS-1, stacked on another FS, FS-2. It does whatever additional things it does, but the key feature it has is that it vends a vnode that refers to the vnode in the underlying FS. Each of these vnodes has an associated UBC structure that refers to the VM object backing the files.

Now say you expose both these objects in the name space, either intentionally (by not mounting the second FS over top of the first's mount point) or unintentionally (by having a file open on FS-2 before FS-1 is mounted over top of it). In this case, you have a reference to a vnode vended by the upper layer and a vnode vended by the lower layer, both with buffer cache backing the same range of the file in different pages.

Now, if you write using the upper layer vnode, everything is fine, since the put-page operation will go to the upper layer, modify the backing object, and push the change into the lower layer by calling the underlying FS's putpage as well (which would copy the change to the lower layer's backing object, as well). Fine and dandy: inefficient as all get out, but workable.

Say, though, that you write to the lower layer vnode; now you push the changes into the lower layer vnode's backing object, but the upper layer backing object does not get modified; further, there is no back-link to the upper layer vnode from the lower, so the cached data in the upper level vnode backing object becomes stale.
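The stale-cache scenario can be sketched as a toy user-space model. This is an illustration only, not the real VFS or UBC code: `struct layer`, `layer_write`, `layer_read` and `upper_goes_stale` are hypothetical names, and each layer's backing object is reduced to a single small buffer. The only property it models is the one described above: links go downward, never upward.

```c
#include <assert.h>
#include <string.h>

#define PAGE 16

/* Toy model: each layer caches its own copy of the same file range. */
struct layer {
    char cache[PAGE];      /* this layer's private backing-object page */
    struct layer *lower;   /* downward link only; nothing points back up */
};

/* Write through a layer: update its own page, then push the change down,
 * the way the upper layer's putpage calls the underlying FS's putpage. */
void layer_write(struct layer *l, const char *data)
{
    strncpy(l->cache, data, PAGE - 1);
    l->cache[PAGE - 1] = '\0';
    if (l->lower != NULL)
        layer_write(l->lower, data);
}

/* Reads are served from the layer's own cached page. */
const char *layer_read(const struct layer *l)
{
    return l->cache;
}

/* Returns 1 if the upper layer is left holding stale data after a write
 * made directly through the lower layer's vnode. */
int upper_goes_stale(void)
{
    struct layer fs2 = { "", NULL };   /* lower layer (FS-2) */
    struct layer fs1 = { "", &fs2 };   /* upper layer (FS-1), stacked on FS-2 */

    /* Writing via the upper vnode keeps both copies in step. */
    layer_write(&fs1, "v1");
    assert(strcmp(layer_read(&fs1), layer_read(&fs2)) == 0);

    /* Writing via the lower vnode cannot reach the upper copy. */
    layer_write(&fs2, "v2");
    return strcmp(layer_read(&fs1), "v1") == 0 &&
           strcmp(layer_read(&fs2), "v2") == 0;
}
```

In this model the write through `fs1` is coherent but costs two copies, while the write through `fs2` leaves `fs1` still returning the old contents, because nothing in `fs2` can find the layers above it.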
Matters are made worse, because the upper level backing object can have a cardinality of N - that is, you could stack N upper level FSs on top of one lower level FS, so even if you had a pointer, there's no way to represent the data relationship to upper level objects.

FreeBSD, OpenBSD, NetBSD, and DragonFlyBSD's stacking FS frameworks, which are also out of the BSD-4.4-Lite2 distribution, and so contain the same code out of John Heidemann's Ficus project out of UCLA, all face this same issue.

The only real resolution to this is to deunify the VM and buffer cache (which you could do, by forcing the use of direct I/O for all files opened through a stacked FS), or to constrain what you are allowed to stack to only things which do directory folding or other non-block-translation type transforms. In other words, you constrain things so that you directly access the backing object pages from the underlying FS raw, rather than through an upper level alias: this means no compressing FSs, no encrypting FSs, etc., etc.

It's possible to address the constrained set (non-transformational) stacking layers by permitting aliasing of UBC objects and forcing stacking not to expose the underlying objects elsewhere in the namespace (this is essentially how FiST handles stacking), but it's not really clear that this would be as big a win as people seem to think it would be.
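As a rough illustration of the constrained case, the same kind of toy model can have every layer alias one backing object instead of keeping a private copy. Again this is a user-space sketch with hypothetical names (`struct backing`, `struct vnode`, `vn_write`, `vn_read`, `aliased_layers_stay_coherent`), not actual UBC code, and it assumes no layer transforms block contents.

```c
#include <assert.h>
#include <string.h>

#define PAGE 16
#define N_UPPER 3

/* Toy model: the single backing object owned by the lower layer. */
struct backing {
    char page[PAGE];
};

/* A vnode that aliases a backing object rather than copying it. */
struct vnode {
    struct backing *obj;
};

void vn_write(struct vnode *vp, const char *data)
{
    strncpy(vp->obj->page, data, PAGE - 1);
    vp->obj->page[PAGE - 1] = '\0';
}

const char *vn_read(const struct vnode *vp)
{
    return vp->obj->page;
}

/* Returns 1 if N upper layers and the lower layer all observe the same
 * data regardless of which vnode the write went through. */
int aliased_layers_stay_coherent(void)
{
    struct backing obj = { "" };
    struct vnode lower = { &obj };
    struct vnode upper[N_UPPER];
    int i;

    /* Every upper layer points at the lower layer's object. */
    for (i = 0; i < N_UPPER; i++)
        upper[i].obj = lower.obj;

    /* A write via the lower vnode is visible through every upper vnode. */
    vn_write(&lower, "v2");
    for (i = 0; i < N_UPPER; i++)
        if (strcmp(vn_read(&upper[i]), "v2") != 0)
            return 0;

    /* A write via one upper vnode is visible everywhere else too. */
    vn_write(&upper[1], "v3");
    return strcmp(vn_read(&lower), "v3") == 0 &&
           strcmp(vn_read(&upper[0]), "v3") == 0;
}
```

The sketch stays coherent for any N only because every layer sees the raw pages. A compressing or encrypting layer would need pages holding different bytes than the lower layer's, which brings back the private copies and the staleness problem.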