Re: Read/Write call interaction w/ UBC
- Subject: Re: Read/Write call interaction w/ UBC
- From: Terry Lambert <email@hidden>
- Date: Fri, 11 Jul 2008 17:08:17 -0700
On Jul 11, 2008, at 3:56 PM, shailesh jain wrote:
1) If the user mmap's a file, makes some changes to it, and then calls
the read() system call (without msync'ing), is it the responsibility
of the filesystem to flush the changes that could have been made
through the mapping before actually reading the file?
Hard reality time:
There is no such thing as an uncached mmap(). This is because you
will write to the mapped version of the page in the buffer cache, and
it will then be marked dirty, so that multiple writes in the same page
can be "gathered" and written simultaneously.
The only way to implement that completely uncached would be to map the
page read-only and set a page attribute flag to indicate that "this is
really an uncached write page". The net result of this would be a
page fault each time you wrote a byte in the page, and then in the
page-fault fixup handler in the VM, you'd see the flag and "fix the
fault" with a write of the entire page to the underlying device. This
is because processors only provide page protections on a page boundary
and not on a byte boundary, for (I hope) obvious reasons.
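
The page-granularity point can be seen from user space. The following is a minimal sketch, not the kernel mechanism described above: it assumes a POSIX system where a store to a read-only page raises SIGSEGV (or SIGBUS on some systems) and where calling mprotect() from the handler works in practice, even though POSIX does not list it as async-signal-safe. The names count_write_faults, on_write_fault and tracked_page are hypothetical.

```c
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANON and sigaction on glibc */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *tracked_page;            /* the one page being watched */
static size_t tracked_size;           /* its size in bytes */
static volatile sig_atomic_t faults;  /* write faults taken so far */

/* "Fix the fault": count it and make the whole page writable, so the
 * faulting store is restarted and succeeds. */
static void on_write_fault(int sig, siginfo_t *info, void *ctx)
{
    char *addr = (char *)info->si_addr;
    (void)sig;
    (void)ctx;
    if (addr < tracked_page || addr >= tracked_page + tracked_size)
        _exit(2);                     /* unrelated crash: do not loop */
    faults++;
    mprotect(tracked_page, tracked_size, PROT_READ | PROT_WRITE);
}

/* Does nstores single-byte stores into one page that starts read-only.
 * If reprotect_each_store is nonzero, the page is made read-only again
 * before every store. Returns the number of faults taken, -1 on error. */
int count_write_faults(size_t nstores, int reprotect_each_store)
{
    struct sigaction sa, old_segv, old_bus;
    volatile char *p;
    size_t i;
    long ps = sysconf(_SC_PAGESIZE);

    if (ps <= 0)
        return -1;
    tracked_size = (size_t)ps;
    if (nstores > tracked_size)
        nstores = tracked_size;
    tracked_page = mmap(NULL, tracked_size, PROT_READ,
                        MAP_PRIVATE | MAP_ANON, -1, 0);
    if (tracked_page == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);  /* some systems report it as SIGBUS */

    faults = 0;
    p = tracked_page;
    for (i = 0; i < nstores; i++) {
        if (reprotect_each_store)
            mprotect(tracked_page, tracked_size, PROT_READ);
        p[i] = (char)i;               /* faults whenever the page is read-only */
    }
    assert(nstores == 0 || faults >= 1);  /* the first store must have faulted */

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(tracked_page, tracked_size);
    return (int)faults;
}
```

With reprotect_each_store set to 0, any number of stores into the page costs a single fault, because the protection is lifted for the whole page at once. Catching every store requires re-protecting the page each time, which costs one fault per byte written; that is the expense described above, before any device write is added to it.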
This would be incredibly, incredibly expensive, and, if this were, for
example, a flash device, you would exceed the spec.'ed number of
available write cycles very quickly, after which no more writes would
work because the device was "used up". Even something like a disk
would end up "used up" pretty quickly by this kind of behaviour, which
is one reason there are disk caches.
Therefore, no vendor implements memory change notification like this.
Typically, if someone does something insane like trying to implement a
totally uncached device, then rather than permitting mmap at all, they
just disable it and say "sorry, no mmap for this device", or they
modify their hardware design to include something like static column
DRAM, so that writes to the device are cached by the device (and they
lose their non-cached implementation that way, instead). That is, treat
it as an incoherent VM and buffer cache.
IF you are asking about cache coherency in general, instead, and have
given up on trying to cut the ubc out of the picture, THEN the answer
is that ubc is an implementation of a coherent VM and buffer cache.
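
For question 1, a small probe shows what a coherent VM and buffer cache looks like from the application side. This is a sketch under stated assumptions: it is user-space code, the function name read_sees_mmap_store and the scratch file path are hypothetical, and the result only describes the system and filesystem it is run on.

```c
#define _DEFAULT_SOURCE 1   /* exposes pread and ftruncate on glibc */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Stores through a shared mapping, then read()s the same bytes back
 * WITHOUT calling msync(). Returns 1 if the read sees the store,
 * 0 if it sees stale data, -1 on error. The scratch file is removed. */
int read_sees_mmap_store(const char *path)
{
    int result = -1;
    int fd;
    long ps = sysconf(_SC_PAGESIZE);

    assert(path != NULL);
    if (ps <= 0)
        return -1;
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)ps) == 0) {
        char *map = mmap(NULL, (size_t)ps, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            char buf[4];
            memcpy(map, "mmap", 4);          /* dirties the cached page */
            if (pread(fd, buf, 4, 0) == 4)   /* no msync() in between */
                result = (memcmp(buf, "mmap", 4) == 0);
            munmap(map, (size_t)ps);
        }
    }
    close(fd);
    unlink(path);
    return result;
}
```

On a system with a unified cache this should return 1, since read() and the mapping are served from the same page. On a decoherent system the same call sequence could return 0, which is the case msync() exists for.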
The msync() system call was invented to support the concept of
explicit application-based coherency, when the application chooses to
mix read/write and mmap based access. The msync is intended as a
barrier following memory writes, which causes a sync from the VM to
the buffer cache in systems where the VM and buffer cache are
decoherent (systems such as SVR3, BSD 4.3, Solaris pre-Solaris 5 and
post Solaris 9, etc.).
You will generally only see it used in applications that mix I/O
styles, usually because they were written and modified over long
periods of time by different people, and grew organically
instead of being designed. An example is the classic "netnews", which
actually gets its barriers and msync's right.
2) Also, while referring to the source for smbfs, I came across a
comment that said "we shall maintain synchronization between mmap
and read/write by using UPL". Isn't just flushing mmap'ed pages
enough?
Not unless there are serialization barriers between the calls to write
and modifications via mmap. An msync will only sync out the range
specified, and that will turn into writes if the dirty bits are set on
the page(s) in that range. There is no guarantee that there will not
be a memory modification in a page, followed by a write, followed by
another memory modification, followed by an msync, other than the
application author knowing what they are doing and instituting order
of operation guarantees in their code (this is how "netnews" did it).
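
An order-of-operation guarantee of this kind can be sketched as: store through the mapping, msync() exactly the range touched, and only then issue the write(). The code below is an illustration, not how netnews or smbfs does it; msync_range, ordered_mixed_update and the scratch path are hypothetical names. It also rounds the start of the range down to a page boundary, since msync() may reject an unaligned address.

```c
#define _DEFAULT_SOURCE 1   /* exposes pread, pwrite and ftruncate on glibc */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* msync() the pages covering [off, off + len) of a mapping that starts
 * at base. base must be page-aligned, as mmap() returns it. */
static int msync_range(char *base, size_t off, size_t len, size_t pagesize)
{
    size_t start = off - (off % pagesize);   /* round down to a page */
    return msync(base + start, (off - start) + len, MS_SYNC);
}

/* memory store -> msync barrier -> write(). Returns 0 if both updates
 * are visible through read() afterwards, -1 otherwise. */
int ordered_mixed_update(const char *path)
{
    int rc = -1;
    int fd;
    char *map;
    char a[8], b[9];
    size_t ps, len;
    long ps_long = sysconf(_SC_PAGESIZE);

    assert(path != NULL);
    if (ps_long <= 0)
        return -1;
    ps = (size_t)ps_long;
    len = 2 * ps;
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0)
        goto out_close;
    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    /* 1. Memory modification, in the second page of the file. */
    memcpy(map + ps + 10, "via-mmap", 8);
    /* 2. Barrier: push out only the range just modified. */
    if (msync_range(map, ps + 10, 8, ps) != 0)
        goto out_unmap;
    /* 3. File-based write into the same page, after the barrier. */
    if (pwrite(fd, "via-write", 9, (off_t)(ps + 100)) != 9)
        goto out_unmap;
    /* 4. Both updates should now be in the file. */
    if (pread(fd, a, 8, (off_t)(ps + 10)) == 8 &&
        pread(fd, b, 9, (off_t)(ps + 100)) == 9 &&
        memcmp(a, "via-mmap", 8) == 0 &&
        memcmp(b, "via-write", 9) == 0)
        rc = 0;

out_unmap:
    munmap(map, len);
out_close:
    close(fd);
    unlink(path);
    return rc;
}
```

The point of the sketch is the ordering of steps 1 to 3. The final check passes trivially on a coherent system; on a decoherent one it is the barrier in step 2 that keeps a later msync() from copying a stale page over the written data.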
Failure to order your memory and write operations will result in a
stale version of data in the cached page that is later msync'ed, which
means that any writes you did between memory modification and msync
end up being lost, as the VM page is copied over the buffer cache page
(in the case of a decoherent VM and buffer cache), or the cached page
contents overwriting the uncached write (in the unified VM and buffer
cache case, for systems which support application-forced uncached
writes, like MacOS X).
The moral of this story is to not mix memory and file based I/O in
decoherent systems, and to not mix cached and uncached I/O even in
coherent systems, unless you know what you are doing, and barrier
appropriately.
Since you can't actually control what an application author does on
your system, typically the best strategy is to by default unify your
VM and buffer cache and always do cached I/O (implicit coherency).
Then give the people who know what they are doing an "escape hatch"
from the default behaviour.
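
The message does not name the escape hatch. On Mac OS X, one per-descriptor way for an application to ask for uncached I/O is fcntl() with F_NOCACHE; treating that as the mechanism meant here is an assumption. A minimal sketch follows, with the hypothetical helper name open_uncached, written so that it still compiles on systems that do not define F_NOCACHE.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Opens path read/write (creating it if needed) and, where the platform
 * defines F_NOCACHE, asks for data caching to be turned off on this
 * descriptor. *applied is set to 1 if the request was made, 0 if the
 * platform has no F_NOCACHE. Returns the descriptor, or -1 on error. */
int open_uncached(const char *path, int *applied)
{
    int fd;

    assert(path != NULL && applied != NULL);
    *applied = 0;
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
#ifdef F_NOCACHE
    if (fcntl(fd, F_NOCACHE, 1) == -1) {   /* nonzero arg: caching off */
        close(fd);
        return -1;
    }
    *applied = 1;
#endif
    return fd;
}
```

A descriptor opened this way should not then be mixed with cached access to the same file without barriers, for the reasons given above.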
-- Terry
/Shail
On Tue, Jul 8, 2008 at 10:46 PM, Michael Smith <email@hidden> wrote:
On Jul 8, 2008, at 2:25 AM, Terry Lambert wrote:
On Jul 7, 2008, at 7:21 PM, shailesh jain
<email@hidden> wrote:
I am writing a file system that does not support caching. I wanted
to know whether, by just passing the NO_CACHE and CANT_CACHE
options, I can disable caching entirely.
As far as I understand, those options only disable name lookup
caching and the like, and cannot disable file content caching, i.e.
where UBC comes into play. Can I disable file content caching, i.e.
UBC? (Set it to UBC_INFO_NULL?)
Also, Apple's docs say that UBC on Apple systems is different from
FreeBSD's. I wanted to know in what respect.
Read the sources. I put massive block comments in the UBC code for
just such an emergency. 8-).
Also, look at the sources for older versions of smbfs, which had
similar issues.
= Mike
Darwin-kernel mailing list (email@hidden)