Wade Tregaskis <wjtregaskis@students.latrobe.edu.au> wrote:

>> Wade -- one thing you might want to look in to is that some operating
>> systems implement an asynchronous I/O API in addition to traditional
>> blocking synchronous I/O. For asynch, the difference in the sequence
>> of events (from userland POV) is that the initial I/O call doesn't
>> block, and there is always some mechanism to notify the process of
>> I/O completion. In "classic" MacOS, ISTR this was done by giving the
>> OS a pointer to a callback function, which would be called on error
>> or completion. I think it also set a flag in the descriptor structure
>> used to request the I/O, so you could also determine completion by
>> polling that flag.
>
> Is this still an efficient way of operating, in an increasingly linear
> [e.g. single-user] OS? I'm just thinking, if you could return from
> writes faster, the calling process can keep doing other things.

The classic way of doing this in UNIX systems is to make the
application multithreaded. By splitting things into an I/O thread and
an (other stuff) thread, you can avoid making the whole program stop
every time it needs to do I/O. Of course, this often results in the
program internally implementing mechanisms quite similar to async I/O
completion notification. I'm not the one to ask whether doing I/O
threads manually in user space or automatically in kernel space (which
is basically what you end up with when you do async I/O APIs) is
better. :)
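
For concreteness, a minimal sketch of the flag-polling style using the
POSIX AIO interface (<aio.h>), on systems that provide it, might look
something like this; the file name, buffer size, and polling loop are
arbitrary choices, and error handling is abbreviated:

    /* Sketch: issue an asynchronous write, then poll for completion.
     * Link with -lrt on some systems. */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        static char buf[65536];          /* must stay valid until completion */
        struct aiocb cb;

        int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        memset(buf, 'x', sizeof buf);
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof buf;
        cb.aio_offset = 0;

        if (aio_write(&cb) != 0) { perror("aio_write"); return 1; }

        /* The call returned immediately; poll for completion while doing
         * other work.  aio_error() returns EINPROGRESS until the request
         * has finished. */
        while (aio_error(&cb) == EINPROGRESS) {
            /* ... do other useful work here ... */
            usleep(1000);
        }

        ssize_t n = aio_return(&cb);     /* final status, like write()'s result */
        printf("wrote %zd bytes\n", n);

        close(fd);
        return 0;
    }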
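
And a correspondingly minimal sketch of the do-it-yourself user-space
version with pthreads might look like the following. A real program
would keep a long-lived I/O thread fed from a request queue rather than
joining a thread per write, but the buffer rule is the same either
way -- don't touch the buffer until the I/O thread is done with it:

    /* Sketch: hand the blocking write() to a worker thread so the main
     * thread can keep going.  Compile with -pthread; error checking
     * abbreviated. */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    struct write_req {
        int         fd;
        const char *buf;
        size_t      len;
        ssize_t     result;
    };

    static void *writer_thread(void *arg)
    {
        struct write_req *req = arg;
        req->result = write(req->fd, req->buf, req->len);  /* blocks here */
        return NULL;
    }

    int main(void)
    {
        static char buf[65536];
        struct write_req req;
        pthread_t tid;

        int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        memset(buf, 'x', sizeof buf);
        req.fd = fd; req.buf = buf; req.len = sizeof buf; req.result = 0;

        pthread_create(&tid, NULL, writer_thread, &req);

        /* ... main thread does other useful work while the write proceeds ... */

        pthread_join(tid, NULL);   /* only now is it safe to reuse or free buf */
        printf("write returned %zd\n", req.result);

        close(fd);
        return 0;
    }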

> My thought is that you could do this without breaking existing apps by
> performing any error checking [i.e. quickly verifying there is free
> space, among other things] immediately, returning fail if appropriate,
> or otherwise returning no error. You could then worry about physically
> getting the data to disk, seeing as this is the most time intensive
> part...

With the existing semantics for write(), it's not safe to do that
precisely the way you describe. Programs can assume that it is safe to
do anything with the buffers passed to write() after write() has
returned. If there is a kernel thread still reading from them after
write() returns, you could run into serious problems when the userland
program overwrites or deallocates buffers before they have really been
consumed. (Async I/O APIs require the user to keep the buffers valid
until notification of I/O completion, of course.)

Now, that's not to say that you can't get effectively what you have
described. The key is that there must be enough free memory to cache
the data written by the application. Most modern UNIXen (MacOS X
included) can dynamically expand and shrink the amount of disk buffers
to make the most use out of "free" memory that really isn't being used
by anything. So, if you write 5 megabytes of data, and there's 10
megabytes of memory free, the system is probably going to just grab
5 MB of RAM for buffers, write the data to it, and return immediately.
The semantics of write() are preserved, performance is better,
everybody's happy.

Of course, if there aren't enough free buffers for the I/O to complete
immediately, and there isn't enough free memory to make new buffers,
the system will block the calling process. Furthermore, some systems
may (as a matter of policy) not allow write() to return until the data
has really been written out to a disk -- or at least may not allow
truly huge amounts of write buffering.

I know that Linux is a system that is very aggressive about allowing
you to fill free RAM with dirty disk buffers. Until you get down to
about 5 MB free or so, it will not block much (if at all) on disk
writes. I'm not quite sure what the rules are with OS X, though I am
fairly sure it is nowhere near as aggressive as Linux.

On Linux, you can do a command like:

    dd if=/dev/zero of=testfile bs=1048576 count=200

which writes 200 megabytes of zeroes to "testfile", and if you have
enough memory free, it will finish VERY fast, before any data has
really been written to disk. Then, a few seconds later, the disk light
will come on solid as the kernel begins flushing the huge amount of
dirty disk buffers it has accumulated. If you do this on OS X, the
disk light will come on solid right away, and the command will not
complete until the I/O has really finished. (It's hard to tell whether
it's requiring every last byte to be written before returning; with
modern disk transfer rates it could be allowing several megabytes of
unwritten data and I wouldn't be able to tell by eye whether the disk
light was staying on after dd finished.)

There are positives and negatives to Linux's extremely aggressive
strategy, BTW. The positive is that if you don't run out of memory,
everything is really, really fast. The negative is that if you do run
out, it can actually hurt performance. Plus, if the system should
crash or lose power or whatever, there can be a whole lot of data in
memory that has not been written out to disk, which can cause serious
corruption on non-journaled filesystems.

> But, this from a newbie. I intend on acquiring some classic texts on
> this subject [general OS level operations] - can anyone name one or
> two that stand out?

Tanenbaum & Woodhull's textbook "Operating Systems: Design and
Implementation" is a classic introduction to the basic principles of
operating system internals. It would definitely qualify as a "classic
text", though: at least in the version I cut my teeth on, it covered
the design of a UNIX-alike (Minix) which could run on 8088-based XT
clones with floppy drives. Hopefully there's a newer edition which
talks about somewhat more modern hardware.

--
Tim Seufert

_______________________________________________
darwin-kernel mailing list | darwin-kernel@lists.apple.com
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/darwin-kernel
Do not post admin requests to the list. They will be ignored.