Wade Tregaskis <wjtregaskis@students.latrobe.edu.au> wrote:

>> Wade -- one thing you might want to look in to is that some operating
>> systems implement an asynchronous I/O API in addition to traditional
>> blocking synchronous I/O. For asynch, the difference in the sequence
>> of events (from userland POV) is that the initial I/O call doesn't
>> block, and there is always some mechanism to notify the process of
>> I/O completion. In "classic" MacOS, ISTR this was done by giving the
>> OS a pointer to a callback function, which would be called on error
>> or completion. I think it also set a flag in the descriptor structure
>> used to request the I/O, so you could also determine completion by
>> polling that flag.
>
> Is this still an efficient way of operating, in an increasingly linear
> [e.g. single-user] OS? I'm just thinking, if you could return from
> writes faster, the calling process can keep doing other things.

The classic way of doing this in UNIX systems is to make the
application multithreaded. By splitting things into an I/O thread and
an (other stuff) thread, you can avoid making the whole program stop
every time it needs to do I/O. Of course, this often results in the
program internally implementing mechanisms quite similar to async I/O
completion notification. I'm not the one to ask whether doing I/O
threads manually in user space or automatically in kernel space (which
is basically what you end up with when you do async I/O APIs) is
better. :)
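
For concreteness, a minimal sketch of the flag-polling style using the
POSIX AIO interface (<aio.h>), on systems that provide it, might look
something like this; the file name, buffer size, and polling loop are
arbitrary choices, and error handling is abbreviated:

    /* Sketch: issue an asynchronous write, then poll for completion.
     * Link with -lrt on some systems. */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        static char buf[65536];          /* must stay valid until completion */
        struct aiocb cb;

        int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        memset(buf, 'x', sizeof buf);
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof buf;
        cb.aio_offset = 0;

        if (aio_write(&cb) != 0) { perror("aio_write"); return 1; }

        /* The call returned immediately; poll for completion while doing
         * other work.  aio_error() returns EINPROGRESS until the request
         * has finished. */
        while (aio_error(&cb) == EINPROGRESS) {
            /* ... do other useful work here ... */
            usleep(1000);
        }

        ssize_t n = aio_return(&cb);     /* final status, like write()'s result */
        printf("wrote %zd bytes\n", n);

        close(fd);
        return 0;
    }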
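
And a correspondingly minimal sketch of the do-it-yourself user-space
version with pthreads might look like the following. A real program
would keep a long-lived I/O thread fed from a request queue rather than
joining a thread per write, but the buffer rule is the same either
way -- don't touch the buffer until the I/O thread is done with it:

    /* Sketch: hand the blocking write() to a worker thread so the main
     * thread can keep going.  Compile with -pthread; error checking
     * abbreviated. */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    struct write_req {
        int         fd;
        const char *buf;
        size_t      len;
        ssize_t     result;
    };

    static void *writer_thread(void *arg)
    {
        struct write_req *req = arg;
        req->result = write(req->fd, req->buf, req->len);  /* blocks here */
        return NULL;
    }

    int main(void)
    {
        static char buf[65536];
        struct write_req req;
        pthread_t tid;

        int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        memset(buf, 'x', sizeof buf);
        req.fd = fd; req.buf = buf; req.len = sizeof buf; req.result = 0;

        pthread_create(&tid, NULL, writer_thread, &req);

        /* ... main thread does other useful work while the write proceeds ... */

        pthread_join(tid, NULL);   /* only now is it safe to reuse or free buf */
        printf("write returned %zd\n", req.result);

        close(fd);
        return 0;
    }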

> My thought is that you could do this without breaking existing apps by
> performing any error checking [i.e. quickly verifying there is free
> space, among other things] immediately, returning fail if appropriate,
> or otherwise returning no error. You could then worry about physically
> getting the data to disk, seeing as this is the most time intensive
> part...

With the existing semantics for write(), it's not safe to do that
precisely the way you describe. Programs can assume that it is safe to
do anything with the buffers passed to write() after write() has
returned. If there is a kernel thread still reading from them after
write() returns, you could run into serious problems when the userland
program overwrites or deallocates buffers before they have really been
consumed. (Async I/O APIs require the user to keep the buffers valid
until notification of I/O completion, of course.)

Now, that's not to say that you can't get effectively what you have
described. The key is that there must be enough free memory to cache
the data written by the application. Most modern UNIXen (MacOS X
included) can dynamically expand and shrink the amount of disk buffers
to make the most use out of "free" memory that really isn't being used
by anything. So, if you write 5 megabytes of data, and there's 10
megabytes of memory free, the system is probably going to just grab
5 MB of RAM for buffers, write the data to it, and return immediately.
The semantics of write() are preserved, performance is better,
everybody's happy.

Of course, if there aren't enough free buffers for the I/O to complete
immediately, and there isn't enough free memory to make new buffers,
the system will block the calling process. Furthermore, some systems
may (as a matter of policy) not allow write() to return until the data
has really been written out to a disk -- or at least may not allow
truly huge amounts of write buffering.

I know that Linux is a system that is very aggressive about allowing
you to fill free RAM with dirty disk buffers. Until you get down to
about 5 MB free or so, it will not block much (if at all) on disk
writes. I'm not quite sure what the rules are with OS X, though I am
fairly sure it is nowhere near as aggressive as Linux.

On Linux, you can do a command like:

    dd if=/dev/zero of=testfile bs=1048576 count=200

which writes 200 megabytes of zeroes to "testfile", and if you have
enough memory free, it will finish VERY fast, before any data has
really been written to disk. Then, a few seconds later, the disk light
will come on solid as the kernel begins flushing the huge amount of
dirty disk buffers it has accumulated. If you do this on OS X, the
disk light will come on solid right away, and the command will not
complete until the I/O has really finished. (It's hard to tell whether
it's requiring every last byte to be written before returning; with
modern disk transfer rates it could be allowing several megabytes of
unwritten data and I wouldn't be able to tell by eye whether the disk
light was staying on after dd finished.)

There are positives and negatives to Linux's extremely aggressive
strategy, BTW. The positive is that if you don't run out of memory,
everything is really, really fast. The negative is that if you do run
out, it can actually hurt performance. Plus, if the system should
crash or lose power or whatever, there can be a whole lot of data in
memory that has not been written out to disk, which can cause serious
corruption on non-journaled filesystems.

> But, this from a newbie. I intend on acquiring some classic texts on
> this subject [general OS level operations] - can anyone name one or
> two that stand out?

Tanenbaum & Woodhull's textbook "Operating Systems: Design and
Implementation" is a classic introduction to the basic principles of
operating system internals. It would definitely qualify as a "classic
text", though: at least in the version I cut my teeth on, it covered
the design of a UNIX-alike (Minix) which could run on 8088-based XT
clones with floppy drives. Hopefully there's a newer edition which
talks about somewhat more modern hardware.

--
Tim Seufert

_______________________________________________
darwin-kernel mailing list | darwin-kernel@lists.apple.com
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/darwin-kernel
Do not post admin requests to the list. They will be ignored.