On Jul 6, 2005, at 12:26 AM, Thomas Tempelmann wrote:
You can't "write a kext that hooks into the file system stack". The VFS KPI
is a one-to-one contract between the kernel and a filesystem; it does
not support interposition by a third party.
I've heard the term "stackable file system" several times. I assumed this
meant that file system calls get passed along in a way that allows them
to be filtered before they reach the final FS handler?
You would have heard this in the same sort of context as "cure for cancer"
and "holy grail". As Quinn points out, MacOS X does not support
stacking filesystems.
What do you mean by "got written"? Created? Modified?
Do you want one log entry for each operation, or just a summary?
One entry for each operation so that I know whenever a file gets modified.
How much work do you plan to do on each notification? Do you need to
know about each and every operation as it happens, or is it enough for you
to know that the file has been modified at some point prior to "now"?
If the delayed notification is acceptable, how long a delay can you tolerate?
Be aware that doing work on every operation poses a real performance
risk, as well as (again) the possibility of endless recursion.
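If a delayed notification is acceptable, much of that per-operation cost can be avoided by coalescing: remember that a file is dirty, and emit one log entry per file when the tolerated delay expires rather than one per write. A minimal user-space sketch of that bookkeeping follows. It is not kernel code, and the names (`note_write`, `flush_dirty`, `MAX_DIRTY`) as well as the choice of (dev, ino) as the file identity are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical sketch: collapse per-write notifications into one
 * "modified since the last flush" entry per file. */
#define MAX_DIRTY 64

struct file_key { dev_t dev; ino_t ino; };

struct coalescer {
    struct file_key dirty[MAX_DIRTY];
    size_t count;
    size_t dropped;   /* events lost because the table was full */
};

/* Record one write. Returns 1 if the file is newly dirty,
 * 0 if it was already pending (or had to be dropped). */
static int note_write(struct coalescer *c, dev_t dev, ino_t ino) {
    assert(c != NULL);
    for (size_t i = 0; i < c->count; i++)
        if (c->dirty[i].dev == dev && c->dirty[i].ino == ino)
            return 0;             /* already pending: nothing new to log */
    if (c->count == MAX_DIRTY) {
        c->dropped++;             /* bounded table: overload loses events */
        return 0;
    }
    c->dirty[c->count].dev = dev;
    c->dirty[c->count].ino = ino;
    c->count++;
    return 1;
}

/* Called when the tolerated delay expires: one entry per dirty file,
 * then the table is cleared. Returns the number of entries emitted. */
static size_t flush_dirty(struct coalescer *c, void (*emit)(struct file_key)) {
    size_t n = c->count;
    for (size_t i = 0; i < n; i++)
        if (emit != NULL)
            emit(c->dirty[i]);
    c->count = 0;
    return n;
}
```

The trade-off is the one raised above: you learn that a file changed at some point before the flush, not about each individual operation, and a bounded table means events can be lost under load (counted in `dropped`).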
What do you plan to do about files that move after they are opened?
Deleted after they are opened?
I'd like to be able to record these events in my log as well.
By which name would you like them called?
Opened twice via different paths? Not opened by path at all? Opened
on remote filesystems without persistent IDs? Opened via context-sensitive
I only care about user-accessible files on local disks; there's no need to
monitor remote FSs or device drivers.
What about your other users? Is it important that it only be files that a
specific user can see? This will impose additional overhead, unless you
only want changes being *made* by that user; though note in that case
that you run the risk of missing changes being made by proxy (other tasks
doing work for the user).
If you plan to write something that
works this closely with the system, you must understand how files
actually work.
You mean, how they work in Unix.
No, how they work in MacOS X, which is a new animal.
Are you implying that there's no way to identify a directory entry by the
handle that gets passed to a FS write call?
Directory entries are private to the filesystem implementation, and
there's no requirement that looking up a file actually involve a directory
entry at all; lookup passes a parent object and a name fragment to a
filesystem and gets back a child object or an error.
If you mean the parent directory vnode, then yes, this can typically be
obtained. However, again as Quinn notes, what you will get is one of the
directories in which the file was looked up; not necessarily the one where
it currently resides, or the one associated with the operation that is
actually in progress.
Lookups performed by fileid may not have a meaningful parent identifier
at all; again, this is a function of the filesystem implementation. For HFS,
at least, you can assume that they will, courtesy of the permissions-checking
model.
Probably because you can have multiple hard links to a file, and thus
there's no way of telling which open() call (for which directory entry)
created the handle?
The file handle is not relevant at the layer you're interested in; it's a
per-process association between a descriptor number and a vnode.
If multiple processes have the same file open, they each have a handle,
but there is only one vnode. All operations at the filesystem level are
performed against the vnode; this is how synchronisation and cache
coherency are managed.
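The one-file, many-handles relationship can be observed from user space with ordinary POSIX calls. The sketch below assumes a POSIX system with a writable /tmp, and the helper name is hypothetical. It creates a file with two hard links, opens it once through each name, and compares `st_dev`/`st_ino` from `fstat()`: two descriptors, one underlying file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if descriptors opened via two hard links refer to the same
 * file (same st_dev and st_ino), 0 if not, -1 on error. */
static int same_file_via_two_links(void) {
    char dir[] = "/tmp/linkdemo.XXXXXX";
    char a[64], b[64];
    struct stat sa, sb;
    int fa = -1, fb = -1, result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);

    fa = open(a, O_CREAT | O_RDWR, 0600);
    if (fa < 0 || link(a, b) != 0)    /* second name for the same file */
        goto out;
    fb = open(b, O_RDWR);             /* a second, independent descriptor */
    if (fb < 0)
        goto out;
    assert(fa != fb);                 /* two handles... */
    if (fstat(fa, &sa) == 0 && fstat(fb, &sb) == 0)
        result = (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino); /* ...one file */

out:
    if (fa >= 0) close(fa);
    if (fb >= 0) close(fb);
    unlink(a);
    unlink(b);
    rmdir(dir);
    return result;
}
```

`fstat()` reports the same identity for both descriptors and says nothing about which name was used to open either one.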
In this context, if the file was looked up via different paths, you can only
recover one path (with vnode_getpath() as Quinn pointed out) for the
vnode, which will typically (but not always) be one of those by which it
was looked up.
The system is, as a general rule, not interested in where a file "was" at
a given point in time, nor where it "is" now, except when looking it up.
The concept of "location" for a file in general is a convenient fiction for
upper layers.
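The same point can be demonstrated from user space: once a file is open, renaming it or removing its last name does not affect operations on the descriptor. This sketch assumes a POSIX system with a writable /tmp, and the helper name is hypothetical. It writes through one descriptor before a rename, after it, and after an unlink, then uses `fstat()` to read back the size and link count.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Writes via one descriptor while the file is renamed and then unlinked.
 * Returns the final size from fstat() (or -1 on error) and stores the
 * final link count in *nlink_out. */
static long write_after_move_and_delete(long *nlink_out) {
    char dir[] = "/tmp/movedemo.XXXXXX";
    char before[64], after[64];
    struct stat st;
    long size = -1;
    int fd;

    assert(nlink_out != NULL);
    *nlink_out = -1;
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(before, sizeof before, "%s/before", dir);
    snprintf(after, sizeof after, "%s/after", dir);

    fd = open(before, O_CREAT | O_WRONLY, 0600);
    if (fd < 0) {
        rmdir(dir);
        return -1;
    }
    if (write(fd, "one ", 4) == 4 &&
        rename(before, after) == 0 &&   /* the file "moves" */
        write(fd, "two ", 4) == 4 &&    /* same descriptor, same file */
        unlink(after) == 0 &&           /* its last name disappears */
        write(fd, "three", 5) == 5 &&   /* still writable */
        fstat(fd, &st) == 0) {
        *nlink_out = (long)st.st_nlink; /* typically 0: no name left at all */
        size = (long)st.st_size;
    }
    close(fd);
    unlink(after);    /* cleanup in case we bailed out early */
    unlink(before);
    rmdir(dir);
    return size;
}
```

On typical local filesystems this returns 13 and reports a link count of 0: all three writes reached the file even though, by the end, it has no location at all.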
In that case, I'd have to monitor open calls myself and build my own
table to identify the handles of write calls, right?
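Such a table would have to be keyed by file identity rather than by name, since the write side only sees the file. The hypothetical user-space sketch below (the names, sizes, and the (dev, ino) key are all assumptions) records the path seen at open time. It also shows the limitations described above: a second open through another hard link replaces the first name, and the recorded name knows nothing about renames or deletions since the open.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical bookkeeping for a monitor that sees opens (with a path)
 * and later writes (with only a file identity). */
#define MAX_FILES 128
#define PATH_LEN 256

struct open_record {
    dev_t dev;
    ino_t ino;
    char path[PATH_LEN];   /* the most recent name seen, nothing more */
};

struct open_table {
    struct open_record rec[MAX_FILES];
    size_t count;
};

/* Remember the path an open came in through. Opening the same file via
 * another hard link overwrites the earlier name. Returns -1 if full. */
static int record_open(struct open_table *t, dev_t dev, ino_t ino, const char *path) {
    assert(t != NULL && path != NULL);
    struct open_record *r = NULL;
    for (size_t i = 0; i < t->count; i++)
        if (t->rec[i].dev == dev && t->rec[i].ino == ino) {
            r = &t->rec[i];
            break;
        }
    if (r == NULL) {
        if (t->count == MAX_FILES)
            return -1;
        r = &t->rec[t->count++];
        r->dev = dev;
        r->ino = ino;
    }
    snprintf(r->path, sizeof r->path, "%s", path);
    return 0;
}

/* Name to log for a write: whatever was recorded last, which may be stale
 * (renamed, unlinked) or just one of several valid names. */
static const char *name_for_write(const struct open_table *t, dev_t dev, ino_t ino) {
    for (size_t i = 0; i < t->count; i++)
        if (t->rec[i].dev == dev && t->rec[i].ino == ino)
            return t->rec[i].path;
    return NULL;   /* opened before monitoring started, or not by path */
}
```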
data if the file is deeply buried).
general case you'll find that what you want to do may be easy enough.
and generic solution is very hard to achieve.