On May 25, 2005, at 12:59 PM, Hamish Allan wrote:
There is a cache of names by which a vnode has been looked up; just terminal name components. There is a child->parent relationship maintained between a vnode and *one* parent in which it has been looked up. This cache is built as a side-effect of the lookup process, and persists in conjunction with the vnode cache. It's not assembled en masse at any particular point in time.
Okay, I think I understand, but that still doesn't fully explain the following behaviour:
1) $ mdfind hamishtest3
2) $ cat > foo.txt
3) hamishtest3
4) ^D
5) $ mdfind hamishtest3
6) /Users/hamish/foo.txt
7) $ mkdir bar
8) $ ln foo.txt bar/bar.txt
9) $ mdfind hamishtest3
10) /Users/hamish/foo.txt
11) $ cat >> bar/bar.txt
12) extra
13) ^D
14) $ mdfind hamishtest3
15) /Users/hamish/bar/foo.txt
16) $ rm foo.txt
17) $ mdfind hamishtest3
18) $ mdimport bar/bar.txt
19) $ mdfind hamishtest3
20) /Users/hamish/bar/foo.txt
In line 15 above, the last time the vnode for foo.txt was looked up was via /Users/hamish/bar/, so that's where foo.txt is reported.
But there is no file /Users/hamish/bar/foo.txt, nor has there ever been, so something is wrong with the caching. Whatever part of the system is caching foo.txt shouldn't be doing so once the write to bar/bar.txt has happened. And why, even after I've removed foo.txt and imported bar/bar.txt, hasn't it been updated yet in line 20?
foo.txt and bar.txt are the same file; it just has two names. Only one of those names can be cached; in this case, it's the one by which the file was first looked up.
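The "same file, two names" point can be checked from user space. The following is a minimal sketch, assuming a POSIX filesystem that supports hard links; the helper name make_hard_link_pair and the temporary directory are illustrative only and do not come from the thread.

```python
import os
import tempfile

def make_hard_link_pair(root):
    """Create root/foo.txt plus a second name, root/bar/bar.txt, for the same file."""
    first = os.path.join(root, "foo.txt")
    second = os.path.join(root, "bar", "bar.txt")
    with open(first, "w") as f:
        f.write("hamishtest3\n")
    os.mkdir(os.path.join(root, "bar"))
    os.link(first, second)  # equivalent of `ln foo.txt bar/bar.txt`
    return first, second

with tempfile.TemporaryDirectory() as root:
    first, second = make_hard_link_pair(root)
    a, b = os.stat(first), os.stat(second)
    # Both names resolve to one inode on one device, with a link count of 2.
    same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
    link_count = a.st_nlink
    # Appending through one name is visible through the other.
    with open(second, "a") as f:
        f.write("extra\n")
    with open(first) as f:
        contents = f.read()
    print(same_inode, link_count, repr(contents))
```

Neither name is "the real one": the directory entries are peers, which is why there is no single obviously correct answer when mapping the file back to a path.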
The "path to vnode" is assembled by traversing these relationships; there are transformations in the filesystem under which they cannot be correctly maintained, and if a file is looked up by two different paths, only one can be returned.
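To make the traversal concrete, here is a toy model of such a cache. It is not the kernel's implementation: the Vnode class, note_lookup, and build_path are hypothetical names, and the update policy (keep the first name seen, overwrite the parent on every lookup) is an assumption inferred only from the output reported in this thread.

```python
class Vnode:
    """Toy stand-in for a vnode: one cached name and one cached parent."""
    def __init__(self, name=None, parent=None):
        self.name = name
        self.parent = parent

def note_lookup(vnode, parent, name):
    """Record a lookup. Assumed policy: first name wins, latest parent wins."""
    if vnode.name is None:
        vnode.name = name
    vnode.parent = parent

def build_path(vnode):
    """Walk child->parent links up to the unnamed root, joining cached names."""
    parts = []
    while vnode is not None and vnode.name is not None:
        parts.append(vnode.name)
        vnode = vnode.parent
    return "/" + "/".join(reversed(parts))

root = Vnode()                      # the root has no name component
users = Vnode("Users", root)
hamish = Vnode("hamish", users)
bar = Vnode("bar", hamish)

the_file = Vnode()
note_lookup(the_file, hamish, "foo.txt")   # looked up as ~/foo.txt
first_report = build_path(the_file)
note_lookup(the_file, bar, "bar.txt")      # later looked up as ~/bar/bar.txt
second_report = build_path(the_file)
print(first_report)
print(second_report)
```

Under those assumptions the second report combines a name from one link with a parent from the other, giving a path that never existed on disk.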
I think I understand this now. So to find out a file's name from its vnode I ask the filesystem for a directory listing on the vnode in its parent cache (and to find out its pathname I do that recursively).
No, you don't do that. You call vn_getpath() and let the system work it out as best it can.
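vn_getpath() is an in-kernel interface. For a user process on Mac OS X, the nearest equivalent I am aware of is fcntl() with F_GETPATH, which Python exposes as fcntl.F_GETPATH (Python 3.9 and later, macOS only). The sketch below assumes that; path_from_fd is a hypothetical helper, and the /proc fallback is included only so the example also runs on Linux.

```python
import fcntl
import os
import tempfile

def path_from_fd(fd):
    """Best-effort path for an open descriptor; None if the system cannot say."""
    if hasattr(fcntl, "F_GETPATH"):        # macOS, Python 3.9+
        try:
            buf = fcntl.fcntl(fd, fcntl.F_GETPATH, bytes(1024))
        except OSError:
            return None                    # the system could not work out a path
        return os.fsdecode(buf.split(b"\0", 1)[0])
    proc_link = f"/proc/self/fd/{fd}"      # Linux-only fallback (assumption)
    if os.path.islink(proc_link):
        return os.readlink(proc_link)
    return None

with tempfile.NamedTemporaryFile() as tmp:
    reported = path_from_fd(tmp.fileno())
    # Compare canonical forms, since the temp directory may sit behind a symlink.
    matches = reported is None or (
        os.path.realpath(reported) == os.path.realpath(tmp.name)
    )
    print(reported)
```

Callers should treat the result as a hint: the helper may return None, and for a file with several links it returns one path, not all of them.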
I think that if the inverted index allowed multiple paths to be stored for a vnode, the correct behaviour could be achieved. Of course, some of my assumptions may not be correct. Any thoughts?
How would you know which path to return?
It would be good if you could return all of them: anything which would want a vnode-to-path lookup would surely benefit from having them all? Instead of caching a single parent node, you would maintain a linked list (or whatever) of all parent nodes.
Let's assume you have:
foo/bar.txt
baz/quxx.txt
Both bar.txt and quxx.txt are links to the same file. Now you have to remember that the node was looked up as quxx.txt under baz, but as bar.txt under foo.
Since all of that information can change as files are renamed, and since there's no guarantee that a file will even have a name at all, maintaining this information is a lot of work for very little return.
Instead, developers must get used to the idea that despite having a vnode, or a file handle, they may not be able to get a path at which it can be looked up.
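A short illustration of a file with no name at all, assuming POSIX unlink semantics (an open file survives removal of its last directory entry). The file name and contents are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "foo.txt")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.write(fd, b"hamishtest3\n")
    os.unlink(path)                      # remove the only name
    links_left = os.fstat(fd).st_nlink   # no directory entry refers to the file now
    os.lseek(fd, 0, os.SEEK_SET)
    data = os.read(fd, 64)               # the descriptor is still fully usable
    os.close(fd)
    print(links_left, data)
```

After the unlink, the descriptor still reads and writes normally, but there is no path under which the file could be looked up again.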
I thought it might be possible to get similar functionality without writing a VFS by having a daemon monitor the creation of new directories (either in a particular place, or with Spotlight-searchable attributes) and fill them with hard links to Spotlight search results. To preserve the pathname of the files found, I would create the full directory structure within the given folder; I therefore wanted a way to monitor whether that structure had changed, in order to halt any live updating of results and turn it into a 'normal' folder.
Attempting to run a user process in lockstep with the filesystem is doomed from the start; don't do this.
Isn't that what Spotlight is: a user process attempting to run in lockstep with the filesystem by means of the fsevents API?!
No, Spotlight is decoupled.