Re: Strange behavior iterating the directory tree of a volume
- Subject: Re: Strange behavior iterating the directory tree of a volume
- From: Quinn <email@hidden>
- Date: Wed, 5 Mar 2008 22:10:19 +0000
At 10:35 -0500 12/2/08, Trainor, Chris wrote:
> I am working on some code to iterate over the directory tree of an
> entire volume.
Chris opened up a DTS tech support incident for this issue
<sonr://Request/42522047>, and that gave me a chance to investigate
it in depth. The results were quite surprising, so I thought I'd
share them with the group.
o There are two fundamental ways to iterate a directory,
<x-man-page://2/getdirentries> and
<x-man-page://2/getdirentriesattr>. Most BSDish APIs are based on
the former (for example, <x-man-page://3/readdir>,
<x-man-page://3/fts>, <x-man-page://3/scandir>), while Mac-style APIs
(for example, FSGetCatalogInfo, FSCopyObjectSync, and the Finder) are
based on the latter (if it's available).
o The bulk of the heavy lifting for these routines is done by the VFS
plug-in. This discussion focuses on HFS Plus. It's likely that the
details will be very different for other volume formats.
o If you apply the ostrich algorithm to mutation-during-iteration
(that is, you ignore the problem entirely), getdirentries is more
resilient than getdirentriesattr because of a bug in
getdirentriesattr <rdar://problem/5762961>.
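To make the ostrich approach concrete, here is a minimal sketch of a single-directory scan with readdir, one of the getdirentries-based APIs mentioned above. CountDirectoryEntries is a hypothetical helper name, not something from the incident, and the code deliberately makes no attempt to detect mutation during iteration.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: counts the entries in dirPath, skipping "." and "..".
   Returns the count, or -1 on error with errno set.  It ignores the
   possibility that the directory changes while it is being read. */
static long CountDirectoryEntries(const char *dirPath)
{
    DIR *           dir;
    struct dirent * entry;
    long            count;
    int             readErr;
    int             closeErr;

    dir = opendir(dirPath);
    if (dir == NULL) {
        return -1;
    }

    count = 0;
    errno = 0;              /* lets us tell end-of-directory from failure */
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0) {
            count += 1;
        }
        errno = 0;
    }
    readErr = errno;        /* non-zero means readdir failed */

    closeErr = closedir(dir);
    assert(closeErr == 0);
    (void) closeErr;

    if (readErr != 0) {
        errno = readErr;
        return -1;
    }
    return count;
}
```

The result is simply whatever readdir happened to report. If another process adds or removes items mid-scan, the count may or may not reflect them.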
o OTOH, getdirentriesattr gives you a way to /check/ for
mutation-during-iteration by way of its newState parameter.
Unfortunately this mechanism has a number of issues:
- Prior to Mac OS X 10.5, the newState value returned by
getdirentriesattr was actually the modification date of the
directory. This can cause problems, as described below.
- In Mac OS X 10.5 and later, this value is an in-memory generation
counter. This avoids the problems with modification dates.
- However, Mac OS X 10.5.x still has a bug in getdirentriesattr
<rdar://problem/5781876> that can cause mutations to go undetected.
o FSGetCatalogInfoBulk uses the newState result from
getdirentriesattr to calculate its containerChanged result. Beyond
the problems, both historical and current, inherited from
getdirentriesattr, FSGetCatalogInfoBulk has other historical problems:
- Mac OS X 10.0 through 10.1.x does not even initialise
containerChanged; you should not look at this value on those systems.
- Mac OS X 10.2 through 10.4.x always sets containerChanged to false;
thus, the value is not helpful on those systems.
- Mac OS X 10.5 and later implement containerChanged properly (modulo
the problems with getdirentriesattr of course).
o One suggested workaround for this problem is to latch the
modification date of the directory before you start iterating it and
to check that date when you're done. This doesn't work reliably
because the modification date for a directory (on HFS Plus, per my
earlier point) has a resolution of one second, regardless of the
resolution used in the API to get it. Now consider the following:
1. You get the modification date for the directory.
2. You start iterating the directory.
3. Some other process changes the directory.
4. You complete iteration of the directory.
5. You get the modification date of the directory.
6. You compare the values from steps 1 and 5.
If all of these events happen in the same second, there's no way you
can detect changes. Actually, it's worse than that. If events 1
through 3 happen in the same second, you'll miss the change
regardless of how long it takes to iterate the entire directory.
o A better workaround is to use a <x-man-page://2/kqueue>. You can
add the file descriptor you're using to iterate the directory to a
kqueue and monitor that kqueue for changes. If you complete
iteration without seeing any changes, you have an accurate snapshot
of the directory's contents.
The following snippet shows this in practice:
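    int             kq;
    int             fd;
    int             err;
    struct kevent   kev;
    // Assumed definition: a zero timespec, so the second kevent call polls.
    static const struct timespec kZeroTimeout = { 0, 0 };

    kq = kqueue();
    assert(kq >= 0);

    fd = open("/Test", O_RDONLY);
    assert(fd >= 0);

    // Ask to be told about writes to the directory.
    EV_SET(&kev, fd, EVFILT_VNODE, EV_ADD, NOTE_WRITE, 0, 0);
    err = kevent(kq, &kev, 1, NULL, 0, NULL);
    assert(err >= 0);

    [... iterate the directory in the usual way ...]

    // Check, without blocking, whether a write event arrived.
    err = kevent(kq, NULL, 0, &kev, 1, &kZeroTimeout);
    assert(err >= 0);
    if (err == 1) {
        [... directory was changed ...]
    }

    EV_SET(&kev, fd, EVFILT_VNODE, EV_DELETE, NOTE_WRITE, 0, 0);
    err = kevent(kq, &kev, 1, NULL, 0, NULL);
    assert(err >= 0);

    err = close(fd);
    assert(err == 0);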
kqueue support was introduced in Mac OS X 10.3. IIRC it had some
reliability problems on 10.3, but it's pretty solid on 10.4 and later.
o Another option is to use the FSEvents framework to watch for
changes. This is a great choice if you're already watching for
changes globally, for example, if you're developing backup software.
Share and Enjoy
--
Quinn "The Eskimo!" <http://www.apple.com/developer/>
Apple Developer Relations, Developer Technical Support, Core OS/Hardware