Re: thread_t, uthread_t, at al.?
- Subject: Re: thread_t, uthread_t, at al.?
- From: Terry Lambert <email@hidden>
- Date: Fri, 27 Oct 2006 16:10:41 -0700
On Oct 27, 2006, at 12:42 PM, Rick Mann wrote:
On Oct 27, 2006, at 12:21 , Michael Smith wrote:
There is a reason that lsof does this. The information you're
looking for is not maintained by the system; it would be expensive
to do so, and the work would almost always be wasted.
I'm not sure I agree with this assertion. I don't think it would be
asking too much. The Mac OS (Carbon) file system was able to do it,
and I hardly think that support could be blamed for any significant
slowdown. (Mind you, this is speculation).
Carbon did this in pre-Mac OS X days by running in the same address
space; if you don't do protection domain crossing for each piece of
information, it's a heck of a lot easier to do what you are trying to
do. If you have to cross protection domains (e.g. because you have an
OS that prevents one application from bringing unrelated applications
down (best case) or the whole system down (worst case)), then it's very
expensive.
The way things are stored is that there is a list of processes.
For each of these processes, there is a per process open file table.
For each per process open file table entry, there's a pointer to a
fileglob. Multiple open file instances in the same process and in
other processes in the system (either child, parent, or processes
which have used UNIX domain sockets to pass an open descriptor around)
can point to the same fileglob.
For a fileglob, there's an fg_data pointer; what this represents
depends on the type of entry (e.g. vnode, pipe, socket, etc.).
Multiple fileglobs can point to the same thing (in this case, the
things you care about are vnodes).
The vnode points to the v_data, which represents the in-core data
necessary to access a vnode object - usually, a filesystem object;
there is *usually* a 1:1 correspondence between a vnode and a per-FS
instance object; in the case of HFS, this would be a cnode structure,
or for UFS, an inode structure.
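To make that concrete, the chain looks roughly like this (a stripped-down
sketch, not the actual xnu layout - the real structures have many more
fields, and the names drift between releases):

/* proc -> filedesc -> fileproc -> fileglob -> fg_data -> vnode -> v_data */

struct fileglob {
    int      fg_type;           /* DTYPE_VNODE, DTYPE_SOCKET, DTYPE_PIPE... */
    void    *fg_data;           /* vnode, socket, pipe... per fg_type       */
};

struct fileproc {               /* one per open file table entry            */
    struct fileglob *f_fglob;   /* shared by dup()'d and passed descriptors */
};

struct filedesc {               /* the per-process open file table          */
    struct fileproc **fd_ofiles;
    int               fd_nfiles;
};

struct proc {
    struct filedesc *p_fd;      /* from here you can only walk "down";
                                 * nothing stores "who points at me"        */
};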
So basically, you have two places in the chain you want to back-track
where the OS doesn't store a list of "who all points to me?".
You're correct that this *could* be maintained by the OS, but you're
*wrong* when you say "I hardly think that support could be blamed for
any significant slowdown".
The problem is that in order to maintain these lists, you would need
to allocate list element structures in both cases (this is doable -
it's just a memory penalty); however, when you went to insert and
remove elements from these lists, you'd have to enforce a
serialization barrier on list insertion, deletion, lookup,
uniquification, etc., etc. Pre-protected-mode Mac OS would handle
this by duplication of data and/or HLock()/HUnlock() - serializing
either data validity or access.
This isn't really expensive on a non-preemptive multitasking system
where you don't have protection domain crossing or address space
switching, and for which you only have a single CPU that you're
stalling out until the operation completes. But when you start
talking about modern machines and modern OS's that are inherently more
safe vs. viruses, etc., then you are starting to add up to some real
performance penalties.
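Purely to illustrate where the cost comes from, here is a hypothetical
sketch of what every open would have to do if each vnode kept a "who
points at me" list (nothing like this exists in xnu; the names are made
up for the example):

#include <sys/queue.h>
#include <kern/locks.h>
#include <kern/kalloc.h>

struct proc;

struct vnode_backref {                     /* one list element per opener  */
    LIST_ENTRY(vnode_backref) vb_link;
    struct proc              *vb_proc;
};
LIST_HEAD(backref_list, vnode_backref);

static lck_mtx_t *backref_lock;            /* assume lck_mtx_alloc_init()'d
                                            * at boot; a global choke point */

static void
note_open(struct backref_list *list, struct proc *p)
{
    /* the memory penalty */
    struct vnode_backref *vb = kalloc(sizeof(*vb));

    vb->vb_proc = p;

    /* the serialization barrier: every open/dup/close on every CPU
     * stalls here just to keep the back-references consistent */
    lck_mtx_lock(backref_lock);
    LIST_INSERT_HEAD(list, vb, vb_link);
    lck_mtx_unlock(backref_lock);
}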
So far though, I can't think of anything that involves tracking
the process to which a thread belongs in what I've outlined
above. You aren't making an assumption about v_owner, are you?
Looking at the sources, v_owner is either null or it's
current_thread(), but I guess that can change as a file is
accessed, so that won't be reliable at all.
Modulo bugs in the code, it is completely reliable. It's just not
what you thought it was.
I mean, it's not reliable as an indicator of which process is
keeping a volume busy, my end goal.
If this is your goal, you need to reconsider the information you think
you need in order to accomplish it.
In this case, you have a volume that's being held open by (presumably)
a small number of vnode references, and you want to know where they
are coming from. You can do this the slow way, which is to walk
everything, or you can do this the fast way.
The primary reason lsof is slow in this case is that it displays all
the information, and in obtaining this information, it pushes all the
data, as individual data items, across the user/kernel protection
domain boundary. So you are basically spending all your time in TLB
shootdown, flushing, address space crossing, and copying data that is
not interesting for the problem you are trying to solve.
So don't do that - pretty simple.
What you likely want to do is make yourself a KEXT that needs to be
recompiled vs. each instance of the kernel, so that if anything
changes, the promiscuous knowledge you are using to walk the data
structures (i.e. "the data structure has such-and-such a layout", "the
pointer to such and such a list starts here", "this list is a STAILQ
vs. a TAILQ or an SLIST", etc., etc. -- all the implementation details
we won't promise won't change between software updates) is valid for the
kernel you load the thing into.
Then you ask your KEXT to send you only the information you want: you
iterate the vnode list off the mount point using the locking-of-the-
day that happens to be implemented in a particular version of the OS,
and get together a list of vp's keeping the volume open.
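A rough sketch of that walk (the list head and entry names below come
from xnu's private sys/mount_internal.h and sys/vnode_internal.h, and
are exactly the sort of promiscuous knowledge that can change under
you, so check them - and the locking - against the kernel source you
compile against):

#include <sys/queue.h>
/* plus xnu's private mount_internal.h / vnode_internal.h for the
 * real struct mount and struct vnode layouts */

static int
collect_busy_vnodes(struct mount *mp, vnode_t *out, int max)
{
    vnode_t vp;
    int n = 0;

    /* take the mount's vnode-list lock here, per the xnu-of-the-day */
    TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_mntvnodes) {
        if (n < max)
            out[n++] = vp;          /* a vp keeping the volume open */
    }
    /* drop the lock */

    return n;
}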
The next part, you are going to want to take in two passes; the first
is mandatory, the second, optional:
First pass, you iterate the proclist, and iterate the open file table
for each proc, and look in the fg_data pointer for the vp's you are
interested in. You copy this out all at once.
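Sketched out (again, allproc, p_fd, fd_ofiles, f_fglob and friends are
xnu internals from proc_internal.h, filedesc.h and file_internal.h, not
stable KPI, and the proc-list and fd locking is omitted for brevity):

static void
report_holders(vnode_t *vps, int nvps)
{
    struct proc *p;
    int i, j;

    LIST_FOREACH(p, &allproc, p_list) {             /* every process       */
        struct filedesc *fdp = p->p_fd;
        if (fdp == NULL)
            continue;
        for (i = 0; i < fdp->fd_nfiles; i++) {      /* its open file table */
            struct fileproc *fp = fdp->fd_ofiles[i];
            if (fp == NULL || fp->f_fglob == NULL ||
                fp->f_fglob->fg_type != DTYPE_VNODE)
                continue;
            for (j = 0; j < nvps; j++)              /* only the vp's you   */
                if ((vnode_t)fp->f_fglob->fg_data == vps[j]) /* care about */
                    printf("pid %d fd %d holds the volume\n", p->p_pid, i);
        }
    }
}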
Second pass (if the first pass turns nothing up, or you are being
pedantic, even though you know you can't close the volume with the
vnodes held by the first pass), you go through the vnode object
mappings for all the processes, and look to see if one of the vm
objects associated with the process is associated with the vnode
pager, and if so, if the backing object for the pager happens to be
one of the vnodes you are interested in.
-
The benefits to doing it this way should be obvious from the above,
but I'll spell them out:
(1) The information from lsof is a snapshot; going at it this way, you
can get real data; by the time you display it, it'll also be a
snapshot, but it will be a *consistent* snapshot
(2) The information doesn't have to have multiple boundary transitions
to get one piece of data (I think lsof is up in the neighborhood of 7
boundary crossings per vnode-backed file to display anything, not
including the display code)
(3) You only push useful data, instead of pushing everything
(4) You are order N*M/2, rather than order N * M * K, where K is
large, because you iterate only for the vnodes you care about, instead
of all of them.
(5) You avoid all that copying of data, TLB mapping, shootdown, buffer
flushing, etc.
The down side is that you have to recompile your KEXT each time a
software update happens, but if you are tracking a bug in FS code you
wrote yourself, this should be no big deal.
All I want to do is add some very useful functionality that existed
in the pre-Mac OS X Mac: the ability to tell who's got a volume open
when you try to unmount it via the Finder. I can't actually fix the
Finder, but I can come close with a contextual menu plugin. The only
mechanism to determine this is lsof, which is slow and unreliable
(it does not detect files opened by the Kernel). I was hoping to
find another way.
Actually, no. If you wanted to do this, you could simply write an
fsevents listener, and remember who did what open, keeping a count of
the opens and closes, removing records in your list when the open count
for a given process goes to 0.
If you wanted to do this via a KEXT, and query it for the information,
that's really easy, too, by using a KAUTH listener for open and close
events.
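A minimal sketch of the KAUTH approach (the per-process bookkeeping
table, error handling, and the user-space query side are left out, and
the start/stop routine names are just placeholders):

#include <mach/mach_types.h>
#include <mach/kmod.h>
#include <sys/kauth.h>
#include <sys/proc.h>
#include <libkern/libkern.h>

static kauth_listener_t g_fileop_listener;

/* Called for KAUTH_SCOPE_FILEOP events; for open and close, arg0 is the
 * vnode and arg1 is the path.  This is where you'd bump or drop the
 * per-pid open count.
 */
static int
fileop_callback(kauth_cred_t cred, void *idata, kauth_action_t action,
                uintptr_t arg0, uintptr_t arg1, uintptr_t arg2, uintptr_t arg3)
{
    pid_t pid = proc_selfpid();

    if (action == KAUTH_FILEOP_OPEN)
        printf("open  by pid %d: %s\n", pid, (const char *)arg1);
    else if (action == KAUTH_FILEOP_CLOSE)
        printf("close by pid %d: %s\n", pid, (const char *)arg1);

    return KAUTH_RESULT_DEFER;      /* fileop listeners don't veto anything */
}

kern_return_t mylistener_start(kmod_info_t *ki, void *d)
{
    g_fileop_listener = kauth_listen_scope(KAUTH_SCOPE_FILEOP,
                                           fileop_callback, NULL);
    return (g_fileop_listener != NULL) ? KERN_SUCCESS : KERN_FAILURE;
}

kern_return_t mylistener_stop(kmod_info_t *ki, void *d)
{
    if (g_fileop_listener != NULL)
        kauth_unlisten_scope(g_fileop_listener);
    return KERN_SUCCESS;
}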
I'd be willing to experiment with adding support to the Kernel & BSD
to keep track of these things as files are opened and closed (as
I said above, I don't agree with your assertion that this needs to
have any significant impact on performance).
If you want to do this for your own use, that's fine, but the
performance penalties on a heavily loaded system could be immense (see
above).
-- Terry