Re: Getting a read call before open
- Subject: Re: Getting a read call before open
- From: "shailesh jain" <email@hidden>
- Date: Thu, 26 Jun 2008 20:13:43 -0700
"You sound instead like you're getting your first fault, and there is a problem with your page-in function declaration."
I do not follow what you mean by 'problem with page-in function declaration'. My page-in function gets called as expected; *only* the parameters passed to it are unusual. Why would the kernel pass {zero offset in virtual memory space, zero offset, zero size, zero flag} to the page-in function under any condition?
/Shail
On Thu, Jun 26, 2008 at 3:33 PM, Terry Lambert <email@hidden> wrote:
The first page is loaded by execve only for the purposes of reading the header to determine the magic number to see if it's an interpreter, a Mach-o file, a universal binary, or "other" (non-executable).
Once this happens, if it's Universal, the binary is "graded" and a slice (a Mach-O file encapsulated in the Universal binary container) has its first 4K read. Otherwise, if the "magic number" is "#!", then the path after the "#!" is read and reinterpreted as a request to load that instead, with the script as argv[0], and it goes back to load the first page of the interpreter. If the magic number indicates it's a PPC binary and you are on an Intel machine, then Rosetta is implied as the interpreter, and the code messes with the recorded p_comm field of the process so it doesn't look like an interpreter is running.
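The magic-number check described above can be sketched in user-space C. This is a minimal sketch, not the XNU code: classify_first_page() and the image_kind enum are hypothetical names, the magic constants are the values from <mach-o/loader.h> and <mach-o/fat.h> (redefined here so it compiles off macOS), and the PPC/Rosetta case is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes for this sketch (not XNU's names). */
enum image_kind {
    IMAGE_OTHER,      /* "other": not something exec knows how to run */
    IMAGE_MACHO,      /* thin Mach-O, 32- or 64-bit, either byte order */
    IMAGE_UNIVERSAL,  /* Universal ("fat") container: grade it, pick a slice */
    IMAGE_SCRIPT      /* "#!" interpreter script */
};

/* Values from <mach-o/loader.h> and <mach-o/fat.h>. */
#define MH_MAGIC     0xfeedfaceu
#define MH_CIGAM     0xcefaedfeu
#define MH_MAGIC_64  0xfeedfacfu
#define MH_CIGAM_64  0xcffaedfeu
#define FAT_MAGIC    0xcafebabeu
#define FAT_CIGAM    0xbebafecau

/* Classify the first page that exec read from the file. */
enum image_kind classify_first_page(const unsigned char *page, size_t len)
{
    uint32_t magic;

    if (len >= 2 && page[0] == '#' && page[1] == '!')
        return IMAGE_SCRIPT;               /* interpreter path follows "#!" */
    if (len < sizeof magic)
        return IMAGE_OTHER;
    memcpy(&magic, page, sizeof magic);    /* first word, host byte order */

    switch (magic) {
    case MH_MAGIC: case MH_CIGAM:
    case MH_MAGIC_64: case MH_CIGAM_64:
        return IMAGE_MACHO;
    case FAT_MAGIC: case FAT_CIGAM:        /* fat headers are big-endian on disk */
        return IMAGE_UNIVERSAL;
    default:
        return IMAGE_OTHER;
    }
}
```

Checking both the native and byte-swapped constants lets the same function work on either host byte order.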
Either way, you might get two or more 4K page reads before it settles down into mach_loader. That code takes the 4K already read and starts going through the "load commands" list, mapping things into the new process's address space. If one of them is a dynamic linker segment, it loads the one matching the architecture information from the final binary into the address space as well, and then the thread state is set from the thread state structure (one of the things loaded).
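A rough sketch of that load-command walk follows. It assumes a 64-bit, host-byte-order image whose commands fit entirely inside the 4K page already read (the real mach_loader is more general). The sk_ structs are hypothetical simplifications of mach_header_64 and load_command, walk_load_commands() is a hypothetical name, and the sketch only records what would be mapped rather than mapping anything.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified mirrors of mach_header_64 and load_command. */
struct sk_mach_header_64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};
struct sk_load_command {
    uint32_t cmd, cmdsize;
};

/* Load command values from <mach-o/loader.h>. */
#define LC_SEGMENT        0x1u
#define LC_UNIXTHREAD     0x5u
#define LC_LOAD_DYLINKER  0xeu
#define LC_SEGMENT_64     0x19u

struct load_summary {
    unsigned segments;     /* segments that would be mapped into the new process */
    int has_dylinker;      /* a dynamic linker (dyld) would be loaded as well */
    int has_thread_state;  /* initial register state for the new thread */
};

/* Returns 0 on success, -1 if the header or any command is malformed
 * or does not fit in the buffer (an assumption of this sketch). */
int walk_load_commands(const unsigned char *page, size_t len,
                       struct load_summary *out)
{
    struct sk_mach_header_64 hdr;
    size_t off, end;

    if (len < sizeof hdr)
        return -1;
    memcpy(&hdr, page, sizeof hdr);
    if (hdr.magic != 0xfeedfacfu)          /* MH_MAGIC_64, host order only */
        return -1;

    off = sizeof hdr;
    if (hdr.sizeofcmds > len - off)        /* commands must fit in the page */
        return -1;
    end = off + hdr.sizeofcmds;
    memset(out, 0, sizeof *out);

    for (uint32_t i = 0; i < hdr.ncmds; i++) {
        struct sk_load_command lc;
        if (end - off < sizeof lc)
            return -1;
        memcpy(&lc, page + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > end - off)
            return -1;
        switch (lc.cmd) {
        case LC_SEGMENT:
        case LC_SEGMENT_64:    out->segments++;           break;
        case LC_LOAD_DYLINKER: out->has_dylinker = 1;     break;
        case LC_UNIXTHREAD:    out->has_thread_state = 1; break;
        default:               break;   /* other commands ignored here */
        }
        off += lc.cmdsize;                 /* cmdsize advances to the next one */
    }
    return 0;
}
```

The bounds checks on sizeofcmds and each cmdsize matter because the header comes straight from file data supplied by the filesystem.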
At that point, exec returns to user space, which causes the program counter and other registers to be loaded from the thread state, and execution starts there (usually in dyld).
None of the other pages come in until you start using them and take a fault, which causes your FS to be contacted again to supply the requested data.
You probably don't see much of this unless you are the boot device, since you are not where most of this stuff comes from, and so you are "out of the loop" about the other activity that's happening.
You sound instead like you're getting your first fault, and there is a problem with your page-in function declaration.
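As a hedged illustration of what the declaration point likely refers to: in the VFS KPI each vnop receives a single pointer to an arguments structure (struct vnop_pagein_args for page-in) and reads a_f_offset, a_size, a_flags and so on through it, with the handler cast to VOPFUNC in the operation table. A handler declared with some other parameter list would not see the values the kernel passed. The user-space mock below copies those field names but uses placeholder types; mock_vnop_pagein_args, myfs_vnop_pagein(), mock_VNOP_PAGEIN() and MOCK_PAGE_SIZE are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define MOCK_PAGE_SIZE 4096

/* User-space stand-in for struct vnop_pagein_args: field names follow
 * the KPI, but every kernel type is replaced by a placeholder. */
struct mock_vnop_pagein_args {
    void     *a_desc;       /* operation descriptor */
    void     *a_vp;         /* vnode_t being paged in */
    void     *a_pl;         /* upl_t: the pages to fill */
    uint32_t  a_pl_offset;  /* upl_offset_t: where in the UPL to start */
    off_t     a_f_offset;   /* byte offset in the file */
    size_t    a_size;       /* number of bytes requested */
    int       a_flags;
    void     *a_context;    /* vfs_context_t */
};

/* What the handler saw on its last call (test hook for this mock). */
off_t  seen_f_offset = -1;
size_t seen_size = 0;

/* Declared the way vnops are: one pointer to the args structure. */
int myfs_vnop_pagein(struct mock_vnop_pagein_args *ap)
{
    seen_f_offset = ap->a_f_offset;
    seen_size = ap->a_size;

    /* Assumed sanity check for this sketch: a request should be
     * non-empty and start on a page boundary. */
    if (ap->a_size == 0 || ap->a_f_offset % MOCK_PAGE_SIZE != 0)
        return EINVAL;

    /* A real handler would fill ap->a_pl here, typically via
     * cluster_pagein(); that part is omitted from this mock. */
    return 0;
}

/* Hypothetical stand-in for the kernel building the args and calling in. */
int mock_VNOP_PAGEIN(off_t f_offset, size_t size)
{
    struct mock_vnop_pagein_args a = {0};
    a.a_f_offset = f_offset;
    a.a_size = size;
    return myfs_vnop_pagein(&a);
}
```

If every field reads as zero in a real handler, comparing its prototype against the args-struct form is a cheap first check.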
-- Terry
On Jun 26, 2008, at 12:42 PM, shailesh jain wrote:
Hi,
The follow up question, I guess.
Now I have implemented a prototype VOP_PAGEIN, but the parameters passed to it {size, offset, vm_offset} are all set to zero. I am clueless as to why they are zero.
Shouldn't the offset be 4096, since the first page has already been loaded by execve?
Note: My filesystem does not support caching. Also, let me know if this question is more appropriate for the filesystem-dev mailing list.
/Shail
On Wed, Jun 25, 2008 at 11:16 PM, shailesh jain <email@hidden> wrote:
Hi,
Thanks. I actually figured out that I had not yet implemented VOP_PAGEIN.
Thus, execve only loaded the first page (4096 bytes) and then depended on page faults to load the remaining bytes. But since I hadn't implemented VOP_PAGEIN, the application just hung.
Thanks for the information.
/Shail
On Wed, Jun 25, 2008 at 11:03 PM, Terry Lambert <email@hidden> wrote:
On Jun 25, 2008, at 6:54 PM, shailesh jain wrote:
When I try to run an executable on my filesystem, it just hangs (i.e. the shell prompt never returns) when I try to do an implicit open in the read call to my filesystem.
Digging through the source code, I found that execve calls vn_rdwr() and, subsequently, VOP_READ(). This read is invoked to load PAGESIZE bytes (4096), which my filesystem delivers properly. However, I do not get a read call to load the remaining bytes, and I can't figure out why.
/Shail
On Wed, Jun 25, 2008 at 4:12 PM, shailesh jain <email@hidden> wrote:
Is it legitimate for a filesystem to get a read call before an open call? Also, how should a filesystem handle such behavior (implicit opens and closes)?
Hi; sounds like you are writing a remote filesystem.
If you vend a vnode, be prepared to get any number of calls upon it. Once it has been vended, you have agreed to accept future calls on it until such time as it has been released back to you for recycling.
Certain filesystems dislike this (SMB, as an example, disallows renames of open files, and we don't support ETXTBSY unless the FS client maintains "this has been exec'ed" state and returns it itself). I understand that this can bother people, but until the vnode has been invalidated, either by being given back to you as no longer needed by whoever you gave it to, or by being deadfs'ed (e.g. by a forced unmount from your side of things), it is the property of whoever you vended it to.
Another situation where it's possible for this to happen is open/mmap/close, where the memory mapping is kept active by a vnode reference by the paging system (maybe you just did not notice the open/close, and now you incorrectly think it's closed). Again, as far as the kernel is concerned, the vnode pointer *is* the file.
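The open/mmap/close case can be reproduced from user space with plain POSIX calls. In this sketch (read_after_close() is a hypothetical helper) the descriptor is closed before the mapping is touched, yet reading through the mapping still works, because the mapping, not the descriptor, keeps the file referenced. On a custom filesystem that first touch is what would arrive as a page-in with no open in sight.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map a file, close the descriptor, then read the
 * mapping. Returns 0 if the mapped bytes match `expect`, -1 otherwise. */
int read_after_close(const char *path, const char *expect)
{
    size_t n = strlen(expect);
    int fd, same;
    void *p;

    if (n == 0)
        return -1;                  /* mmap() rejects zero-length mappings */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    p = mmap(NULL, n, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the descriptor is gone here... */
    if (p == MAP_FAILED)
        return -1;

    /* ...but the mapping still references the file (the vnode), so this
     * first touch can still fault pages in from the filesystem. */
    same = memcmp(p, expect, n) == 0;
    munmap(p, n);
    return same ? 0 : -1;
}
```

From the filesystem's side, the only sign of this program after close() is the fault-driven read.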
For the specific case of exec, yes, there is a vn_rdwr() after you vended the open vnode to the exec code via a lookup of a name. You will potentially get VOP_READ() calls if it's a FAT (Universal) file or you have code signing, and you will definitely get VM mappings established for the address space of the new process, so what you are seeing is both reasonable and expected.
If you choose to rip the vnode out from under us, of course whatever program you are running will crash the first time it page faults in a clean page from the backing store you promised it when you gave out the vp.
Hope this clears things up for you.
- Terry
_______________________________________________
Darwin-kernel mailing list (email@hidden)