Re: [Fwd: Re: execv bug???]

11 Mar 2008


-- Terry

On Mar 9, 2008, at 10:31 AM, Jonas Maebe wrote:

On 28 Jan 2008, at 04:23, Jordan K. Hubbard wrote:

On Jan 27, 2008, at 2:46 PM, Jonas Maebe wrote:

That's more or less true for Linux, but not for Mac OS X at least
up till 10.4.x (I haven't benchmarked on 10.5 yet). Compiling our
compiler with itself, which involves about 173 (v)fork+execs from
a single compiler run to assemble&link all the files, is 20% to
25% slower with fork instead of vfork on a G4, and 35% to 40% on a
G5 (32 bit processes in both cases) on 10.4.x. And for clarity:
this is relative to the entire time needed for compiling+assembling
+linking everything (on the G5: 24 vs 15 seconds), not some
academic mbench-like speed difference between fork and vfork.

It would be interesting to benchmark this in 10.5 as well, given a
number of changes to the relevant code.

I finally got around to benchmarking this again. All tests below are
under 10.5.2, compiling our compiler with itself. In all cases these
are "native" compilations (i.e., an i386 compiler compiling an i386
compiler, an x86_64 compiler compiling an x86_64 compiler etc), and
the assembler gets its input via a pipe.
When the compiler is told not to assemble/link, it generates a
shell script with the necessary calls to the assembler and linker
to assemble/link everything. The time needed to complete this script
is what is timed in the second item for each case below.

fork(2) is expected to be rather slow, with a cost that goes up with
the complexity of the application's address map.
A typical Mac OS X system has a lot more address space mappings
allocated per process, both for frame buffer/video data and for system
libraries within the shared segment, and these end up getting
duplicated and filled out if you fork(2), then thrown away when you
execve(2).
So the cost you are measuring is the cost to duplicate the address
space mappings of the parent in the child process, then throw those
mappings away in the child when you replace the currently executing
image with a new image via execve(2).  This is basically a conscious
trade-off to make runtime less expensive at a penalty to fork(2)
duplication of address space mappings, but it means that the
degenerate case of fork(2) immediately followed by execve(2) ends up
slower than on other systems.
The vfork(2) call was explicitly added to COW implementations of
process address space overcommit for exactly this reason, way back
when (3.0 BSD): address space setup and teardown is expensive, and if
you do it for no reason, it's going to show on your benchmarks, if
that is what they measure (as opposed to what you may think they
measure).
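
As a rough illustration of the two paths being compared, here is a
minimal sketch in C.  The function names run_with_fork and
run_with_vfork are made up for this example and do not come from the
compiler being benchmarked.  After vfork(2) the child calls only
execve(2) or _exit(2).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: wait for pid, return its exit status or -1. */
int wait_for_exit(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* fork(2) + execve(2): the parent's address space mappings are
 * duplicated for the child, then discarded again by execve(2). */
int run_with_fork(char *const argv[])
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(argv[0], argv, environ);
        _exit(127);             /* only reached if execve(2) failed */
    }
    return wait_for_exit(pid);
}

/* vfork(2) + execve(2): no duplication of the mappings; the child
 * must do nothing except execve(2) or _exit(2). */
int run_with_vfork(char *const argv[])
{
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(argv[0], argv, environ);
        _exit(127);             /* only reached if execve(2) failed */
    }
    return wait_for_exit(pid);
}
```

Calling each helper in a loop with the same argv (for example an
assembler invocation) and timing the loops would be one way to see the
difference on a given system; the numbers will depend on how many
mappings the calling process has.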
Typical MacOS X applications will spend most of their time in user
space in CPU intensive code, or they will spend most of their time in
the kernel, blocked on an I/O channel waiting for slow disks or other
hardware to answer their request.  Either way, at that point this
overhead is very much lost in the noise, and so is not worth
optimizing compared to other, lower-hanging fruit.
Unless you are modifying process state after the vfork(2) and before
the execve(2), vfork(2) is likely your best bet for a quick fix.
If you are resetting privileges, opening/closing files, etc., then
posix_spawn(2) is likely your best bet.  Technically, the system is
permitted to give "Undefined behaviour" if you call any system calls
other than _exit(2) or execve(2) subsequent to calling vfork(2), so
don't do that.  If you want your code to keep working in future
releases, use posix_spawn(2) instead.
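
For the "opening/closing files" case, a sketch along these lines may
help.  It assumes you want the child's stdout redirected to a file;
the function name spawn_redirected and the out_path parameter are
invented for illustration.  The redirection is described to
posix_spawn(2) as a file action instead of being done by hand between
vfork(2) and execve(2).

```c
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run argv[0] with stdout sent to out_path.
 * Returns the child's exit status, or -1 on failure. */
int spawn_redirected(char *const argv[], const char *out_path)
{
    posix_spawn_file_actions_t actions;
    pid_t pid;
    int status;
    int err;

    if (posix_spawn_file_actions_init(&actions) != 0)
        return -1;

    /* In the child, open out_path as file descriptor 1 (stdout). */
    err = posix_spawn_file_actions_addopen(&actions, STDOUT_FILENO,
                                           out_path,
                                           O_WRONLY | O_CREAT | O_TRUNC,
                                           0644);
    if (err == 0)
        err = posix_spawn(&pid, argv[0], &actions, NULL, argv, environ);

    posix_spawn_file_actions_destroy(&actions);
    if (err != 0)
        return -1;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Resetting the effective IDs can be requested in a similar way, by
passing a posix_spawnattr_t with the POSIX_SPAWN_RESETIDS flag set
in place of the NULL attribute argument.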

Terry Lambert