Re: How limits are enforced (was: various maxproc limits)
- Subject: Re: How limits are enforced (was: various maxproc limits)
- From: Terry Lambert <email@hidden>
- Date: Fri, 30 Sep 2005 00:38:28 -0700
On Sep 29, 2005, at 10:05 PM, Yefim Somin wrote:
Terry,
Thanks a lot for your answers. They clarified the hierarchy of settings. However, I need to provide an answer to your stock answers, namely, "why the hell do I need this," before figuring out whether my problem can be resolved.

I am running a benchmark to compare performance of a number of platforms. This is an old application which relies on users connecting through telnet and running a series of application-specific requests; hence, even before the actual application is accessed there is a process on my Mac system for each connection. A few more processes per user are initiated upon connection. While the overall underlying environment may change later, this particular yardstick is what it is. As I mentioned above, it's been used on quite a number of platforms.

I started by trying to run this benchmark with several hundred users and as a result got the system completely wedged, responding to anything with the message that it can't fork. At that point (upon reboot, of course) I found the sysctl values and changed them to the maximum (2068), which did not help. Then I detected the administrative limits - 100 and so on. The important point is that I got the system wedged with as few as 8 users logging in (remember there are a few processes per user that get created). Given that there were over 50 other processes already running, I assumed that somehow I am running against this 100 maxproc limit with no visible means to increase it in this context.
Now the newly reformulated questions:
1) Am I indeed running against the admin maxproc limit or is it something else? (but see the P.S.)
I don't know. I'd have to see a live example of the failure mode in
action in order to be able to tell you.
The most common cause of "cannot fork" is a parent process which fails to reap its children properly. A quick way to check whether this is the case is to write a shell script that dumps the output of ps into a series of files on a remote NFS volume, then stop it and reset the machine once the machine hits the wall. Something like this:
#!/bin/sh
# Capture a rolling window of ps snapshots (ideally onto a remote NFS
# volume) so the evidence survives a wedged machine.
SAMPLEDIR="$1"
if test "${SAMPLEDIR}x" = "x" -o "$2x" != "x"
then
    echo "Usage: capture.sh <data capture directory name>" >&2
    exit 1
fi
while true
do
    for tens in 0 1 2 3 4 5 6 7 8 9
    do
        for ones in 0 1 2 3 4 5 6 7 8 9
        do
            # 100 rotating sample files, one snapshot every 5 seconds
            ps -gaxlwww > ${SAMPLEDIR}/sample.${ones}${tens}
            sleep 5
        done
    done
done
This will let you check for zombied processes at the time the machine hits the wall, and the PPID will tell you the PID (and thus the name) of the offending parent process that is failing to wait() on its children to reap the zombies, if it is in fact zombies.
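For example, once the machine has been reset, you could scan the saved samples for processes in the Z (zombie) state. This is only a rough sketch: the column positions assume the usual "ps -l" layout (check the header line in one of the samples first), and /nfs/samples stands in for whatever directory you passed to capture.sh:

    # PID is column 2, PPID column 3, STAT roughly column 10 in "ps -l" output
    awk '$10 ~ /^Z/ { print FILENAME ": zombie PID " $2 " PPID " $3 }' /nfs/samples/sample.*

If the same PPID shows up over and over, that parent is the one not reaping.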
It may not be that; it may be something else. I'd need to see an
example. If you have an example, the best way to submit it is to do
the bug report: issues from customers have higher priority than if I
originated the problem report myself (or I'd be able to work on
anything I wanted to work on by writing my own list of things to work
on 8-)).
2) Will creating /etc/launchd.conf (your answer number 1) with the increased values address this problem? (What is the format for specifying hard and soft values there?)
The format is in the manual page for launchd.plist, which is referenced by the manual page for launchd.conf. It's basically XML, and there's a short example (with different keys than you'd need to use) at the bottom of the plist manual page, but it should be pretty straightforward.
From the previous reply: there's a deficiency (IMO) in launchd that
prevents you from being able to specify arbitrary sysctl name/value
pairs to launchd, so this will only let you increase (#5) and (#6)
(the hard and soft administrative limits) up to the default compile
limit (#3). If/when this is addressed, you will be able to increase
this (via (#4)) to the compile limits (#2).
Right now, the only way around this would be a "helper" application.
3) If the answer is no for the above, are the rest of your answers
applicable to this situation?
The first answer was "I don't know".
I don't know if you'd count the previous answer as a "no". If you do, you can get around the limit there by providing an SUID root helper application from which to launch telnetd, *after* it sets the administrative limits in (#5) and (#6) to the hard compilation time limit (#2). You can either call the sysctl in the /etc/rc file, or you can roll it into the helper application.
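For the /etc/rc route, a minimal sketch might look like the following. Note that kern.maxproc and kern.maxprocperuid are my assumption for the sysctls behind limits (#5) and (#6), 2068 is the compile-time maximum mentioned earlier, and the per-user value is just an example; a helper application would make the equivalent calls as root before exec()ing telnetd.

    # In /etc/rc (or run by a root helper) before any connections arrive:
    # raise the system-wide and per-user administrative process limits.
    sysctl -w kern.maxproc=2068          # assumed sysctl name; compile-time max from above
    sysctl -w kern.maxprocperuid=2000    # assumed sysctl name; example value <= kern.maxproc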
In the case that you want to avoid the helper application, you could do this in telnetd itself; the sources for telnetd are on opendarwin.org, and it's run as root by the program "launchproxy", so it has the rights to increase its own limits.

If you recompile it, you will need to replace it in /usr/libexec/telnetd.
As an educated guess, I'd say that we aren't going to get as good a
set of numbers out of MacOS X as, for example, SVR4, which instead of
fork()ing a telnetd for each incoming connection, uses what's called
a port monitor, and pushes a tty line discipline module onto the
streams stack and directly forks login onto the resulting port. This
basically means they have half as many processes per connection, and
they avoid the loopback overhead from the pty driver (among other
optimizations).
It actually would not be that hard to adapt this method (wiring the
tty line discipline to a socket to make it look like a pty to the
inbound socket, even without streams support) for MacOS X, but
frankly, we've never considered telnet as something in need of
optimization, given that it's generally a relatively slow human on
the other end of the wire, rather than a benchmark or a fast computer
program.
Not directly related to my questions, but painful nevertheless, is the fact that my mechanism of connecting and funneling commands to my Mac system has deteriorated since yesterday (it's RTE - remote terminal emulation software which telnets to the remote node and sends a sequence of characters across the connection in response to appropriate prompt patterns from the remote system). Currently, unlike the day before, it sends enough to provide ID and password and log in, but then the communication is broken (I suspect the prompt patterns don't get delivered properly). I am describing it on the off chance that this may be something obvious. (I should mention that I am new to Mac, and also that that same procedure still works fine for other platforms; I can successfully telnet manually.)
I have no idea. It may be that the connections are in raw mode, or it may be that the slave side has not been properly dropped and you are still running things in the background which are ignoring SIGHUP, or your test program expects SystemV instead of BSD signal behaviour with regards to the controlling tty when it's not explicitly set via setsid() (i.e. your test process expects a HUP, but is instead in a loop reading 0 bytes over and over instead of exiting when it sees a 0 length read in a BSD-like fashion), or any of a dozen other things.

It could just be that your RTE is on a platform that doesn't properly perform a socket shutdown because its sockets implementation is in user space, and you are effectively staging a DOS attack against the machine by virtue of not shutting down the connections properly (i.e. if you did a netstat, you might see a lot of socket connections sitting in FIN-WAIT-2 state because the other side reset the connection instead of sending the final FIN, because the connections in user space are not properly resource tracked - Windows had this problem with their WINSOCK2 implementation for some client programs, or when the client machine crashed, for a long time; so have other platforms).
The best way to deal with this is if you could provide a cut-down
minimal set of code that reproduces the issue, and then file a bug
including reproduction instructions. This is not a problem which
would be likely to be routed to me in any case, since it sounds like
a user space problem.
If a "netstat -a" does show a bunch of FIN-WAIT-2 state connections,
you can reduce the timer via sysctl, set socket keepalives, or do
what Apache did or the problem, and modify telnetd.
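A quick way to check is just to count those connections after a run; this is a rough sketch, and the exact spelling of the state in your netstat output may differ:

    # Connections stuck waiting for the peer's final FIN
    netstat -an | grep -c FIN_WAIT_2

If that count tracks the number of dead RTE sessions, the client side is the likely culprit.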
Again, telnetd is probably not the best benchmark approach.
Best regards,
Yefim
P.S. While I was writing this, an email from you came in in which the limit on the number of ptys (128) is mentioned. Could this be the limit I am running into, and if yes, can this be bumped?
It shouldn't be, unless one of the other potential problems with the
software being run and listed above is being exercised and keeping
the pty's busy or you are trying for more than ~128 simultaneous
connections. As long as you don't have that many connections, you
will be fine.
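If you want a rough sense of how close you are to that ceiling, you can count the pty pairs and the sessions currently sitting on them; this assumes the classic BSD pty naming of /dev/tty[p-w][0-9a-f], which is where the 128 figure comes from:

    # 8 banks (p-w) x 16 slaves (0-f) = 128 possible pty pairs
    ls /dev/tty[p-w][0-9a-f] | wc -l
    # sessions currently logged in on a pty
    who | wc -l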
Right now, the number of pty's can't be bumped above 128. Taking the
tty line discipline approach or adding an alternate pty driver (just
copying the code in bsd/kern/tty_pty.c and renaming everything would
get you another 128, but you'd need to teach telnetd about the new
names) could raise the limit for you (the line discipline approach
would raise it to the connection limits, which are effectively the
open file limits).
Since programming things like that is what this list is supposed to be about, someone here could probably help you with that, if you wanted to grab the kernel sources off of opendarwin.org. 8-).
-- Terry