Re: Hard-limits for kern.maxproc
- Subject: Re: Hard-limits for kern.maxproc
- From: Terry Lambert <email@hidden>
- Date: Thu, 29 Jan 2009 19:46:59 -0800
Sorry, the send key is too close to the scroll region in iPhone mail.
Let me finish my response...
On Jan 29, 2009, at 6:36 PM, Terry Lambert <email@hidden> wrote:
On Jan 29, 2009, at 11:15 AM, Nathan <email@hidden> wrote:
After a frustrating and long search for the root of the various
problems I was having with my new Xserve serving email via IMAP, I was
tipped off by...
http://support.apple.com/kb/TS1659
...that the kernel has insanely low defaults (for a server) for
kern.maxproc (532), which controls the maximum number of processes
allowed to run on the system. While trying to raise the default to a
reasonable value (Linux defaults to 32,768, for example), I discovered
that I couldn't raise it above 2500. Peter O'Gorman tells me off-list
that this is a hard-coded limit in xnu:
./bsd/conf/param.c:#define HNPROC 2500 /* based on thread_max */
I've already filed rdar://6536876 asking that this be raised, but as
I've had very poor returns with my filed bugs that I spent a long time
researching, and very good returns on Apple mailing lists (go
apple-x11!), I thought I'd post here too. :-)
Is there any reason the hard limit couldn't be raised an order of
magnitude or two?
Yes. Fragility under resource starvation conditions. This has
already been asked and answered on this mailing list previously
(search the archives for "resource limits"), but the tuning values
you specify for a general purpose system have to be lower than those
for a role based system, since a general system has to be able to
handle resources of every type being consumed. Role based
systems use a set of resources whose type is constrained by the
role. In trade for tuning some resources down (or simply not using
them), an administrator may safely tune up the resources used in that
role, making the system a better fit for the role.
Resource limits are tuned by default for general purpose use, and we
clearly document how to raise limits to their containing limit
boundaries (limits are hierarchical). For maxproc, that's:
Compile time hard limit (initial value for the kernel variable that
constrains the max sysctl kern.maxproc value). Changeable by
recompiling your kernel. Sources are provided through
<http://opensource.apple.com>.
Runtime hard limit (kernel variable that constrains the max sysctl
kern.maxproc value). Settable either by changing the compile time hard
limit, or at runtime by enabling /dev/kmem and compiling a program
which links against libkvm and changes the value in the running
kernel (see the sketch after this list). Sources for libkvm are
provided through <http://opensource.apple.com>.
Runtime sysctl limit. This limit manipulates the value of the kernel
variable 'maxproc', with a top value of the Runtime hard limit.
Rlimit hard limit, whose top value is the runtime sysctl limit; for
servers run from launchd, this limit is set in the launchd plist for
the LaunchService in question.
Rlimit soft limit; for servers run from launchd, this limit is set to
a value up to and including the Rlimit hard limit, either via the
setrlimit() system call from within the server code itself, or in the
launchd plist for the LaunchService in question.
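
To make the /dev/kmem route concrete, here is a rough sketch of the
sort of utility described above. It is an illustration, not a
supported interface: it assumes /dev/kmem has been enabled on your
machine, that libkvm is present, and that the ceiling lives in a
kernel variable named hard_maxproc (the HNPROC define quoted above
suggests its initializer); check your own xnu sources before trusting
any of that.

/* patch_maxproc.c -- sketch: raise the runtime hard limit for
 * kern.maxproc by patching the running kernel through /dev/kmem.
 * Build: cc patch_maxproc.c -lkvm -o patch_maxproc   (run as root)
 */
#include <fcntl.h>
#include <kvm.h>
#include <nlist.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
    /* "_hard_maxproc" is an assumption; verify against your sources. */
    struct nlist nl[2] = { { "_hard_maxproc" }, { NULL } };
    kvm_t *kd;
    int val;

    if (argc != 2) {
        fprintf(stderr, "usage: %s new-hard-limit\n", argv[0]);
        exit(1);
    }
    val = atoi(argv[1]);

    /* Open the running kernel read/write via /dev/kmem. */
    if ((kd = kvm_open(NULL, NULL, NULL, O_RDWR, "kvm_open")) == NULL)
        exit(1);

    /* Look up the address of the kernel variable. */
    if (kvm_nlist(kd, nl) != 0 || nl[0].n_value == 0) {
        fprintf(stderr, "hard_maxproc: symbol not found\n");
        exit(1);
    }

    /* Overwrite the hard limit in the running kernel. */
    if (kvm_write(kd, nl[0].n_value, &val, sizeof(val)) != sizeof(val)) {
        fprintf(stderr, "kvm_write failed\n");
        exit(1);
    }
    kvm_close(kd);
    return 0;
}

Once the ceiling is raised, sysctl -w kern.maxproc=N will accept
values up to the new limit.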
So basically, you need to:
1) recompile your kernel OR write a small utility to change the kernel
variable at runtime and call it from /etc/rc.server. Doing this voids
support, since if you do it to a general purpose server, you allow
more resources to be consumed than actually exist, run out of kernel
virtual address space, and crash (administrative hard limits exist in
the first place to prevent crashes).
2) crank up the sysctl limit to the new runtime hard limit
3) modify /etc/launchd.conf to increase hard limit (but not soft
limit) inherited by child processes of launchd
4) modify the plist for the server in question to up its hard and soft
limits
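
As a concrete illustration of steps 3 and 4 (the numbers are
placeholders to tune for your hardware). /etc/launchd.conf takes
launchctl subcommands, one per line, so step 3 looks something like
this, keeping the soft limit at its old value (shown as 532, the
default quoted above) and raising only the hard limit:

limit maxproc 532 5000

Step 4 means adding the resource limit dictionaries documented in
launchd.plist(5) inside the <dict> of the server's job plist:

<key>SoftResourceLimits</key>
<dict>
    <key>NumberOfProcesses</key>
    <integer>5000</integer>
</dict>
<key>HardResourceLimits</key>
<dict>
    <key>NumberOfProcesses</key>
    <integer>5000</integer>
</dict>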
Doing this will let you crank the numbers up on your role based server
to the point things crash, and then you can back off a bit to prevent
crashes.
Even with raising kern.maxproc (and kern.maxprocperuid, and the
launchd settings that mirror it, and launchd's maxfiles setting,
etc., etc.) my Xserve will run into the process limit far before it
exhausts CPU, RAM, disk, or bandwidth resources.
Incorrect. The kernel is 32-bit, which inherently limits its available
virtual address space to 4G. Of this, for its own internal structures,
the kernel itself is limited to 1/4 of physical RAM or 1/4 of the
available space (1G), whichever is less. This leaves space for I/O
such as video memory (the biggest customer), caches, and so on.
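To put numbers on that: on a hypothetical Xserve with 8G of physical
RAM, the kernel's budget for its own structures is min(8G/4, 1G) = 1G,
and every process you allow consumes proc structures, threads, file
table entries, and socket buffers out of that same 1G, alongside the
video memory and caches just mentioned.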
I'd be happy to provide some legwork (testing, doing some sanity
checks on stuff in the kernel source code, etc.) if that would help.
I am a registered ADC member and I'm willing to get my hands dirty to
get this fixed.
You have three fundamental problems with your configuration, none of
which will be permanently fixed by increasing the number of
processes. At best, you will get a latency reprieve between reboots.
Two of these are client problems, so the only thing you can do for
them is increase the interval between reboots being required by *any*
server you get from any vendor while continuing to use those clients
(you do that by throwing resources at the problem).
1) Cyrus uses a process per client connection. It could use threads
instead (which would be best), or it could use a process per client
and route inbound sockets to the client's server instance using
descriptor passing over UNIX domain sockets. The second is less
satisfying, since you still eat a process per client instead of having
only one process, and still eat a descriptor per connection, but for
the number of clients you have, this would solve part of the problem
for the foreseeable future.
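
For reference, descriptor passing over UNIX domain sockets is done
with sendmsg(2) and SCM_RIGHTS ancillary data. A minimal sketch of the
sending side (send_fd() is a hypothetical helper, not anything in
cyrus; the receiver mirrors it with recvmsg):

#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sketch: hand an open descriptor (e.g. an accepted client socket)
 * to another process over the UNIX domain socket 'chan', using
 * SCM_RIGHTS ancillary data. Returns 0 on success, -1 on error.
 */
static int
send_fd(int chan, int fd)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    char buf[CMSG_SPACE(sizeof(int))];
    char dummy = 0;     /* must send at least one byte of real data */

    memset(&msg, 0, sizeof(msg));
    memset(buf, 0, sizeof(buf));
    iov.iov_base = &dummy;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = buf;
    msg.msg_controllen = sizeof(buf);

    /* Attach the descriptor as SCM_RIGHTS ancillary data. */
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return (sendmsg(chan, &msg, 0) == 1) ? 0 : -1;
}

The kernel duplicates the descriptor into the receiving process, so an
acceptor can hand each new connection to the client's existing server
instance and then close its own copy.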
2) Your mail client is incorrectly designed. The IMAP protocol was
specifically designed to allow session multiplexing by transaction ID
over a single connection to the IMAP server from a client. At best, a
"perfect" IMAP server would waste a descriptor and a session state
structure per open mailbox per this type of client in a single process
(this assumes the server was designed to use FSAs (finite state
automata) instead of threads as session containers). A somewhat less
perfect server would waste a descriptor and a thread per open mailbox
per this type of client. An
average server that had some overriding security partitioning
philosophy (maybe rather than fixing buffer overflows, they wanted to
limit the damage to the client doing the overflow) would waste a
process per client and a descriptor and thread per connection for this
type of client. Even a server that burns a process per connection
(like cyrus) would not burn that many processes, if the client
understood the IMAP protocol well enough to properly multiplex
sessions over a single connection.
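
For illustration (a made-up exchange, not from any particular client
or server), tag-based multiplexing looks like this: the client issues
several commands on one connection without waiting, and matches the
responses to commands by tag rather than by socket:

C: a001 STATUS INBOX (MESSAGES UNSEEN)
C: a002 STATUS Lists/darwin-kernel (MESSAGES UNSEEN)
S: * STATUS INBOX (MESSAGES 231 UNSEEN 4)
S: a001 OK STATUS completed
S: * STATUS Lists/darwin-kernel (MESSAGES 1042 UNSEEN 17)
S: a002 OK STATUS completed

A client that instead opens a separate TCP connection per mailbox
burns a cyrus process for every tag it could have multiplexed.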
3) Your mail client behaves incorrectly when it can't get a response;
it opens another connection without closing the first one. While a
perfect mail server would see this happen and kill off the previous
connection from its end, that's not happening here, and the degenerate
behavior of the client permanently (until reboot) costs you a process
per polling attempt. Unfortunately, since a perfect mail server can't
tell a second connection by an imperfect client from a reconnection by
a perfect client, with your non-session multiplexing client, making
this change to the server would cause the imperfect client to lose
connections on its other mailboxes. So this is unlikely to be amenable
to a server-side fix, since causing a client to behave correctly is an
intractable problem, from a server perspective.
-- Terry