Re: what is expected malloc behavior re. limits/insufficient resources?
- Subject: Re: what is expected malloc behavior re. limits/insufficient resources?
- From: Terry Lambert <email@hidden>
- Date: Sun, 19 Jul 2009 16:03:20 -0700
Why is this addressed as a response to me?
-- Terry
On Dec 25, 2008, at 3:00 AM, IainS <email@hidden>
wrote:
Thanks, Terry, for your helpful response.
Merry Xmas all round... but just before we head off to
over-indulge, I would welcome your comments on a couple more points.
On 25 Dec 2008, at 09:03, Terry Lambert wrote:
On Dec 24, 2008, at 3:05 PM, IainS <developer@sandoe-acoustics.co.uk> wrote:
On 24 Dec 2008, at 22:47, Terry Lambert wrote:
This can exceed total RAM plus swap because of memory overcommit,
which has pretty much been SOP on unix since the DEC VAX version
introduced virtual memory.
I believe that there are two distinct kinds of overcommit:
Let's define (for the sake of illustration) available = [ram +
free swap - wired] ; overcommit as "anything > available" (a rough
sketch of estimating this on OS X follows the two cases below).
Kind 1/ with multiple competing processes (**all of which would
individually fit within 'available'**).
A suitable and successful strategy is to serialize the access to the
resources by making processes sleep when a memory demand cannot be
met. The assumption is that processes will eventually complete and,
although the system might be sluggish owing to swapping, it remains
accessible. Some form of prioritization allows at least one process
to proceed - such that a deadlock doesn't occur. Everyone's happy.
Kind 2/ Where the system allows a single process to overcommit.
In this case, assuming that the process uses the allocated memory,
(and without a grim reaper watchdog process) this will eventually
result in a locked system (the amount of time will depend on the
offending process' priority).
(a) the offending process cannot complete - it will always end up
sleeping whilst waiting for some unavailable blocks (which can never
be made available, because the disk is full).
(b) most likely on the way to this the majority of other processes
will be swapped out - and cannot be swapped back in because there
is no more backing available for the RAM.
(c) in the end all processes that rely on allocation of memory will
be asleep - whilst the system has not "technically" crashed - it is
actually in a vegetative state and we might as well switch off the
life support...
(d) anything that needed space on the root partition (including all
system and user databases in a default installation) has probably
lost data.
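For concreteness, here is a rough sketch of estimating that "available"
figure on OS X. hw.memsize, host_statistics() and the vm.swapusage
sysctl are real interfaces, but the arithmetic is only a crude
approximation of the definition above, and error checking is omitted:

    #include <stdio.h>
    #include <stdint.h>
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <mach/mach.h>

    int main(void)
    {
        /* Total physical RAM. */
        uint64_t memsize = 0;
        size_t len = sizeof(memsize);
        sysctlbyname("hw.memsize", &memsize, &len, NULL, 0);

        /* Wired page count from the Mach host statistics. */
        vm_statistics_data_t vs;
        mach_msg_type_number_t cnt = HOST_VM_INFO_COUNT;
        host_statistics(mach_host_self(), HOST_VM_INFO,
                        (host_info_t)&vs, &cnt);

        /* Free swap from the vm.swapusage sysctl. */
        struct xsw_usage sw;
        len = sizeof(sw);
        sysctlbyname("vm.swapusage", &sw, &len, NULL, 0);

        /* available = ram + free swap - wired, per the definition above. */
        unsigned long long wired = (unsigned long long)vs.wire_count * vm_page_size;
        unsigned long long avail = memsize + sw.xsu_avail - wired;
        printf("approx available = %llu bytes "
               "(ram %llu + free swap %llu - wired %llu)\n",
               avail, (unsigned long long)memsize,
               (unsigned long long)sw.xsu_avail, wired);
        return 0;
    }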
The problem with this second scenario is that you've just described
the halting problem by putting forth the implied question of how such
processes can be recognized. Will the process complete? We have no
idea. Maybe a timer will fire and release resources. We can't know
this merely because there is an outstanding unfired timer somewhere in
the system, because there is no cause/effect coupling annotating the
timer when it was enqueued. There are dozens of high probability
events like our putative timer, and the number of actual events that
might release resources goes up as the probability of them being
coupled to implied resource reservations goes down.
Unfortunately, computing hasn't been about Dijkstra's algorithm and
"buy a big enough machine" (one with sufficient resources for all
contingencies) since at least 1978.
They mostly did it by having a 32-bit address space and twice as much
precommitted swap as physical RAM.
Indeed, I remember this well as a rule-of-thumb. Although,
actually, the problem does affect older OSX 32 bit machines with
smaller disks - if the same criteria are met (insufficient backing).
Older OSX had this issue, as many modern OSs, including Windows, now
do, because swap was not preallocated, and disk space for it was used
as another form of overcommit: if I'm not going to use that much swap,
why should I permanently lose 64M of disk space over it?
This is more or less market-driven, just like the changes in
filesystem technology to eliminate the free reserve of 15% in UFS,
which was there to avoid disk fragmentation by reducing the hash-fill
to Knuth's 85% (cylinder group allocation in UFS was effectively a
hash operation). People want to be able to "use what they paid for",
without having to understand that they _were_ using the space -- just
not for storage of their data.
So overcommit.
If that didn't work, then the approach which was commonly used was
to kill programs in order to recover resources.
the problem we have here is that it's frequently the case that the
system has become so sleep-bound that you cannot obtain an interface
to do this - and in any event, there's no automatic way to recover -
and a typical non-command-line User is not going to be able to
succeed.
Not really. The system killed the process as a result of tripping into
the fault handler in the pager. Kernel resources are (with rare
exceptions) never overcommitted, and so the fault handler and kill
code get to run. AIX did this. Later it added the ability (via
signal registration) to mark some processes as "precious", which it
would avoid killing if possible, to prevent what would otherwise turn
into a reboot.
What you'd really like to do is kill the process that caused the
shortage in the first place. But is that the ray tracing application
taking up 95% of the resources that has been running two weeks, or is
it the 6%-using web browser you fired up to surf news while waiting
the last 20 minutes for the ray tracer to complete?
It's even possible that Activity Monitor would fail to quit the
process since it might need memory to pop up the "force-quit, quit,
cancel" dialogue.
We're not talking about a user-controlled kill, which has the problems
you note, unless you take all the needed code between the mouse and
the kill system call out of the overcommit realm. We are talking
about something a lot more brutal and low level, guaranteed to keep the
system viable. But maybe at the expense of two weeks of ray tracing
due to be sent out for green screen edit in the next 3 days.
You can also opt a program out of overcommit several ways, but you
typically have to replace the malloc, or not use it. Most of these
boil down to forcing swap to be precommitted by dirtying the pages
or locking them into physical memory. This is usually inconvenient
for the programmer. Also, the other processes sharing the
system have to do without those resources, even if they are not
being actively used by the greedy precommit process.
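For illustration, a minimal sketch of those two techniques (plain
POSIX calls; the function name is made up and error handling is
mostly omitted):

    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    /* Force backing to exist up front for a large allocation.
     * Option 1: dirty every page, so the anonymous memory (and hence
     * swap) is consumed at startup rather than at some random later
     * point.  Option 2: mlock() the region so it must stay resident
     * in physical RAM (subject to RLIMIT_MEMLOCK). */
    void *precommitted_alloc(size_t bytes, int lock_it)
    {
        void *p = malloc(bytes);
        if (p == NULL)
            return NULL;

        memset(p, 0, bytes);                  /* option 1: dirty the pages */

        if (lock_it && mlock(p, bytes) != 0) {  /* option 2: wire it down */
            free(p);
            return NULL;
        }
        return p;
    }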
hm. I wouldn't ask to set aside resources for this - the basis of
my thesis here is that there is no point allowing a process to
(actually) overcommit (I accept that virtual overcommit is useful)
-- somehow one has to draw the line when paging space runs out.
This assumes it's a measurably finite resource. What happens when you
see the system headed towards this (or in the middle of it), and plug
in another disk to help out? Your line just moved.
I suggest filing a problem report.
I'll do this and copy the radar number here
BTW: in the case that a kernel extension is stupid enough not to
wire its memory ... this can also cause a panic (but since the panic
is probably down to a 3rd party extension, I would not expect it to
receive much attention).
It pretty much has zero choice in the matter; the memory comes to it
wired.
(a) Is malloc() supposed to honor ulimits?
(b) Is malloc() supposed to allow requests in excess of
available system resources?
Is there any official answer to these?
This isn't an official support channel. To get an official answer,
you'd need to file a problem report.
Here are my answers, though:
(a) No. Currently the limits are voluntary. For example, gcc uses
getrlimit() to look at the limits and voluntarily limits its
working set size by choosing not to exceed the limits.
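As a sketch of that voluntary approach (RLIMIT_DATA is a real
resource limit; the fallback value is arbitrary):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>

    /* Read the soft data-segment limit and treat it as a voluntary
     * allocation budget, in the spirit of what gcc does.  Nothing in
     * malloc() enforces this; the program has to. */
    static size_t voluntary_budget(void)
    {
        struct rlimit rl;
        if (getrlimit(RLIMIT_DATA, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
            return (size_t)1 << 30;          /* arbitrary 1 GB fallback */
        return (size_t)rl.rlim_cur;
    }

    int main(void)
    {
        printf("voluntary allocation budget: %zu bytes\n", voluntary_budget());
        return 0;
    }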
Hard limits can be set by system admin (in my admin days 'normal'
users had to request single-user access to machines if they wanted
to do stuff like this).
Soft voluntary limits would actually solve my problems - so long as
malloc honors them.
It's up to the caller of malloc() to honor them. The malloc() call
itself is a user space library call that has no idea of how much total
virtual address space is available (disk+RAM; a poor measure anyway,
since you are competing with other processes, and disk can fill for
other reasons). In addition, it uses Mach calls to allocate anonymous
pageable memory, which means it would bypass any resource limits
established in the BSD layer in any case (setrlimit).
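So if a program wants a limit honored, it has to do the bookkeeping
itself. A minimal sketch (the wrapper name and budget are
hypothetical, and a real version would also have to account for
free()):

    #include <stdlib.h>

    /* Voluntary cap: track the bytes we have handed out and refuse
     * to exceed the budget, since malloc() itself will not. */
    static size_t used_bytes;
    static size_t budget = 256 * 1024 * 1024;    /* e.g. taken from getrlimit() */

    void *limited_malloc(size_t n)
    {
        if (used_bytes + n < used_bytes || used_bytes + n > budget)
            return NULL;                         /* overflow or over budget */
        void *p = malloc(n);
        if (p != NULL)
            used_bytes += n;
        return p;
    }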
A possible strategy to mitigate this is to precommit swap space for
all anonymous physical allocations and all shared COW allocations on a
per-process basis, "just in case", but if we do that, we are in the
boat of telling people to buy bigger disks for their expected
workloads, and the previously mentioned sparse utilization algorithms
are out in the cold, unless you potentially have terabytes of disk
available.
It seems to me that it is easier to just fix the broken software
causing the problem.
(b) Yes. Allocations that it makes are for virtual, not physical,
resources. Virtual resources are effectively near-infinite, and you
could be using an algorithm that has sparse utilization of the
virtual resources. If so, then there's no reason your algorithm
shouldn't be allowed to work merely because of an arbitrary
administrative limit enforced against a virtual resource.
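To make the sparse-utilization point concrete, a sketch (needs a
64-bit process; the 64 GB figure is arbitrary):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Reserve a virtual range far larger than RAM + swap.  With
         * overcommit this succeeds; pages only acquire backing store
         * when they are actually touched. */
        size_t vsize = (size_t)64 << 30;         /* 64 GB of address space */
        char *p = mmap(NULL, vsize, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Touch only a handful of widely spaced pages: the real
         * (physical + swap) footprint stays tiny. */
        for (size_t off = 0; off < vsize; off += (size_t)1 << 30)
            p[off] = 1;

        printf("reserved %zu bytes, dirtied only a few pages\n", vsize);
        return 0;
    }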
Your argument is good - and I agree it's challenging - but one ought
to be able to find a way to "bail out" in the case that a process
does not make sparse use of the resource. Perhaps an overcommitted
process could be marked as "do bus error, instead of sleeping when
backing runs out" (which is similar to one of your suggestions, too).
This conflicts with intentionally large utilization by an important
process getting shot in the foot by an unimportant small utilization
pushing it over the top (e.g. some fool firing up a browser on a
machine in a render farm because they are too lazy to walk back to
their office).
Typically these are one user or single purpose machines.
hm .. I run Mail, iTunes, and numerous other background processes
which all suffer database corruption if the system blocks this
way. OK, it's not DEC-20 timesharing... but "single user" is
stretching things a bit these days ;-)
Yes, but an application in that environment that eats 2^40 bytes of
virtual address space on you is either broken or not intended to be
run with the rest of your workload at the same time.
Back in my admin days, it was also possible to do this on multiuser
machines, and our typical reaction was to disable the offending
user's account. For a single user machine, you could simply
reinstall the offending software and contact the vendor for a fix.
I can honestly say that this never happened AFAIK on the VAX, Primes
and Suns we used (and I've worked in R&D all my life with people
persistently trying wacky things :-); normal user accounts had
restrictive limits. (I do not say it's impossible, just that I don't
believe I ever saw it happen.)
We tended to do it on purpose, at least on role-based systems. The way
you tune a traditional UNIX is to (1) remove all administrative
limits, (2) load it up til it crashes, (3) set the limits just below
that.
For a general purpose (shell account) machine, you (1) set
ridiculously low limits, (2) played BOFH if anyone complained.
Neither one's a good option for a desktop machine.
If it cannot be regarded as a bug, perhaps honoring ulimits or
providing RLIMIT_VMEM would be a very useful enhancement?
You can file a problem report, but we are unlikely to add an
RLIMIT_VMEM for the same reason top can't give you an exact answer
on who is responsible for shared pages: cardinality hides this
information, even from the OS itself.
perhaps then, the desideratum is unachievable (preventing a single
process from over-committing).
Limiting your process's utilization of the available (by virtue of
addressable bits) virtual address space with RLIMIT_DATA is much
more likely, but that would be either voluntary or up to its parent
process to set before the program is started.
as I said above, admin can set hard limits for RLIMIT_XXXX (I know
there's no absolute guarantee - but one can make things considerably
more robust). If you choose to raise the limits for a single
process or user -- at least you then enter that territory with eyes
wide open (and with Mail and iTunes and anything else with a db shut
down...)
Admins do this by controlling the limits for the processes after fork
and before de-escalating privileges to do the exec. You'd likely be
unhappy with the results in a GUI environment, for a lot of reasons.
Gamers would outright hate it.
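In sketch form, that fork/limit/de-escalate/exec sequence looks
something like this (the function name, uid and limit value are
arbitrary, and error checking is thinner than a real launcher's
would be):

    #include <sys/types.h>
    #include <sys/resource.h>
    #include <unistd.h>

    /* The parent sets the limits in the child after fork() and before
     * dropping privileges and exec()ing the target, which then
     * inherits them for its whole lifetime. */
    pid_t spawn_limited(const char *path, char *const argv[],
                        uid_t uid, rlim_t data_limit)
    {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            struct rlimit rl = { data_limit, data_limit };
            setrlimit(RLIMIT_DATA, &rl);    /* cap the data segment */
            setuid(uid);                    /* de-escalate privileges */
            execv(path, argv);
            _exit(127);                     /* only reached if exec fails */
        }
        return pid;
    }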
For a responsible user testing code which is intended to stress the
system, voluntary limits are entirely satisfactory -- providing they
are honored by malloc.
have a great Xmas,
Iain
Merry Christmas!