Re: Resident Memory, Private Memory, Shared Memory (top, Activity Monitor and vmmap)
site_archiver@lists.apple.com
Delivered-To: darwin-dev@lists.apple.com

On Nov 30, 2007, at 12:04 PM, darwin-dev-request@lists.apple.com wrote:

>>> I have only searched Apple's mailing list archive and I have not found
>>> anything; Apple has too many #ifdef __APPLE__ in the kernel to say for
>>> sure that xnu behaves like any other BSD kernel.
>>
>> This is, technically speaking, not true; it is documented by the (freely
>> available) source code and by the activity of your system.
>
> Well, source code is no real documentation. The difference is that it
> took me about 1-2 hours to figure out how top calculates private and
> shared memory, something that documentation could have told me in two
> sentences in three minutes.
>
> 1. The Resident Memory Size (aka Real Memory)
> 2. The Private Memory Size
> 3. The Shared Memory Size
>
> Because all memory a process "has" (all pages in a process's address
> space) is either shared or private, the real memory a process uses
> should be the sum of both. That private and shared are calculated in
> such an unfavourable fashion is another matter.
>
>> "Resident memory" is an old OS term, and you will find it defined in
>> many OS textbooks. If you have reached this point, it's more or less
>> assumed that you will have read Tanenbaum or something more recent, and
>> the concept should be self-explanatory.
>
> Resident, according to top's own explanation, is: "This column reflects
> the amount of physical memory currently allocated to each process. This
> is also known as the 'resident set size' or RSS. A process can have a
> large amount of virtual memory allocated (as indicated by the SIZE
> column) but still be using very little physical memory."
>
> That already makes a bit less sense, but it's no issue, since COW memory
> with only one reference is still accounted towards private memory by top
> (and private memory will only have one reference). And MAP_SHARED is
> erroneously always accounted as shared, even if nobody else has that
> range of the object mapped.
> OTOH, the drawing buffers of Cocoa apps are in fact really shared
> (between the application and the window server; that's what I saw in
> some technote), and still saying "they are shared" seems awkward. It's
> technically correct, but these buffers only exist because of the
> application, and without it they will also vanish from the window
> server, so they could just as well be accounted towards that
> application only.
>
>> Top is not telling you, except in the crudest sense, how much real
>> memory your process uses.
>
> Then what is it telling me? If the values are so useless for everything,
> then why display them in the first place?
>
> Note that Activity Monitor (which displays the same values as top; I
> have no source to back this up, I can only compare them) is not a
> developer tool. It is used by normal, simple-minded users who have no
> idea what "virtual memory" actually means, and this utility displays
> real, private and shared memory to them. What do you expect a
> simple-minded user to think when seeing these values? Based on what I'd
> call reasonable experience, I'd say "not very much". Simple-minded
> users use these values to determine the memory consumption of a
> process. Now, if these values are so far from reality, they are useless
> and should not even be displayed to simple-minded users. Instead, this
> all should be replaced with a single value that really can give the
> user a rough estimate of how much memory the process needs.
>
>> Understanding the real memory usage of your task is better handled
>> with different tools.
>
> Now it gets interesting. That would be which tools?
>
>> ... Instruments, to see if you can't get a better understanding of
>> what your process is doing and what that costs.

[Edited "X-Ray" above to its shipping name, "Instruments".]

> These tools basically tell me how much memory I have "wasted" using
> malloc. Even in the best case, they might take my stack into account.
> But what they won't take into account is the memory lost by caching
> code pages, the memory lost by loaded libraries and their code pages,
> and so on. And only in the rarest cases will they distinguish private
> and shared memory at all. So if I know my process uses malloc to
> allocate about 800 kB of memory during runtime, this could not be any
> further away from the real amount of memory the system loses by running
> my process. This is an approximation of memory usage that is far, far
> away from reality, and a much worse approximation than all the values
> mentioned before.
>
> The number I'm looking for is: how much memory does the system need to
> keep my process running, assuming it was not allowed to swap any of it
> out (everything needs to be in memory)? Here it should take into
> account all memory that belongs to my process directly or indirectly,
> which includes malloc, stack, code, static memory, code of libraries
> and static memory of libraries, and code needed by dyld, and it should
> clearly distinguish between shared and unshared memory.

It is a very difficult number to compute, for a variety of reasons (more
on that below).

= Mike

>>> I have only searched Apple's mailing list archive and I have not found
>>> anything; Apple has too many #ifdef __APPLE__ in the kernel to say for
>>> sure that xnu behaves like any other BSD kernel.

You're making an error here in assuming that Darwin is a "BSD kernel" in
the first place. The MacOS VM, which is what you're currently being
confused by, is a distant descendant of the Mach 2.5 VM, which is where
the free BSDs started as well, but it has evolved in very different
directions as the operating system itself has had very different
motivating factors. Other than in the most general sense, it is not very
useful to compare them.

> It's also nowhere documented.

The code is (by definition) a correct and complete description of what
the system does.
It is the ultimate and most expressive documentation, although it's not
so good at describing intent.

> So far, all these values seem self-explanatory... but they are not at
> all, for one single reason: how come 1 + 2 != 3???

What makes you think that they would be? This depends on your definition
of "shared" and "private"; you are here defining them as complete and
exclusive, but that is presumptive and in fact not correct. (Note also
that you asked "why don't resident (1) plus private (2) equal shared
(3)", not "why don't private (2) plus shared (3) equal resident (1)".)

Top is a multi-platform utility with a long history; this definition
predates modern VM systems and is necessarily simplistic. However, it's
not fundamentally in conflict with reality, and had you read this before
making your earlier comments you'd have been less confused.

> MAP_PRIVATE is handled as COW.

If you've checked the code I'll take it at face value, but I'm not
certain that a COW mapping against an object is actually cross-referenced
against other COW mappings (due to the interplay with copy objects and
the extreme cost involved) to determine whether the range in question is
practically shared or not. In a pedantic interpretation, COW pages are
only "shared" if someone else has a mapping for that page; otherwise they
are merely "shareable".

> That is a bit stupid, since it does not reflect reality.

As I note above, the alternative would require an exhaustive search of
the arbitrarily large set of possibly overlapping mappings. It would help
this conversation if you would stop saying "stupid" when you really mean
"I do not understand".

Whilst this is basically an erroneous digression, it's a good place to
hang a point. You're making a fundamental mistake in thinking that any of
this accounting is being done for your (the application developers')
sake.
The VM keeps accounting information for its own purposes, and the
statistics that are maintained and tracked bear, for the most part,
directly on how the implementation works and what it needs to know in
order to provide the services it does. As a consequence, the VM is
entirely ignorant (and rightly so) of the application- or framework-level
semantics applicable to the interaction between, say, a client and the
window server. That interaction uses Mach shared memory semantics
(amongst others), and gets the same behaviour that anyone else using
those interfaces gets. It matters not that you-the-developer think that
the window server "doesn't count"; from the perspective of the VM and the
system as a whole, for whom that information is maintained, it does, and
thus the numbers reflect that.

As I've previously noted, if you want to look at things from an
application-centric perspective, you should be using tools that are
designed for that purpose.

> If the values are so useless for everything, then why display them in
> the first place?

That's a good question, but well outside the scope of this conversation,
as it strays into marketing territory and the somewhat conservative
attitudes that many old-school Unix administrators have towards tools.

> The number I'm looking for is: how much memory does the system need to
> keep my process running ...

That number doesn't exist. Computing the working set size of a given
collection of threads and their dependencies is more expensive than the
system can afford at runtime, when for the most part such a gross number
is not useful or interesting as such. I would encourage you to tinker
with Shark and Instruments instead.

You are trying to force your perception of the way the system uses memory
onto it and the tools. You might benefit from stepping back and taking in
the way the tools talk about the system, as they are the product of many
years of experience tuning applications, and as such they tend to reflect
the things that have previously been worth looking at.
Knowing how a given system reacts to what your application is doing at a
given time can be entertaining, but it's much more interesting to know
how systems in general will react when your application is running, and
this abstract view is more generally useful.

If I assume that by "loses" you mean "uses", then the number varies
wildly based on what your application and the system are both doing at
the time. What you are asking for is what is generally referred to as the
working set size for your application.

- There are adaptive algorithms in play, with both time- and
  space-related parameters, that will cause your working set to expand or
  contract based on a number of factors, not excluding CPU speed, other
  thread activity in the system, actual physical memory, disk speed, etc.

- There are other tasks in the system (server processes) that need to
  make timely forward progress in order for your application to do
  likewise; their working sets, for both your application's work and
  other applications' work that may gate yours (and thus, partially or
  entirely, other applications' working sets as well), need to be
  considered.

- The working set for an application can vary wildly based on the
  configuration of the system it's running on, the user-set preferences,
  the document or data being worked on, and so forth.

- The number varies wildly with time; some applications' working set size
  relates fairly directly to what the application is doing at the time,
  or has periodic aspects that relate to a long-running task, but for
  some applications it is grossly impacted by external or indirect
  factors that don't correlate to the application's activity at all.

Because this number is very difficult to derive, and because deriving it
at most tells you about limit conditions, it is generally better, as a
developer, to focus your attention on the factors that affect the number
and which are a consequence of your application's behaviour instead.
These are generally easier to identify, and since they're something you
can do something about, a better place to start anyway.
participants (1)
-
Michael Smith