Re: Resident Memory, Private Memory, Shared Memory (top, Activity Monitor and vmmap)
- Subject: Re: Resident Memory, Private Memory, Shared Memory (top, Activity Monitor and vmmap)
- From: Michael Smith <email@hidden>
- Date: Tue, 27 Nov 2007 13:53:57 -0800
On Nov 27, 2007, at 12:04 PM, email@hidden wrote:
I'd like to ask a simple question.
It is not a question with a simple answer.
This question has been asked about
ten times on various mailing lists here in the past and whenever
somebody asked this question directly or indirectly, he never
received a reply.
This is not true; I have myself authored at least a dozen replies to
variations on this basic question in multiple forums.
Either the question was ignored in the reply or the
post never received any reply in the first place. This either means
nobody wants to answer it or nobody can answer it... maybe nobody
knows it, which would be very, very sad, as that is one of the most
fundamental things one should know.
You are assuming that there is a simple answer, which there is not.
It's also nowhere documented.
This is, technically speaking, not true; it is documented by the
(freely available) source code and the activity of your system.
I even searched many Mac OS internals books for an answer, but I had
no success. It's simply not explained anywhere. It can't be that we have
to file a support incident just to receive a reply to this very
simple question. So I try it again and I will try it over and over
again, till I find someone who can answer it.
Now you're just being silly.
When you use top or Activity Monitor (it does not matter which), you receive
three interesting memory results per process. The VM stats are not
interesting (no matter if VM private or VM total). The three values
of interest are:
1. The Resident Memory Size (aka Real Memory)
2. The Private Memory Size
3. The Shared Memory Size
So far, all these values seem self-explanatory... but they are not at
all, for one single reason:
How come 1 + 2 != 3???
What makes you think that they would be? The numbers are in no wise
directly related.
Perhaps the numbers aren't "self explaining" in the sense that you
think they are?
There are processes where 1 + 2 is much bigger than 3. How come?
Because they aren't related in that fashion.
Where is the memory lost?
What makes you think it is lost?
Which pages count towards 1 or 2, but are ignored for 3?
So now you admit that you don't actually know what those three numbers
mean. 8)
Allow me to shed, again, a little light. First - it is important to
understand that Darwin has many, many categorisations for memory, and
the numbers that are thrown into these three categories are in some
cases approximations. Understanding the way memory is used by the
system is a full-time occupation.
1. Resident Memory size: this counts (to a first approximation)
physical pages that are currently mapped into the task address space.
2. Private Memory size: this counts (to a first approximation) pages
that are private to the task (not shared with other tasks)
3. Shared Memory size: this counts (to a very coarse approximation due
to several factors) pages that are shared with one or more other tasks
in the system.
1 counts physical pages. I am assuming that with 2 and 3 you are
referring to the RSHRD and RPRVT fields, which are resident shared and
resident private pages.
You might more reasonably assume that 1 = 2 + 3, but you'd still be
bitten by the fact that things are not counted in quite the way you
expect.
Is this really so hard to answer?
Yes, because the above (and those three numbers) are crude
approximations. You might try introspecting a task's address space
with vmmap and back-tracking some of the objects to see how complex
things get.
I looked at the source of top, here's what I found out (some people
might find this useful):
A) The resident memory is obtained using the call task_info(). Since
it is nowhere documented what this call counts towards the memory of
the task in the field resident_size and what it does not, it's basically
magic and I can't investigate this value any further.
"resident memory" is an old OS term, and you will find it defined in
many OS textbooks. If you have reached this point, it's more or less
assumed that you will have read Tanenbaum or something more recent,
and the concept should be self-explanatory.
B) The other two values are obtained by getting every memory region
of a process using mach_vm_region() (same function vmmap uses) and
show some interesting behavior. Basically all private memory is
summed up as private memory. Real shared memory (e.g. mmap memory) is
summed up as shared memory. This is expected behavior.
Actually, this is an approximation; I think you'll find that memory
mapped MAP_ANON is accounted as private, MAP_PRIVATE is handled as COW
and MAP_SHARED is erroneously always accounted as shared even if
nobody else has that range of the object mapped. It's been some years
since I looked at the code, so YMMV.
Memory that is
in theory share-able (Copy on write) is treated differently depending
on whether it is actually shared. E.g. code of the process or
libraries used by the process is counted towards private memory if it
is only referenced once (it could be shared, but currently it isn't).
That means libraries shared with other processes are counted towards
shared memory. This is indeed very clever. If I start a second
instance of my process, my code regions are shared between both
processes, so the private memory decreases and the shared one rises,
which reflects what really goes on within the system.
It's kind of you to call shared libraries "clever", but see above
inre: OS textbooks. They've been around a long time. 8)
Same with some
library code. E.g. libraries that only my process uses are first
counted towards my private memory. With a second instance using them,
they are counted towards shared memory. You see, pretty clever and
desired behavior.
You are completely missing the shared segment, and the interesting and
evil things that relate to it here.
C) Only "touched" memory is accounted at all. This is also very
clever. Just because I load a library does not mean that all library
code is in memory. Only address space for it is reserved. Unless I ever
access any page of that library, the page is not really in memory.
Same for my code or malloc'ed memory. This is also desired behavior.
This is again an approximation. In the case of shared submaps or
shared pmap regions, pages may be resident for a task that has never
referenced them. Clustered pageins can also cause this behaviour.
D) Still top overestimates the real memory my process uses.
Ok, so now we come to a much more interesting mistake you're making.
Top is not telling you, except in the crudest sense, how much real
memory your process uses.
It gives you some ballpark numbers that are indicative of the memory
currently resident and mapped into your task.
This can be a very different number.
Understanding the real memory usage of your task is better handled
with different tools. VM accounting is a very expensive way to do
what you seem to be wanting to do; I would encourage you to tinker
with Shark and the X-ray instruments to see if you can't get a better
understanding of what your process is doing and what that costs.
E.g. I
use a library that another process uses, too. I only need one
function of the library. If I were the only one using it, only the
page containing this function would be in real memory; that means only
one page would be accounted towards my process (the rest would stay
virtual).
There would be several pages most likely referenced due to the way
linkage works. It's also likely that if the framework you referenced
was a system framework that it would already be mapped (due to being
in the shared segment) and quite probably referenced.
However, if this was the first time you had touched the library, the
library might want to run its static initialisers, which would make
other code pages resident and possibly create private data pages
containing the results of this initialisation (or it might COW parts
of the library's data segment).
Now the other process uses all of the library. Because of
this, all pages of the library are loaded to real memory. Since I use
the library, too, now all the loaded pages are accounted to my
process as shared memory, just as they are accounted to the other
process. In fact, only one of the pages is in memory because of my
process and in use by my process. The rest is only in memory because
of the other process. So more shared memory is accounted towards my
process than my process actually uses. But since it is accounted
towards shared memory, this is something I can live with (forcing the
kernel to remember because of which process which virtual page has
been mapped into physical memory would be pretty much overkill and
bloat all memory structures enormously).
It would be impossible to reasonably account for this activity without
abandoning shared text/data altogether.
What I'm missing
is a tool that works just like vmmap, but reports statistics about
every single *PAGE* of memory of a process space, not just every
larger region, and that takes into account whether a COW page has
actually been copied or not.
It's called gdb. Attach it to the kernel, and start walking your
task's address space. 8)
I do think, however, that you are starting down a very difficult and
ultimately unrewarding path. Some of the information you want is
simply not available, and other pieces are obscured because of the
disconnect between what your application does and how the system
actually handles that activity.
= Mike
_______________________________________________
Darwin-dev mailing list (email@hidden)