Re: Resident Memory, Private Memory, Shared Memory (top, Activity Monitor and vmmap)
site_archiver@lists.apple.com
Delivered-To: darwin-dev@lists.apple.com

On Nov 30, 2007, at 12:04 PM, darwin-dev-request@lists.apple.com wrote:

>>> I have only searched Apple's mailing list archive and I have not found
>>> anything; Apple has too many #ifdef __APPLE__ in the kernel to say for
>>> sure that xnu behaves like any other BSD kernel.
>>
>> This is, technically speaking, not true; it is documented by the (freely
>> available) source code and by the activity of your system.
>
> Well, source code is no real documentation. The difference is that it
> took me about 1-2 hours to figure out how top calculates private and
> shared memory, something that documentation could have told me in two
> sentences in three minutes.
>
> 1. The Resident Memory Size (aka Real Memory)
> 2. The Private Memory Size
> 3. The Shared Memory Size
>
> Because all memory a process "has" (all pages in a process's address
> space) is either shared or private, the real memory a process uses
> should be the sum of both. That private and shared are calculated in
> such an unfavourable fashion is another matter.
>
>> "Resident memory" is an old OS term, and you will find it defined in
>> many OS textbooks. If you have reached this point, it's more or less
>> assumed that you will have read Tanenbaum or something more recent, and
>> the concept should be self-explanatory.
>
> Resident, according to top's own explanation, is: "This column reflects
> the amount of physical memory currently allocated to each process. This
> is also known as the 'resident set size' or RSS. A process can have a
> large amount of virtual memory allocated (as indicated by the SIZE
> column) but still be using very little physical memory."
>
> That already makes a bit less sense, but it's no issue, since COW memory
> with only one reference is still accounted towards private memory by top
> (and private memory will only have one reference). And MAP_SHARED is
> erroneously always accounted as shared, even if nobody else has that
> range of the object mapped.
> OTOH, the drawing buffers of Cocoa apps are in fact really shared
> (between the application and the window server; that's what I saw in
> some technote), and still saying "they are shared" seems awkward. It's
> technically correct, but these buffers only exist because of the
> application, and without it they will also vanish from the window
> server, so they could just as well be accounted towards that
> application only.
>
>> Top is not telling you, except in the crudest sense, how much real
>> memory your process uses.
>
> Then what is it telling me? If the values are so useless for everything,
> then why display them in the first place?
>
> Note that Activity Monitor (which displays the same values as top; I
> have no source to back this up, I can only compare them) is not a
> developer tool. It is used by normal, simple-minded users who have no
> idea what "virtual memory" actually means, and this utility displays
> real, private and shared memory to them. What do you expect a
> simple-minded user to think when seeing these values? Based on what I'd
> call reasonable experience, I'd say "not very much". Simple-minded
> users use these values to determine the memory consumption of a
> process. Now, if these values are so far from reality, they are useless
> and should not even be displayed to simple-minded users. Instead, this
> all should be replaced with a single value that really can give the
> user a rough estimate of how much memory the process needs.
>
>> Understanding the real memory usage of your task is better handled
>> with different tools.
>
> Now it gets interesting. That would be which tools?
>
>> ... Instruments, to see if you can't get a better understanding of
>> what your process is doing and what that costs.

[Edited "X-Ray" above to its shipping name, "Instruments".]

> These tools basically tell me how much memory I have "wasted" using
> malloc. Even in the best case, they might take my stack into account.
> But what they won't take into account is the memory lost by caching
> code pages, the memory lost by loaded libraries and their code pages,
> and so on. And only in the rarest cases will they distinguish private
> and shared memory at all. So if I know my process uses malloc to
> allocate about 800 kB of memory during runtime, this could not be any
> further away from the real amount of memory the system loses by running
> my process. This is an approximation of memory usage that is far, far
> away from reality, and a much worse approximation than all the values
> mentioned before.
>
> The number I'm looking for is: how much memory does the system need to
> keep my process running, assuming it was not allowed to swap any of it
> out (everything needs to be in memory)? Here it should take into
> account all memory that belongs to my process directly or indirectly,
> which includes malloc, stack, code, static memory, code of libraries
> and static memory of libraries, and code needed by dyld, and it should
> clearly distinguish between shared and unshared memory.

It is a very difficult number to compute, for a variety of reasons (more
on that below).

= Mike

>>> I have only searched Apple's mailing list archive and I have not found
>>> anything; Apple has too many #ifdef __APPLE__ in the kernel to say for
>>> sure that xnu behaves like any other BSD kernel.

You're making an error here in assuming that Darwin is a "BSD kernel" in
the first place. The MacOS VM, which is what you're currently being
confused by, is a distant descendant of the Mach 2.5 VM, which is where
the free BSDs started as well, but it has evolved in very different
directions as the operating system itself has had very different
motivating factors. Other than in the most general sense, it is not very
useful to compare them.

> It's also nowhere documented.

The code is (by definition) a correct and complete description of what
the system does.
It is the ultimate and most expressive documentation, although it's not
so good at describing intent.

> So far, all these values seem self-explanatory... but they are not at
> all, for one single reason: how come 1 + 2 != 3???

What makes you think that they would be? This depends on your definition
of "shared" and "private"; you are here defining them as complete and
exclusive, but that is presumptive and in fact not correct. (Note also
that you asked "why don't resident (1) plus private (2) equal shared
(3)", not "why don't private (2) plus shared (3) equal resident (1)".)

Top is a multi-platform utility with a long history; this definition
predates modern VM systems and is necessarily simplistic. However, it's
not fundamentally in conflict with reality, and had you read this before
making your earlier comments you'd have been less confused.

> MAP_PRIVATE is handled as COW.

If you've checked the code I'll take it at face value, but I'm not
certain that a COW mapping against an object is actually cross-referenced
against other COW mappings (due to the interplay with copy objects and
the extreme cost involved) to determine whether the range in question is
practically shared or not. In a pedantic interpretation, COW pages are
only "shared" if someone else has a mapping for that page; otherwise they
are merely "shareable".

> That is a bit stupid, since it does not reflect reality.

As I note above, the alternative would require an exhaustive search of
the arbitrarily large set of possibly overlapping mappings. It would help
this conversation if you would stop saying "stupid" when you really mean
"I do not understand".

Whilst this is basically an erroneous digression, it's a good place to
hang a point. You're making a fundamental mistake in thinking that any of
this accounting is being done for your (the application developers')
sake.
The VM keeps accounting information for its own purposes, and the
statistics that are maintained and tracked bear, for the most part,
directly on how the implementation works and what it needs to know in
order to provide the services it does. As a consequence, the VM is
entirely ignorant (and rightly so) of the application- or framework-level
semantics applicable to the interaction between, say, a client and the
window server. That interaction uses Mach shared memory semantics
(amongst others), and gets the same behaviour that anyone else using
those interfaces gets. It matters not that you-the-developer think that
the window server "doesn't count"; from the perspective of the VM and the
system as a whole, for whom that information is maintained, it does, and
thus the numbers reflect that.

As I've previously noted, if you want to look at things from an
application-centric perspective, you should be using tools that are
designed for that purpose.

> If the values are so useless for everything, then why display them in
> the first place?

That's a good question, but well outside the scope of this conversation,
as it strays into marketing territory and the somewhat conservative
attitudes that many old-school Unix administrators have towards tools.

> The number I'm looking for is: how much memory does the system need to
> keep my process running ...

That number doesn't exist. Computing the working set size of a given
collection of threads and their dependencies is more expensive than the
system can afford at runtime, when for the most part such a gross number
is not useful or interesting as such. I would encourage you to tinker
with Shark and Instruments instead.

You are trying to force your perception of the way the system uses memory
onto it and the tools. You might benefit from stepping back and taking in
the way the tools talk about the system, as they are the product of many
years of experience tuning applications, and as such they tend to reflect
the things that have previously been worth looking at.
Knowing how a given system reacts to what your application is doing at a
given time can be entertaining, but it's much more interesting to know
how systems in general will react when your application is running, and
this abstract view is more generally useful.

If I assume that by "loses" you mean "uses", then the number varies
wildly based on what your application and the system are both doing at
the time. What you are asking for is what is generally referred to as the
working set size for your application.

- There are adaptive algorithms in play, with both time- and
  space-related parameters, that will cause your working set to expand or
  contract based on a number of factors, not excluding CPU speed, other
  thread activity in the system, actual physical memory, disk speed, etc.

- There are other tasks in the system (server processes) that need to
  make timely forward progress in order for your application to do
  likewise; their working sets, for both your application's work and
  other applications' work that may gate yours (and thus, partially or
  entirely, other applications' working sets as well), need to be
  considered.

- The working set for an application can vary wildly based on the
  configuration of the system it's running on, the user-set preferences,
  the document or data being worked on, and so forth.

- The number varies wildly with time; some applications' working set size
  relates fairly directly to what the application is doing at the time,
  or has periodic aspects that relate to a long-running task, but for
  some applications it is grossly impacted by external or indirect
  factors that don't correlate to the application's activity at all.

Because this number is very difficult to derive, and because deriving it
at most tells you about limit conditions, it is generally better, as a
developer, to focus your attention on the factors that affect the number
and which are a consequence of your application's behaviour instead.
These are generally easier to identify, and since they're something you
can do something about, a better place to start anyway.
participants (1)
-
Michael Smith