Are memory pages being lost in i386_vm_init()?
Are memory pages being lost in i386_vm_init()?
- Subject: Are memory pages being lost in i386_vm_init()?
- From: "Yount, Chuck" <email@hidden>
- Date: Wed, 26 Mar 2008 10:47:50 -0700
- Thread-topic: Are memory pages being lost in i386_vm_init()?
Title: Are memory pages being lost in i386_vm_init()?
I was debugging some memory allocation anomalies and came across this code that looks inconsistent to me. See what you think...
In xnu-1228.3.13/osfmk/i386/i386_vm_init.c, i386_vm_init() is used to find memory regions and populate pmap_memory_regions[], which is an array of pmap_memory_region_t as defined in pmap.h:
typedef struct pmap_memory_regions {
ppnum_t base;
ppnum_t end;
ppnum_t alloc;
uint32_t type;
} pmap_memory_region_t;
For each memory region, the first page number is stored in "base", and the last page number is stored in "end". The array is built by traversing the list of EFI ranges:
for (i = 0; i < mcount; i++, mptr = (EfiMemoryRange *)(((vm_offset_t)mptr) + msize)) {
ppnum_t base, top;
...
base = (ppnum_t) (mptr->PhysicalStart >> I386_PGSHIFT);
top = (ppnum_t) ((mptr->PhysicalStart) >> I386_PGSHIFT) + mptr->NumberOfPages - 1;
...
Notice that "top", as you would expect, is clearly the index of the last page in a range (because of the "- 1"), not the page after the last page. (This will be important later.)
The usage of "top" and "end" as pointing to the last page is consistent in subsequent usage in this loop, e.g.,
pmptr->alloc = pmptr->base = base;
...
pmptr->end = top;
// etc.
In this way, the pmap_memory_regions array is built from the EFI ranges. (This is a bit simplified, since I didn't describe low-memory salvaging, memory-type consolidation, etc.) Finally, note that the "alloc" member indexes the first unallocated page in each range. For example, if a hypothetical region contains the 5 pages numbered 10...14, alloc == base == 10, and end == 14. So far, so good.
Strangely, though, in the same for loop, the number of available pages is calculated like this:
avail_remaining += (pmptr->end - pmptr->base);
In other words, end is treated as one *past* the last page. So, for the preceding example, end - base == 14 - 10 == 4. So, the avail_remaining counter is not counting one page per region. (This is starting to look like a classic off-by-one bug.)
Next, if you boot with the "maxmem" boot arg, the following code is run:
pages_to_use = avail_remaining;
while (cur_region < pmap_memory_region_count && pages_to_use) {
for (cur_alloc = pmap_memory_regions[cur_region].alloc;
cur_alloc < pmap_memory_regions[cur_region].end && pages_to_use;
cur_alloc++) {
if (cur_alloc > highest_pn)
highest_pn = cur_alloc;
pages_to_use--;
}
if (pages_to_use == 0)
pmap_memory_regions[cur_region].end = cur_alloc;
cur_region++;
}
Notice that, in this code, the "end" member of the struct is used as if it indexes one past the last page. In other words, one would expect the for-loop to read like this ("<= end" instead of "< end"):
for (cur_alloc = pmap_memory_regions[cur_region].alloc;
cur_alloc <= pmap_memory_regions[cur_region].end && pages_to_use;
cur_alloc++)...
If I'm reading this correctly, the result of the existing code is that the last page in each memory range will never be used. From the preceding example, only 4 pages will be used: namely 10...13 because the for-loop body does not execute when cur_alloc == end == 14.
Now, this wouldn't be much of a problem if it only happened when using "maxmem", but the normal VM startup code does the same thing. The following function is called to map the next available page from pmap_memory_regions[]:
boolean_t
pmap_next_page(
ppnum_t *pn)
{
if (avail_remaining)
while (pmap_memory_region_current < pmap_memory_region_count) {
if (pmap_memory_regions[pmap_memory_region_current].alloc ==
pmap_memory_regions[pmap_memory_region_current].end) {
pmap_memory_region_current++;
continue;
}
*pn = pmap_memory_regions[pmap_memory_region_current].alloc++;
avail_remaining--;
return TRUE;
}
return FALSE;
}
In this code, when alloc == end, it skips to the next region. Since alloc indexes the first unallocated page, the last page in each region is never used. Perhaps coincidentally, since the avail_remaining counter was undercounting by one page per region as demonstrated earlier, these two errors cancel out each other so that avail_remaining is decremented to zero when pmap_next_page() is called from vm_resident.c's pmap_startup() until it returns FALSE, so the missing pages are never noticed:
for (i = 0, pages_initialized = 0; i < npages; i++) {
if (!pmap_next_page(&phys_page))
break;
vm_page_init(&vm_pages[i], phys_page);
vm_page_pages++;
pages_initialized++;
}
Anyone agree or disagree w/this analysis? BTW, to be fair, this doesn't actually result in the loss of a significant amount of memory because most systems only have, say, a few dozen ranges, but these type of logic bugs are the ones that always cause trouble down the road. Also, if these pages are being dropped used on purpose to work around an issue with EFI or whatever, this should be better documented in the code, IMHO.
Thanks,
Chuck Y.
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden