Re: kmem_alloc replacement in Tiger
- Subject: Re: kmem_alloc replacement in Tiger
- From: Terry Lambert <email@hidden>
- Date: Wed, 1 Feb 2006 14:29:33 -0800
On Feb 1, 2006, at 6:40 AM, Andrew Gallatin wrote:
Terry Lambert writes:
Even if you *are* talking to a device driver, though, you should use
IOKit routines to do it. In the case of a G5 or other large physical
memory system, physical memory above 4G will need to be remapped,
either by moving the DART window in the memory controller or by
bouncing, if you didn't use an IOKit routine to allocate the memory in
the DART window or below 4G to avoid needing the bounce.
Here is my rationale for using kmem_alloc(). My driver is cross
platform (linux, windows, solaris, freebsd, macosx) and all
allocations are done from platform independent code, which calls a
platform dependent allocator. I'm using kmem_alloc() (via
IOMallocAligned()) inside my macosx allocator. The allocator has no
idea how the memory will be used.
Practically, all large allocations are subsequently wired for DMA and
mapped into user space via platform dependent code (IOKit's memory
descriptor routines on macosx). There are some allocations which are
much smaller, which do not need to be wired.
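
For concreteness, here is a minimal sketch of the kind of macosx
allocator hook being described; the plat_mem_alloc/plat_mem_free names
are illustrative, not the actual driver code:

    #include <IOKit/IOLib.h>

    /* Platform-dependent allocator hooks (illustrative names only).
     * The platform-independent code is assumed to hand the size back
     * to the free routine, since IOFreeAligned() requires it. */
    void *
    plat_mem_alloc(vm_size_t size, vm_size_t align)
    {
        /* IOMallocAligned() hands back wired kernel memory (via
         * kmem_alloc-style allocations underneath for large requests);
         * it is not guaranteed to be physically contiguous or to sit
         * below 4G. */
        return IOMallocAligned(size, align);
    }

    void
    plat_mem_free(void *addr, vm_size_t size)
    {
        if (addr != NULL)
            IOFreeAligned(addr, size);
    }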
Your comment about bouncing concerns me, especially because we wire
userspace memory for DMA, so changes to our driver's memory allocator
would not guarantee that all the memory we need to deal with comes
from the lower 4GB. The 32-bit nature of IOKit is frustrating, because our
device is DAC capable and our older products have been for as long as
I can remember.
Please tell me that there is some way to coax a full, 64-bit DMA
address out of IOKit on a DARTless system. OS-bypass doesn't make any
sense when the DMA involves bounce buffers. It looks like
getPhysicalSegment64() will only really return a 64-bit address on a
DARTful system (where it is useless for DMA), but perhaps I'm reading
the code wrong.
The specific answer to this question depends on the answer to
questions I can't speak on. This is part of where I potentially
needed correction, so take the following with a grain of salt, until
other people have weighed in on the matter, or you have contacted DTS
and gotten an authoritative answer.
The generic answer is as follows: If the platform you are on supports
the full address range in the I/O bus, and the I/O bus memory
controller is capable (the controller in the "T1" development systems
that were leased is *not*, but neither is that system capable of
supporting large amounts of physical memory), then there will be no
problem.
Therefore, practically, at the IOKit layer, in the absolute worst
case, you could use the 32 bit value as an address token, and handle
it in your driver. But as of Tiger, this is in theory taken care of
for you in the iovec/uio structures, where we use user_addr_t's, which
are 64 bit values. How you handle their processing is up to you: you
can override their remainder processing, effectively take control of
everything yourself, and skip the intervening code path between the
request and your way back up the device stack. Doing this would be
tricky, so you should be careful.
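
To make that concrete, here is a minimal sketch of pulling the 64 bit
user_addr_t out of a uio in a character device entry point using the
Tiger uio accessor KPI; mydrv_write and mydrv_setup_dma are
hypothetical names, and the hand-off into the DMA path is just a
placeholder:

    #include <sys/types.h>
    #include <sys/uio.h>

    /* Hypothetical hand-off into the driver's DMA setup path. */
    extern int mydrv_setup_dma(user_addr_t base, user_size_t len);

    /* Sketch of a cdevsw write entry point. */
    int
    mydrv_write(dev_t dev, struct uio *uio, int ioflag)
    {
        while (uio_resid(uio) > 0) {
            /* user_addr_t is 64 bits, even for a 64 bit process on a
             * 32 bit kernel. */
            user_addr_t base = uio_curriovbase(uio);
            user_size_t len  = uio_curriovlen(uio);

            if (len == 0)
                break;

            int err = mydrv_setup_dma(base, len);
            if (err != 0)
                return err;

            /* Mark this much of the uio as completed and advance to
             * the next iovec. */
            uio_update(uio, len);
        }
        return 0;
    }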
On a DART-ful system, it is my understanding that the issue will be
one of performance, as the DART remaps system physical addresses into
different locations in the I/O bus's idea of the physical address
space, and dispersing the memory used for this over the physical
memory address space will require additional remapping that may
otherwise not be necessary (mapping is done in chunks, and keeping the
contiguous chunks all within the window size would guarantee that
mapping changes would not be necessary). Solaris on SPARC will suffer
this same issue, since it has DART-like hardware for I/O memory window
management.
- Now for some opinion on design for this specific type of application:
My first comment is that it's not a good idea to grab a chunk of
kernel virtual address space for OS-bypass type uses. G5s, for a
specific example, have a 32 bit kernel virtual address space even
though they support a 64 bit (in 51 bit chunks at a time) process
virtual address space. What this means is that a huge amount of
physical memory can be addressed by a 64 bit process, and that mapping
all of it or a large enough chunk of it to be meaningful would mean
swamping the kernel virtual address space map (the pmap layer is 64
bit clean; the kernel map is still 32 bits on the G5).
Likewise, on FreeBSD, with its unified VM and buffer cache, on NetBSD,
with its UVM implementation, and on Windows, it's also likely better
to avoid the possibility of fragmenting the kernel virtual address
map, even if it's a full 64 bits. The main issue is that the number of
pmap entries needed to support that large an address space grows as
the address space becomes fragmented, even if the amount of available
physical memory is relatively small. I believe this will also be a
problem on Linux, unless something happened very recently to change
things there; it may or may not be a problem on Solaris, depending on
which allocation facilities you use, since it's my understanding that
Solaris 9+ has at least some ability to defragment the address map on
the fly for the two-level page reclaiming allocator (Dynix was capable
of the same thing).
So in both of these cases, with nominally portable code like this, you
are generally better off wiring down sections of the process virtual
memory while they are being used as DMA targets, and communicating
those addresses to your driver, rather than allocating a huge chunk of
the available kernel virtual memory and making it unavailable to the
rest of the system. I believe this approach was used in a number of
the "zero copy" TCP implementations in FreeBSD, when that code was
first being attempted.
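
As a rough sketch of that approach on the IOKit side (illustrative
names, minimal error handling; note that the legacy withAddress()
factory shown here takes a vm_address_t, so carrying a full 64 bit
user address would need one of the newer 64-bit-aware factories, if
your SDK provides them):

    #include <IOKit/IOMemoryDescriptor.h>
    #include <IOKit/IOReturn.h>

    /* Wire a range of the calling task's memory for DMA and walk its
     * physical segments.  The caller keeps the descriptor around and
     * calls complete() + release() when the DMA is finished. */
    IOReturn
    mydrv_wire_user_buffer(task_t userTask, vm_address_t userVA,
                           IOByteCount length,
                           IOMemoryDescriptor **outDesc)
    {
        IOMemoryDescriptor *md =
            IOMemoryDescriptor::withAddress(userVA, length,
                                            kIODirectionOutIn, userTask);
        if (md == NULL)
            return kIOReturnNoMemory;

        /* prepare() wires the pages down for the duration of the I/O. */
        IOReturn ret = md->prepare();
        if (ret != kIOReturnSuccess) {
            md->release();
            return ret;
        }

        /* Walk the segments; whether these addresses are directly
         * usable by the device depends on the DART discussion above
         * (getPhysicalSegment64() gives the CPU-physical view, while
         * getPhysicalSegment() gives what the I/O bus sees). */
        IOByteCount offset = 0;
        while (offset < length) {
            IOByteCount segLen  = 0;
            addr64_t    segPhys = md->getPhysicalSegment64(offset, &segLen);
            if (segPhys == 0 || segLen == 0)
                break;
            /* ... build the device's scatter/gather entry here ... */
            offset += segLen;
        }

        *outDesc = md;
        return kIOReturnSuccess;
    }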
If you go for the kernel virtual address space solution, the amount of
memory you have available for mapping will be limited to a total of
~2G, given shared segments, com pages, video drivers, etc., and it
will be physically discontiguous. The only way to get usefully large
chunks of memory would be to do what VT did in support of their
HyperTransport cards in their cluster, which was to grab all the
memory up front, and manage the physical allocations themselves. So
you are still better off mapping in user space, unless, as Jim pointed
out, there are specific performance issues that can't be solved going
that route.
-- Terry