Re: kmem_alloc replacement in Tiger
- Subject: Re: kmem_alloc replacement in Tiger
- From: Terry Lambert <email@hidden>
- Date: Wed, 1 Feb 2006 14:29:33 -0800
On Feb 1, 2006, at 6:40 AM, Andrew Gallatin wrote:
Terry Lambert writes:
Even if you *are* talking to a device driver, though, you should use
IOKit routines to do it. In the case of a G5 or other large physical
memory system, physical memory above 4G will need to be remapped,
either by moving the DART window in the memory controller or by
bouncing, if you didn't use an IOKit routine to allocate the memory in
the DART window or below 4G to avoid needing the bounce.
Here is my rationale for using kmem_alloc(). My driver is cross
platform (linux, windows, solaris, freebsd, macosx) and all
allocations are done from platform independent code, which calls a
platform dependent allocator. I'm using kmem_alloc() (via
IOMallocAligned()) inside my macosx allocator. The allocator has no
idea how the memory will be used.
Practically, all large allocations are subsequently wired for DMA and
mapped into user space via platform dependent code (IOKit's memory
descriptor routines on macosx). There are some allocations which are
much smaller, which do not need to be wired.
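
For concreteness, here is a minimal sketch of the kind of macosx
allocator hook being described; the plat_mem_alloc/plat_mem_free names
are illustrative, not the actual driver code:

    #include <IOKit/IOLib.h>

    /* Platform-dependent allocator hooks (illustrative names only).
     * The platform-independent code is assumed to hand the size back
     * to the free routine, since IOFreeAligned() requires it. */
    void *
    plat_mem_alloc(vm_size_t size, vm_size_t align)
    {
        /* IOMallocAligned() hands back wired kernel memory (via
         * kmem_alloc-style allocations underneath for large requests);
         * it is not guaranteed to be physically contiguous or to sit
         * below 4G. */
        return IOMallocAligned(size, align);
    }

    void
    plat_mem_free(void *addr, vm_size_t size)
    {
        if (addr != NULL)
            IOFreeAligned(addr, size);
    }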
Your comment about bouncing concerns me, especially because we wire
userspace memory for DMA, so changes to our driver's memory allocator
would not guarantee that all the memory we need to deal with comes
from the lower 4GB. The 32-bit nature of IOKit is frustrating, because our
device is DAC capable and our older products have been for as long as
I can remember.
Please tell me that there is some way to coax a full, 64-bit DMA
address out of IOKit on a DARTless system. OS-bypass doesn't make any
sense when the DMA involves bounce buffers. It looks like
getPhysicalSegment64() will only really return a 64-bit address on a
DARTful system (where it is useless for DMA), but perhaps I'm reading
the code wrong.
The specific answer to this question depends on the answer to
questions I can't speak on. This is part of where I potentially
needed correction, so take the following with a grain of salt, until
other people have weighed in on the matter, or you have contacted DTS
and gotten an authoritative answer.
The generic answer is as follows: If the platform you are on supports
the full address range in the I/O bus, and the I/O bus memory
controller is capable (the controller in the "T1" development systems
that were leased is *not*, but neither is that system capable of
supporting large amounts of physical memory), then there will be no
problem.
Therefore, practically, at the IOKit layer, in the absolute worst
case, you could use the 32 bit value as an address token, and handle
it in your driver. But as of Tiger, this is in theory taken care of
for you in the iovec/uio structures, where we use user_addr_t's, which
are 64 bit values. How you handle their processing is up to you: you
can override their remainder processing, effectively take control of
everything yourself, and skip the intervening code path between the
request and your way back up the device stack. Doing this would be
tricky, so you should be careful.
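
To make that concrete, here is a minimal sketch of pulling the 64 bit
user_addr_t out of a uio in a character device entry point using the
Tiger uio accessor KPI; mydrv_write and mydrv_setup_dma are
hypothetical names, and the hand-off into the DMA path is just a
placeholder:

    #include <sys/types.h>
    #include <sys/uio.h>

    /* Hypothetical hand-off into the driver's DMA setup path. */
    extern int mydrv_setup_dma(user_addr_t base, user_size_t len);

    /* Sketch of a cdevsw write entry point. */
    int
    mydrv_write(dev_t dev, struct uio *uio, int ioflag)
    {
        while (uio_resid(uio) > 0) {
            /* user_addr_t is 64 bits, even for a 64 bit process on a
             * 32 bit kernel. */
            user_addr_t base = uio_curriovbase(uio);
            user_size_t len  = uio_curriovlen(uio);

            if (len == 0)
                break;

            int err = mydrv_setup_dma(base, len);
            if (err != 0)
                return err;

            /* Mark this much of the uio as completed and advance to
             * the next iovec. */
            uio_update(uio, len);
        }
        return 0;
    }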
On a DART-ful system, it is my understanding that the issue will be
one of performance, as the DART remaps system physical addresses into
different locations in the I/O bus's idea of the physical address
space, and dispersing the memory used for this over the physical
memory address space will require additional remapping that may
otherwise not be necessary (mapping is done in chunks, and keeping the
contiguous chunks all within the window size would guarantee that
mapping changes would not be necessary). Solaris on SPARC will suffer
this same issue, since it has DART-like hardware for I/O memory window
management.
- Now for some opinion on design for this specific type of application:
My first comment is that it's not a good idea to grab a chunk of
kernel virtual address space for OS-bypass type uses. G5s, for a
specific example, have a 32 bit kernel virtual address space even
though they support a 64 bit (in 51 bit chunks at a time) process
virtual address space. What this means is that a huge amount of
physical memory can be addressed by a 64 bit process, and that mapping
all of it or a large enough chunk of it to be meaningful would mean
swamping the kernel virtual address space map (the pmap layer is 64
bit clean; the kernel map is still 32 bits on the G5).
Likewise, on FreeBSD, with its unified VM and buffer cache, on NetBSD,
with its UVM implementation, and on Windows, it's also likely better
to avoid the possibility of fragmenting the kernel virtual address
map, even if it's a full 64 bits. The main issue is that the number of
pmap entries needed to support that large an address space grows as
the address space becomes fragmented, even if the amount of available
physical memory is relatively small. I believe this will also be a
problem on Linux, unless something happened very recently to change
things there; it may or may not be a problem on Solaris, depending on
which allocation facilities you use, since it's my understanding that
Solaris 9+ has at least some ability to defragment the address map on
the fly for the two-level page reclaiming allocator (Dynix was capable
of the same thing).
So in both of these cases, with nominally portable code like this, you
are generally better off wiring down sections of the process virtual
memory while they are being used as DMA targets, and communicating
those addresses to your driver, rather than allocating a huge chunk of
the available kernel virtual memory and making it unavailable to the
rest of the system. I believe this approach was used in a number of
the "zero copy" TCP implementations in FreeBSD, when that code was
first being attempted.
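
As a rough sketch of that approach on the IOKit side (illustrative
names, minimal error handling; note that the legacy withAddress()
factory shown here takes a vm_address_t, so carrying a full 64 bit
user address would need one of the newer 64-bit-aware factories, if
your SDK provides them):

    #include <IOKit/IOMemoryDescriptor.h>
    #include <IOKit/IOReturn.h>

    /* Wire a range of the calling task's memory for DMA and walk its
     * physical segments.  The caller keeps the descriptor around and
     * calls complete() + release() when the DMA is finished. */
    IOReturn
    mydrv_wire_user_buffer(task_t userTask, vm_address_t userVA,
                           IOByteCount length,
                           IOMemoryDescriptor **outDesc)
    {
        IOMemoryDescriptor *md =
            IOMemoryDescriptor::withAddress(userVA, length,
                                            kIODirectionOutIn, userTask);
        if (md == NULL)
            return kIOReturnNoMemory;

        /* prepare() wires the pages down for the duration of the I/O. */
        IOReturn ret = md->prepare();
        if (ret != kIOReturnSuccess) {
            md->release();
            return ret;
        }

        /* Walk the segments; whether these addresses are directly
         * usable by the device depends on the DART discussion above
         * (getPhysicalSegment64() gives the CPU-physical view, while
         * getPhysicalSegment() gives what the I/O bus sees). */
        IOByteCount offset = 0;
        while (offset < length) {
            IOByteCount segLen  = 0;
            addr64_t    segPhys = md->getPhysicalSegment64(offset, &segLen);
            if (segPhys == 0 || segLen == 0)
                break;
            /* ... build the device's scatter/gather entry here ... */
            offset += segLen;
        }

        *outDesc = md;
        return kIOReturnSuccess;
    }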
If you go for the kernel virtual address space solution, the amount of
memory you have available for mapping will be limited to a total of
~2G, given shared segments, com pages, video drivers, etc., and it
will be physically discontiguous. The only way to get usefully large
chunks of memory would be to do what VT did in support of their
HyperTransport cards in their cluster, which was to grab all the
memory up front, and manage the physical allocations themselves. So
you are still better off mapping in user space, unless, as Jim pointed
out, there are specific performance issues that can't be solved going
that route.
-- Terry