Re: kmem_alloc replacement in Tiger
- Subject: Re: kmem_alloc replacement in Tiger
- From: Andrew Gallatin <email@hidden>
- Date: Wed, 1 Feb 2006 18:58:46 -0500 (EST)
Terry Lambert writes:
> > Please tell me that there is some way to coax a full, 64-bit DMA
> > address out of IOKit on a DARTless system. OS-bypass doesn't make any
> > sense when the DMA involves bounce buffers. It looks like
> > getPhysicalSegment64() will only really return a 64-bit address on a
> > DARTful system (where it is useless for DMA), but perhaps I'm reading
> > the code wrong.
>
> The specific answer to this question depends on the answer to
> questions I can't speak on. This is part of where I potentially
> needed correction, so take the following with a grain of salt, until
> other people have weighed in on the matter, or you have contacted DTS
> and gotten an authoritative answer.
>
>
> The generic answer is as follows: If the platform you are on supports
> the full address range in the I/O bus, and the I/O bus memory
> controller is capable (the controller in the "T1" development systems
> that were leased is *not*, but neither is that system capable of
> supporting large amounts of physical memory), then there will be no
> problem.
>
> Therefore, practically, at the IOKit layer, in the absolute worst
> case, you could use the 32-bit value as an address token and handle
> it in your driver. But as of Tiger, this is in theory taken care of
> for you in the iovec/uio structures, where we use user_addr_t's, which
> are 64-bit values. How you handle their processing is up to you, and
> you can override their remainder processing, and effectively take
> control of everything yourself, and skip the intervening code path
> between the request and your way back up the device stack. Doing this
> would be tricky, so you should be careful.
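(For reference, the Tiger uio KPI Terry means hands back 64-bit
user_addr_t values through opaque accessors, regardless of whether the
caller is a 32- or 64-bit process. A minimal sketch, assuming the Tiger
kernel headers; illustrative only, not anyone's shipping code:

    #include <sys/uio.h>

    static void
    walk_uio(uio_t uio)
    {
            /* uio_curriovbase() returns a user_addr_t, which is 64 bits
             * wide even for 32-bit callers, so no address width is lost. */
            while (uio_resid(uio) > 0) {
                    user_addr_t base = uio_curriovbase(uio);
                    user_size_t len  = uio_curriovlen(uio);
                    /* ... hand [base, base+len) to the driver's own path ... */
                    uio_update(uio, len);   /* consume this iovec */
            }
    }
)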
You've confused me. What 32-bit value? I wasn't asking about how
to handle a 64-bit user virtual address. We already do this.
My concern was with avoiding bouncing a 64-bit physical address which
backs a 32-bit kernel virtual address (or a 32- or 64-bit user virtual
address). Right now, I'm thinking about a "32-bit" Intel system with
PAE and >4GB of RAM.
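To make the question concrete, here's the kind of probe I have in mind,
run against an already-prepare()d IOMemoryDescriptor. The helper name is
mine; whether getPhysicalSegment64() ever reports an address above 4GB
on such a box, rather than a bounced 32-bit one, is exactly the open
question:

    #include <IOKit/IOMemoryDescriptor.h>

    static bool
    anySegmentAbove4G(IOMemoryDescriptor *md)
    {
            IOByteCount offset = 0;
            IOByteCount total  = md->getLength();

            while (offset < total) {
                    IOByteCount segLen = 0;
                    addr64_t phys = md->getPhysicalSegment64(offset, &segLen);
                    if (!phys || !segLen)
                            break;   /* unprepared descriptor or bad offset */
                    if (phys + segLen > 0x100000000ULL)
                            return true;   /* a genuine 64-bit physical address */
                    offset += segLen;
            }
            return false;
    }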
> On a DART-ful system, it is my understanding that the issue will be one
> of performance, as the DART remaps system physical addresses into
> different locations in the I/O bus's idea of the physical address
> space, and dispersing the memory used for this over the physical
> memory address space will require additional remapping that may
> otherwise not be necessary (mapping is done in chunks, and contiguous
> chunks all in the window size would guarantee that mapping changes
> would not be necessary). Solaris on SPARC will suffer this same
> issue, since it has DART-like hardware for I/O memory window management.
Yes, the DART is a pain. Tiger is much better, and can do
our allocations about 25% faster than Panther.
>
> - Now for some opinion on design for this specific type of application:
>
> My first comment is that it's not a good idea to grab a chunk of
> kernel virtual address space for OS-bypass type uses. G5's, for a
<....>
This is for a small (a few MB per process) area used to copy
small messages to/from.
<....>
> So for both of these cases, you would generally be better off with
> nominally portable code like this by wiring down sections of the
> process virtual memory while it's being used as a DMA target, and
> communicating those addresses to your driver. This, rather than
> allocating a huge chunk of the available kernel virtual memory, and
> making it unavailable to the system. I believe this approach was used
> in a number of the "zero copy" TCP implementations in FreeBSD, when
> that code was first being attempted.
This is exactly what we do, on all platforms. Large messages are
pinned on demand. Small messages are copied to the copyblock
which is mapped into the user process from the kernel. This is
the buffer that I was talking about earlier.
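Concretely, the pin-on-demand path for a large message looks something
like the sketch below. This assumes the withAddressRange() factory from
the Tiger-era IOMemoryDescriptor headers; the function name and error
handling are illustrative, not our actual driver:

    #include <IOKit/IOMemoryDescriptor.h>

    static IOMemoryDescriptor *
    pinForDMA(mach_vm_address_t base, mach_vm_size_t len, task_t task)
    {
            IOMemoryDescriptor *md =
                IOMemoryDescriptor::withAddressRange(base, len,
                                                     kIODirectionInOut, task);
            if (!md)
                    return NULL;

            if (md->prepare() != kIOReturnSuccess) {   /* wires the pages */
                    md->release();
                    return NULL;
            }
            /* The caller walks the physical segments, runs the DMA, then
             * calls complete() and release() to unwire. */
            return md;
    }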
It is amusing that the ideal cutoff point between small/large messages
is directly dependent on the operating system's system call latency
and memory pinning latency. Mac OS X has by far the worst system call
overhead of any platform we've seen, sometimes by as much as a factor
of 8 (compared to ppc64 Linux). At least Tiger redeems itself by
being fairly adept with the DART, and winds up making up all the time
it lost doing the system call by doing the DART setup more quickly.
If you take the DART out of the picture, Linux blows Mac OS X
out of the water.
Drew