Re: TSO / LSO
- Subject: Re: TSO / LSO
- From: Andrew Gallatin <email@hidden>
- Date: Thu, 4 Jan 2007 08:01:15 -0500 (EST)
Terry Lambert writes:
> On Jan 3, 2007, at 10:28 AM, Adi Masputra wrote:
> > On Jan 3, 2007, at 6:44 AM, Andrew Gallatin wrote:
> >
> >> There are 3 problems with the above:
> >>
> >> 1) The NIC does not advertise any sort of MTU. The stack just sends
> >> down the biggest frame it can (ip_len is 16 bits, so the max is 64K-1
> >> for IPv4..). The NIC is responsible for splitting this into as many
> >> packets as required.
>
>
> Adi covered most of this below, but the real concern I had here was
> the NIC advertising the largest buffer that it was willing to take at
> a time. To me, this is the effective MTU to the card; you said 64K,
> but from my reading on TSO, you might wish to limit this due to on
> card memory buffer constraints vs. the number of active streams you
> expect to deal with simultaneously (i.e. host quench based on
> knowledge of card transmit buffer size).
The limitation is not needed, but there is a precedent for it
in the NDIS stack. When we did our Windows driver, we
were a bit puzzled by its purpose :)
> Minimally, there would be a need for a global override for protecting
> legacy filters from TSO (i.e. a way to turn it off, if filters were
> present).
Again, I don't dispute any problems with filters. I know nothing
about the filter architecture, and defer to you guys.
> >> 3) There is no need to buffer up an entire window on the NIC. All
> >> you really need to buffer on the NIC is one frame's worth of data
> >> (so that you can properly send the frame on the wire).
>
> Well, you also need the header template in hand for each in progress
> connection for which the data hasn't already been sent (minimally),
> etc..
Sure. This adds a max of what, 128 bytes? I was afraid you'd pick
this out, and should have mentioned it.
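For concreteness, here's a rough sketch of that worst case; the struct
is made up for illustration, and assumes untagged Ethernet over IPv4
with a TCP timestamp option:

#include <stdint.h>

/* Illustrative only: the per-send header template the NIC has to hold
 * on to.  Note this is per in-flight send, not per-connection state.
 */
struct tso_header_template {
        uint8_t  eth[14];      /* Ethernet header                     */
        uint8_t  ip[20];       /* IPv4 header, no options             */
        uint8_t  tcp[20];      /* base TCP header                     */
        uint8_t  tcp_opt[12];  /* timestamp option + padding          */
};                             /* 66 bytes; ~128 comfortably covers
                                * IPv6 and fatter TCP options too     */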
> This assumes that either the transmit buffer data is contiguous (a bad
> assumption, unless we are packing packet headers at the end of a page,
> and only mapping page aligned data following the headers), or that the
> NIC supports scatter/gather DMA.
How many NICs made in the last decade have not supported S/G? :)
> If you consider that 100% bus-on for a NIC on a 64 bit wide PCI bus,
> you are talking about monopolizing the bus with data to keep a card
> properly fed. Breaking this up with scatter/gather would make things
> worse by increasing the transaction overhead, rather than amortizing
> it over a large buffer.
If you're going to saturate a 10GbE link, you're going to push
the I/O bus hard. Even on a MacPro, we only see in the neighborhood
of 11.25 Gb/s of PCIe DMA read bandwidth for our 8-lane NIC.
If you think S/G is bad, consider 64K (well, 64240 to make the numbers
round) sent on a link with a 1460-byte MSS. Without TSO, you typically
end up doing 44 54-byte DMAs and 44 1460-byte DMAs. With TSO, you end
up doing a single 54-byte DMA and 15 4KB DMAs (assuming 4KB mbufs),
plus a 2800-byte DMA at the end. So TSO has reduced the number of
DMAs to 17 vs. 88. That's amortizing.
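Spelled out as code (nothing driver-specific, just the arithmetic
above):

#include <stdio.h>

/* DMA counts for a 64240-byte send (44 segments at a 1460-byte MSS),
 * 54-byte headers, payload sitting in 4KB mbuf clusters.
 */
int main(void)
{
        const int payload = 64240, mss = 1460, mbuf = 4096;

        int no_tso = (payload / mss) * 2;             /* 44 hdr + 44 data = 88 */
        int tso    = 1 + (payload + mbuf - 1) / mbuf; /* 1 hdr + 16 data = 17  */

        printf("DMAs without TSO: %d, with TSO: %d\n", no_tso, tso);
        return 0;
}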
> Practically, I think the card has to have on-board buffers, not a
> single frame buffer, for this to be a happy thing; absolutely worst
Sure. For example, we have 32KB. This handles a 64KB offload nicely.
An MTU-sized buffer is just an example of what you could do if you
were cheap. In order to keep the pipeline fed, you need to have several
times the MTU. You don't need to buffer the entire "fake" jumbogram
before you carve it and send it, and you certainly don't need
an entire TCP window. This is a stateless offload: the NIC
does not even know what the TCP window is! It has no reason to
care about it.
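If it helps, here's roughly what the carve loop amounts to. Every name
below is made up for illustration and the per-frame header surgery is
elided, but the point stands: the only state involved is what the host
hands down with the send.

#include <stdint.h>
#include <string.h>

/* Hypothetical per-send descriptor.  This is everything the NIC sees:
 * a header template, the MSS, and where the payload lives in host
 * memory.  No window, no cwnd, no retransmit state.
 */
struct tso_send {
        uint8_t  hdr[128];     /* Ethernet/IP/TCP header template      */
        uint16_t hdr_len;
        uint16_t mss;
        uint32_t payload_len;  /* total bytes covered by the S/G list  */
        /* ... scatter/gather entries for the payload ... */
};

/* Placeholder for "DMA this chunk behind this header and put the frame
 * on the wire" -- a real NIC does this in hardware/firmware.
 */
extern void emit_frame(const uint8_t *hdr, uint16_t hdr_len,
                       uint32_t payload_off, uint16_t payload_len);

static void tso_carve(struct tso_send *s)
{
        uint8_t  hdr[128];
        uint32_t off = 0;

        while (off < s->payload_len) {
                uint16_t chunk = (s->payload_len - off > s->mss) ?
                    s->mss : (uint16_t)(s->payload_len - off);

                /* Per-frame fixups on a copy of the template: bump the
                 * TCP sequence number by 'off', bump the IP ID, set
                 * ip_len and checksums for 'chunk', keep FIN/PSH only
                 * on the last frame.  (Field surgery elided.)
                 */
                memcpy(hdr, s->hdr, s->hdr_len);

                /* One frame of on-chip buffering is enough: fetch just
                 * this chunk from host memory and send it.
                 */
                emit_frame(hdr, s->hdr_len, off, chunk);
                off += chunk;
        }
}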
> case, you have a .1% lossy link, and you are going back and redoing
> the DMA to do a retransmit otherwise, and having to step through and
> recreate the packets to the transmit packet (consider additional
> fragging on intermediate routes and RED queueing or QOS-based drops in
> intermediate routers may adjust the frame base on byte boundaries,
> rather than nice clean packet boundaries - ACK's in the middle of what
> you thought as a packet, etc.).
>
> I expect that for TSO, you'd just eat the retransmit overhead, but you
> might want to have the stack know that it was happening, and cut down
> on the replicated data that was getting sent (and the amount of bus
> traffic, as a result).
Again, the NIC has no knowledge of any drops or packet loss. As we
discussed before, this is a *stateless* offload. All TCP state is
maintained by the host TCP/IP stack. If there are any drops, the host
will notice them from the ACKs, SACKs, or lack thereof.
> In my experience, there's usually a much larger distance between
> initial implementation and what you can commercially deploy than you
> might think.
Our commercially deployed 10GbE NIC with TSO (and LRO) has lots of
satisfied customers using OSes like Linux and Windows which support
TSO.
BTW, I need to do LRO for Mac OS X. The nice thing about LRO is that
you can do it in the NIC and driver on nearly any OS that doesn't have
stupid restrictions on buffer chaining (like NDIS's restrictions).
Unfortunately, the Mac OS X receive performance is already OK; it's the
send performance that stinks.
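For the curious, driver-level LRO boils down to something like the
sketch below. The names and fields are made up, and a real
implementation also has to watch TCP options, aggregate size limits,
and flush on anything unusual:

#include <stdint.h>

struct mbuf;                            /* opaque; chained payload     */

/* Hypothetical view of a received TCP segment after header parsing. */
struct rx_seg {
        uint32_t saddr, daddr;
        uint16_t sport, dport;
        uint32_t seq;
        uint16_t len;                   /* payload bytes               */
        uint8_t  flags;                 /* TCP flags                   */
        struct mbuf *payload;
};

/* Hypothetical flow being aggregated. */
struct lro_flow {
        uint32_t saddr, daddr;
        uint16_t sport, dport;
        uint32_t next_seq;              /* next expected sequence      */
        struct mbuf *head;              /* coalesced payload so far    */
};

/* Placeholder: append one mbuf chain to another (buffer chaining). */
extern void mbuf_chain_append(struct mbuf *head, struct mbuf *tail);

#define TH_ACK 0x10

/* Append the segment to the aggregate if it's the clean, in-order next
 * piece of the same flow; otherwise the caller flushes the aggregate
 * up the stack and passes this segment through untouched.
 */
static int lro_try_append(struct lro_flow *f, struct rx_seg *p)
{
        if (p->saddr != f->saddr || p->daddr != f->daddr ||
            p->sport != f->sport || p->dport != f->dport)
                return 0;
        if (p->seq != f->next_seq || p->flags != TH_ACK || p->len == 0)
                return 0;

        mbuf_chain_append(f->head, p->payload);
        f->next_seq += p->len;
        return 1;
}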
Drew