Re: TSO / LSO
- Subject: Re: TSO / LSO
- From: Terry Lambert <email@hidden>
- Date: Tue, 2 Jan 2007 14:22:50 -0800
On Jan 2, 2007, at 10:39 AM, Andrew Gallatin wrote:
Josh Graessley writes:
No. TCP offload support is something on the to-do list. It's just a
question of resources and priorities. As much of a mess as that will
create, there is potentially a huge upside. The question is how many
people will benefit from it, versus whether there are more pressing issues.
Full TCP offload (like Microsoft's Chimney) is a much bigger and
nastier can of worms than TSO (aka LSO), and could certainly be
described as a "mess to create."
Everything old is new again... I remember "full TCP offload" in
Ungermann-Bass cards in 286s for Xenix, and in DEQNA cards on MicroVAX
II hardware, back in the late 1980s... 8-).
However, TSO is not full TCP offload, it is just another popular
stateless offload. In TSO, the TCP stack sends a "jumbo" (up to 64KB)
IP packet down to the driver and NIC, and the NIC itself carves that
packet up into many smaller frames. The advantage of TSO is that you
reduce by up to a factor of 40 or so the number of trips from
tcp_output() through the IP, ethernet, and IOKit layers to the driver.
Additionally, the NIC can typically make better use of the PCI bus by
issuing larger DMA reads. It would probably take a full-time person a
week or less to port the infrastructure from one of the BSDs, and I'm
pretty sure the Intel 8254x ethernet chips on most new Apple hardware
support TSO, so you'd see an immediate benefit.
Here is a link to the FreeBSD commits, so as to give an idea of the
relatively small scope of the changes:
http://lists.freebsd.org/pipermail/cvs-src/2006-September/068483.html
http://lists.freebsd.org/pipermail/cvs-src/2006-September/068487.html
http://lists.freebsd.org/pipermail/cvs-src/2006-September/068524.html
(anybody doing a real port would have to look for updates, as of
course bugs were fixed in the next few days).
Actually, the size of the FreeBSD changes is a somewhat disingenuous
representation of the amount of actual change to Mac OS X necessary to
support TSO.
The centralized tcp_maxmtu() function in FreeBSD is not present in Mac
OS X. I believe it was added to FreeBSD to support forcing down the
MTU when using PPPoE/IGE or other encapsulation for cable modems,
etc., without specific path MTU discovery being enabled (e.g. because
some people misconfigure their networks to block all ICMP traffic).
Also, it would require changing struct ifnet to deal with
if_capabilities/if_capenable bits, which would change the size of the
structure. That could be a binary compatibility issue, even if the new
fields were added to the end, if any third party drivers had
statically declared instances (e.g. for initialization purposes,
etc.). It's possible, but not simple, to mitigate this by abstracting
the registration of ifnet devices one additional level, and adding a
KPI for updated drivers for which that isn't necessary (i.e. a partial
copy of structures that might be larger in the case of the old KPI,
and a full copy, for new instances, in the new KPI).
So Josh is correct in saying that implementing TSO will likely create
something of a mess.
Personally, I'd ball-park it at around 6 weeks, if we include all the
compatibility and interoperability testing (e.g. I know of at least one
case of TCP checksum offloading where the hardware used the RFC 1141
incremental checksum code, and blew the checksum calculation for -0,
sending out 0x0000 instead of 0xffff; it was pretty obvious in
Ethereal traces when you enabled hardware checksums on that hardware).
The payoff is pretty large. For the 10GbE NIC I do drivers for, we
went from sending at ~2Gb/s to ~8Gb/s when using a 1500 byte MTU on a
fairly wimpy 1.8GHz single CPU AMD64 when I implemented TSO support
in our FreeBSD driver.
No one is arguing that it isn't a win, particularly for specific
hardware; however, until relatively recently, the vast majority of
GigE chipsets have had very small amounts of buffer space compared to
what is necessary for a useful window on a high latency link, or even
for the 9K "jumbograms". This
is basically the whole trend of peripheral vendors attempting to
offload resource requirements onto the system to get their parts costs
down (frankly, they do this because it works, and cost sensitivity is
probably the major driver for them).
For TSO, the issue is bigger than the jumbogram issue: the chip has to
support fragmentation at a virtual MTU smaller than the MTU it
advertises to the OS; the OS has to support communicating path MTU
discovery information back to the card so that it knows where to
fragment; and the card needs deep transmit (and, if it's done right,
receive) queues, and so on.
So it's on a to-do list, and it's a question of resources and
priorities. Just as Josh said.
I think if someone outside wanted to do this work in the Darwin
sources, we would look seriously at the changes they made and how they
could be integrated/supported without damaging binary compatibility
with third party drivers. That would be at least one way for an
outside someone to contribute to pushing something over the resource/
priority hump, without having to compete with other things on the to-
do list for the attention of internal developers...
-- Terry
_______________________________________________
Darwin-kernel mailing list (email@hidden)