Thanks Josh.
So on my MacBook Pro with the Marvell Yukon NIC, I noticed that its
ifnet_offload capabilities report IFNET_CSUM_TCP, IFNET_CSUM_UDP, and
IFNET_VLAN_TAGGING, but not IFNET_CSUM_IP.
And correspondingly, when examining the mbuf in my iff_output_func, I
never notice MBUF_CSUM_REQ_IP set.
So, your recommendation boils down to the following on my MBP:
- Put ip_len field in host byte order.
- Call mbuf_outbound_finalize
- Put ip_len back in network byte order.
Which is what Adi had recommended I do in the code snippet I
linked to
[ http://lists.apple.com/archives/darwin-kernel/2006/Dec/
msg00072.html ]
Yet, I was seeing those kernel panics at boot time, in the 3rd step
(Put ip_len back), corresponding to those mDNS UDP offloaded packets.
I don't know why mbuf_outbound_finalize *only* on these packets was
leaving the mbuf trashed, but I have a feeling it has to do with the
mbuf pull-up code in in_delayed_cksum_offset, and the fact that these
mDNS mbufs are chained mbufs, with the pkthdr mbuf containing the L2
header, and the rest of the IP+UDP+payload packet in the next mbuf in
the chain.
So I suspect mbuf_outbound_finalize is probably broken for all
chained
mbufs where the pkthdr mbuf doesn't contain the IP header, but I
can't
point you to the line of code in the xnu/bsd source that is the
culprit, sorry!
Thanks
- Bhavesh
On 1/2/07, Josh Graessley <email@hidden> wrote:
Sorry, I saw that stuff earlier. I wanted to give a more thorough
explanation of what those flags mean and what the csum_data value
means.
Here's what I would try. I think this will work, but I may have
glossed over something.
If I remember correctly, the mbuf_outbound_finalize code is entirely
broken because it calls a function to calculate the ip header where
it is expecting the ip header to be in host byte order and it
calls a
function to calculate the TCP/UDP header where it expected the
ip_len
field to be in network byte order. The TCP/UDP is the trickier part
to generate.
- Check if MBUF_CSUM_REQ_IP is set
- Put ip_len field in host byte order.
- Call mbuf_outbound_finalize
- Put ip_len back in network byte order.
- If MBUF_CSUM_REQ_IP was set, re-calculate the ip header checksum.
I think that works around the problem. In addition, it will be safe
when the bug is fixed. The fix will handle ip_len in either byte
order and do the right thing. With this solution, after the fix, you
will simply be calculating the ip header checksum twice.
-josh
On Jan 2, 2007, at 9:36 AM, Bhavesh Davda wrote:
> Thanks for the explanation, Josh!
>
> I had figured as much reading the xnu-792.13.8 code.
>
> However, if you look at the subject of this e-mail thread, and
go back
> to my original post about this, I had reported a bug in
> mbuf_outbound_finalize and friends, that Adi acknowledged as a
known
> bug that will be fixed in a subsequent release of the kernel.
>
> Instead of reposting those e-mails, I'll just link to the
appropriate
> posts here:
>
> My original post:
> http://lists.apple.com/archives/darwin-kernel/2006/Dec/
msg00063.html
>
> Adi's response with a code snippet for a workaround:
> http://lists.apple.com/archives/darwin-kernel/2006/Dec/
msg00072.html
>
> Doing what Adi suggested worked for *most* cases, except that I
> started seeing kernel panics when my kext loads at boot time,
when Mac
> OS (probably Bonjour) sends an mDNS packet, which is UDP checksum
> offloaded, and my kext calls mbuf_outbound_finalize(). The panic
> happens in the call to mbuf_copyback when it tries to restore the
> tot_len field in the IP header in the mbuf in network-byte-order.
>
> When I looked at the mbuf after gdb attaching to the panicked
kernel,
> all the pointers looked trashed. Instead of further debugging
> mbuf_outbound_finalize/in_delayed_cksum_offset/in_cksum_skip, I
> decided that I had spent way too much time on this whole hw
checksum
> offloading mess, and decided that I would walk the chain of mbufs
> myself, computing my own checksums.
>
> Yes, I do so only when my kext has determined that an outbound
mbuf is
> *interesting*, and I clone the original mbuf for forwarding to my
> applications. So it isn't wasted effort.
>
> However, I would love for you to tell me that I don't need to
do so :)
> My poor-mans attempt at checksumming a block of data is
probably not
> going to perform as well as the tuned in_cksum_skip and assembly
> helpers :)
>
> Thanks
>
> - Bhavesh
>
>
>
> On 1/1/07, Josh Graessley <email@hidden> wrote:
>>
>> Oh no, the crummy hardware checksum stuff has stuck to me! Help!
>> Someone save me!
>>
>> Terry has most of it.
>>
>> The value of csum_data is a bit more complicated.
>>
>> Hardware checksums occur in two directions, inbound and
outbound. The
>> csum_data field has very different meanings depending on the
>> direction.
>>
>> Inbound:
>> The csum_flags will have bits set indicating which checksums were
>> performed in hardware. In the event that the hardware looked
at the
>> checksum for the ip header in hardware, the MBUF_CSUM_DID_IP flag
>> will be set. This flag indicates that the MBUF_CSUM_IP_GOOD
flag is
>> valid. When we get to ip_input, if MBUF_CSUM_DID_IP is set we
will
>> skip doing the checksum ourself. Instead, we will check
>> MBUF_CSUM_IP_GOOD. If MBUF_CSUM_IP_GOOD is not set, we discard
the
>> packet as having a bad ip header checksum.
>>
>> In addition, the MBUF_CSUM_DID_DATA flag may be set. This flag
>> indicates that the TCP or UDP checksum was calculated in
hardware.
>> The calculated value will appear in csum_data. When the packet
gets
>> in to tcp_input or udp_input, we will check this flag. If it
is set,
>> we will use the value in csum_data instead of calling in_cksum.
>>
>> Outbound:
>> The csum_flags field will have bits set indicating which
checksums
>> should be performed by the hardware. MBUF_CSUM_REQ_IP
indicates that
>> the ip header checksum has not been calculated yet and should be
>> calculated before the packet is sent. MBUF_CSUM_REQ_TCP and
>> MBUF_CSUM_REQ_UDP indicate that the hardware should calculate
the TCP
>> or UDP checksum before the packet is sent. When either of
these flags
>> are set, csum_data will contain a pre-computed psuedo header
checksum
>> and the length to the offset of the location where the TCP or UDP
>> checksum should be placed. There is also some goo in there for
some
>> CSUM_TCP_SUM16 thing. For that case, the csum_data value will
mean
>> something completely different. I can't recall the details but
there
>> is a lot of extra work done for a very special chip that had some
>> very special requirements. When I say special, I don't mean
that in a
>> good way. 'nuff said.
>>
>> What's the best way to deal with hardware checksums in a kext?
Get
>> rid of them as fast as you possibly can. How do you get rid of
them?
>> You calculate them in software the way the maker intended. The
kpi
>> provides two primary functions for doing this:
>>
>> mbuf_outbound_finalize
>> and
>> mbuf_inbound_modified
>>
>> Whenever you receive a packet on the outbound path (stack to
>> interface), call mbuf_outbound_finalize. Whenever you receive a
>> packet on the inbound path, call mbuf_inbound_modified. From
there,
>> you can go about your business as though the entire nightmare of
>> hardware checksums does not exist. For a little extra credit,
only
>> call mbuf_inbound_modified and mbuf_outbound_finalize on
packets that
>> you intend to modify, forward, encapsulate or in some other
way do
>> anything beyond inspecting.
>>
>> Do not try to modify the value of csum_data. The stack is
responsible
>> for setting that and the drivers are responsible for consuming
it.
>>
>> Why am I so opposed to hardware checksums? Two reasons. First,
>> checksums detect errors. If you introduce an error somewhere
in the
>> software just below the stack (say the driver) or even in the
>> hardware itself (dma engine), the checksum will catch the error
>> quickly, making it easier to narrow down the root cause. Second,
>> layer violations like this lead to problems for network kernel
>> extensions and results in little bits of code to handle this
peppered
>> throughout the stack. Offloading more of the TCP stack on to
hardware
>> leads to similar problems. Mmmm canned worms.
>>
>> Best of luck,
>> -josh
>>
>> On Jan 1, 2007, at 5:28 PM, Terry Lambert wrote:
>>
>> > Josh G. is the expert here, but I'm pretty sure that you
would end
>> > up shooting your foot off if you did this.
>> >
>> > When you set these flags, they are generally done in the
protocol
>> > specific output functions, and since TCP and UDP are both
layered
>> > on top of IP, you are requesting that the checksum be
calculated
>> > both as if the packet were a TCP and a UDP packet, and then
asking
>> > for the IP encapsulation checksum to also be calculated.
>> >
>> > If you had hardware checksum assist (common on most hardware
these
>> > days), then you may end up with both the TCP and UDP checksums
>> > calculated, and one of them stomping the other.
>> >
>> >
>> > FWIW, csum_data is a value, not a pointer, for the 16 bit
checksum
>> > that's trying to be calculated, and is used as an
accumulator for
>> > the checksum calculation. Practically, this means that for
your
>> > applications, you'd likely want to leave it alone, unless
you were
>> > setting flags requesting checksums. If you were doing
that, then
>> > you'd need to know if it was on input or output, and since
you are
>> > doing a full checksum, then you'd want to set it to 0xffff
or 0,
>> > respectively (see the source code for the protocol specific
input/
>> > output mechansims for details.
>> >
>> >
>> > Again, if you are doing an incremental update (i.e. you already
>> > have a valid packet in hand, and are trying to replace a few
data
>> > fields in it without hanging packet size), then the only
things you
>> > will be touching are the data, the layer 4 checksump (TCP or
UDP)
>> > and the layer 3 checksum (IP). You won't be setting these
flags to
>> > request anything, clearing them to avoid requesting
anything, and
>> > you won't be using the accumulator yourself.
>> >
>> >
>> > I think at this point, it would be educational to download
the xnu
>> > sources and do a "grep -r" for the field names and flags to
better
>> > understand how the code works.
>> >
>> > -- Terry
>> >
>> > On Dec 29, 2006, at 9:31 AM, Bhavesh Davda wrote:
>> >> Hi Terry,
>> >>
>> >> That's exactly what I ended up implementing. I haven't read
RFC
>> 1624,
>> >> but I will.
>> >>
>> >> Related question: if ever csum_flags has both CSUM_IP and
>> CSUM_TCP |
>> >> CSUM_UDP set, then what is csum_data going to point to?
>> >>
>> >> Thanks!
>> >>
>> >> - Bhavesh
>> >>
>> >> On 12/28/06, Terry Lambert <email@hidden> wrote:
>> >>> Are you changing the length of the packet, or only the
contents?
>> >>>
>> >>> If you are only changing the contents, then I suggest an
mbuf-
>> chain
>> >>> aware incremental checksum update as an alternative to
copying it
>> >>> to a
>> >>> contiguous buffer, operating on it, converting it to
mbufs, then
>> >>> doing
>> >>> a full software checksum of the whole packet.
>> >>>
>> >>> See RFC 1624.
>> >>>
>> >>> -- Terry
>> >>>
>> >>> On Dec 22, 2006, at 10:12 AM, Adi Masputra wrote:
>> >>> > On Dec 22, 2006, at 9:58 AM, Bhavesh Davda wrote:
>> >>> >>
>> >>> >> I compare the result of mbuf_pkthdr_len agains the
network-
>> >>> byte-order
>> >>> >> value of IP length on purpose, to know whether indeed the
>> >>> value of IP
>> >>> >> length in the packet is *not* in host-byte-order, as I
thought
>> >>> you
>> >>> >> said mbuf_outbound_finalize() expects.
>> >>> >
>> >>> > Since your module is an interface filter, the field(s)
in the
>> >>> > header(s)
>> >>> > will be in network byte-order. All you need to do is to
>> byte-swap
>> >>> > the length at all times assuming it is an IPv4 packet.
>> >>> >
>> >>> >>
>> >>> >> Oh boy. This is going to get quite expensive to do on a
per
>> >>> packet
>> >>> >> basis. I thought I could simply get a pointer into the
packet
>> >>> payload
>> >>> >> to the IP length field in the IP header, and do a
memory read/
>> >>> write,
>> >>> >> rather than going through so many function calls and
memcpy's,
>> >>> albeit
>> >>> >> small (2 byte) ones.
>> >>> >
>> >>> > Well, you are already requesting the stack to perform
software
>> >>> > checksum
>> >>> > calculation on the packet which includes the transport
>> payload; in
>> >>> > terms
>> >>> > of per-byte cost that is much more significant than
>> traversing the
>> >>> > mbuf
>> >>> > chain to safely obtain the 2-octets IP length in the
>> header. :)
>> >>> >
>> >>> >>
>> >>> >> Is there any other way to do it more optimally? I
>> understand your
>> >>> >> concern about non-linear mbufs, and not assuming that
the IP
>> >>> header
>> >>> >> is
>> >>> >> in the pkthdr mbuf. But realistically, is that ever
going to
>> >>> be the
>> >>> >> case?
>> >>> >>
>> >>> >
>> >>> > There's no guarantee that the headers will be in contiguous
>> span
>> >>> > although for the majority of cases they would. Sorry,
but as I
>> >>> said
>> >>> > the fix is already in order.
>> >>> >
>> >>> > Adi
>> >>> >
>> >>> >> Thanks!
>> >>> >>
>> >>> >> --
>> >>> >> Bhavesh P. Davda
>> >>> >
>> >>> > _______________________________________________
>> >>> > Do not post admin requests to the list. They will be
ignored.
>> >>> > Darwin-kernel mailing list (Darwin-
email@hidden)
>> >>> > Help/Unsubscribe/Update your Subscription:
tlambert%
>> >>> 40apple.com
>> >>> >
>> >>> > This email sent to email@hidden
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> Bhavesh P. Davda
>> >
>> > _______________________________________________
>> > Do not post admin requests to the list. They will be ignored.
>> > Darwin-kernel mailing list (email@hidden)
>> > Help/Unsubscribe/Update your Subscription:
jgraessley%
>> > 40apple.com
>> >
>> > This email sent to email@hidden
>>
>>
>>
>>
>
>
> --
> Bhavesh P. Davda
--
Bhavesh P. Davda
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
40apple.com
This email sent to email@hidden