Tiger NKE Development Notes
- Subject: Tiger NKE Development Notes
- From: "Peter Sichel" <email@hidden>
- Date: Mon, 9 May 2005 22:07:09 -0400
Introduction
------------
Mac OS X 10.4 Tiger introduces some new Kernel Programming Interfaces.
In addition to hiding internal data structures, Apple made a significant
change in the way packets can be injected into the network stack.
Instead of re-inserting packets from the current NKE position in the
network stack, packets are re-inserted from the beginning of the NKE
chain or slightly before to allow re-routing or demultiplexing. As it
turns out, this has subtle implications for the design of practically
every Network Kernel Extension. NKEs must also define their own locking
model. I have prepared these notes for two reasons:
(1) To maximize NKE compatibility by helping other developers not repeat
the same mistakes; [In other words, I don't want your code to break my
NKE or vice versa.]
(2) To clarify what Apple did and discuss possible future directions.
I am the developer of three Mac OS X products that rely on Network
Kernel Extensions (NKEs) to do their work, including IPNetRouterX, which
makes heavy use of network plumbing at the interface filter level.
mbuf Tags and Seeing the Same Packet Twice
------------------------------------------
In Tiger, every NKE must be prepared to see the same packet more than
once. The reason is that any other NKE that swallows and later re-
injects a packet will cause the entire NKE chain to re-execute. The
recommended solution for handling this situation is to use an mbuf tag
to mark packets that have been previously processed so they can be
handled properly the second time around. I have written some basic
routines which demonstrate this below.
#include <sys/kpi_mbuf.h>

// tag values
#define TAG_IN 1
#define TAG_OUT 2

// Module tag id and tag type. gidtag is assumed to be obtained once at
// start-up, e.g. mbuf_tag_id_find("com.example.project", &gidtag);
// the bundle string and the MY_TAG_TYPE value are placeholders.
static mbuf_tag_id_t gidtag;
#define MY_TAG_TYPE 1

// ---------------------------------------------------------------------------
// PROJECT_mtag
// ---------------------------------------------------------------------------
// Tag this packet with "tag_value" so we'll know if we have seen it.
// tag_value will normally be either TAG_IN or TAG_OUT.
//
// Notice that if an mbuf is re-used to send a response, such as an echo
// request (ping) that is converted to an echo reply, or is simply IP
// forwarded back out another interface, you may want to recognize the
// redirected mbuf as a new packet. Thus we distinguish TAG_IN from TAG_OUT
// and intentionally replace the previous tag value rather than combining.
// If we redirect a packet, we need to retag it accordingly. The code below
// checks for an existing tag and resets the value.
//
// Finally, an mbuf tag is effectively another mbuf that travels with the
// packet. Allocating an mbuf tag can fail, in which case we return the
// corresponding error code. Notice it is the caller's responsibility to
// free the original packet to avoid looping, or a recursive lock attempt
// (kernel panic), if a packet without the correct tag is re-injected and
// processed again.
//
errno_t PROJECT_mtag(mbuf_t mbuf_ref, int tag_value)
{
    errno_t status;
    int *tag_ref;
    size_t len;

    // look for existing tag
    status = mbuf_tag_find(mbuf_ref, gidtag, MY_TAG_TYPE, &len, (void **)&tag_ref);
    // allocate tag if needed
    if (status != 0) {
        status = mbuf_tag_allocate(mbuf_ref, gidtag, MY_TAG_TYPE, sizeof(*tag_ref),
                                   MBUF_DONTWAIT, (void **)&tag_ref);
        if (status == 0) *tag_ref = 0;
    }
    if (status == 0) {
        // replace, rather than combine with, a previous TAG_IN or TAG_OUT
        if (tag_value < 3) *tag_ref &= ~3;
        *tag_ref |= tag_value;
    }
    return status;
}
// ---------------------------------------------------------------------------
// PROJECT_is_mtag
// ---------------------------------------------------------------------------
// Return 1 if packet is tagged with "tag_value", otherwise 0
int PROJECT_is_mtag(mbuf_t mbuf_ref, int tag_value)
{
    int returnValue = 0;
    errno_t status;
    int *tag_ref;
    size_t len;

    // Check whether we have seen this packet before.
    status = mbuf_tag_find(mbuf_ref, gidtag, MY_TAG_TYPE, &len, (void **)&tag_ref);
    if ((status == 0) && (*tag_ref & tag_value)) returnValue = 1;
    return returnValue;
}
With these support functions in place, I then use the following code at
the start of my NKE's packet input function:
// avoid duplicates
if ( PROJECT_is_mtag(*data, TAG_IN) ) return 0;
returnValue = PROJECT_mtag(*data, TAG_IN);
if (returnValue != 0) return returnValue; // if allocation failed, get out
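To make the "see the same packet twice" problem concrete, here is a minimal userspace model, not kernel code. A struct with an int stands in for the mbuf and its tag, and the recursive call in run_chain stands in for another NKE swallowing the packet and re-injecting it at the head of the chain. All names are hypothetical.

```c
#include <assert.h>

#define TAG_IN  1
#define TAG_OUT 2

/* Hypothetical model: has_tag mirrors whether mbuf_tag_find would succeed,
   tag_bits mirrors the int stored in the tag. */
struct model_pkt {
    int has_tag;
    int tag_bits;
};

static int filter_a_runs = 0;  /* how many times filter A did real work */

static int model_is_mtag(const struct model_pkt *p, int tag_value)
{
    return (p->has_tag && (p->tag_bits & tag_value)) ? 1 : 0;
}

static void model_mtag(struct model_pkt *p, int tag_value)
{
    if (!p->has_tag) { p->has_tag = 1; p->tag_bits = 0; }
    p->tag_bits &= ~3;          /* drop a previous TAG_IN or TAG_OUT */
    p->tag_bits |= tag_value;
}

/* Filter A: the input-function snippet above, minus the kernel calls. */
static void filter_a_input(struct model_pkt *p)
{
    if (model_is_mtag(p, TAG_IN)) return;  /* seen before: pass it through */
    model_mtag(p, TAG_IN);
    filter_a_runs++;                       /* real processing would go here */
}

/* Filter B swallows the packet once and re-injects it; in Tiger the whole
   chain runs again, so filter A sees the same packet a second time. */
static void run_chain(struct model_pkt *p, int *b_swallowed)
{
    filter_a_input(p);
    if (!*b_swallowed) {
        *b_swallowed = 1;
        run_chain(p, b_swallowed);  /* models re-injection */
    }
}
```

Filter A does its work exactly once even though it is called twice; without the tag check, filter_a_runs would be 2.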
While these snippets demonstrate the basic concept, there are some
subtle details I want to emphasize. If you copy an mbuf (mbuf_copy) or
call another function that creates a copy on your behalf (mbuf_pullup),
your tag will not automatically be copied in Tiger 10.4. Thus any time
you pull up or copy a packet you need to retag it. Apple DTS indicates
that this will probably be fixed in a future version, so when tagging or
retagging a packet you should first check whether your tag is already
present. Nothing prevents you from tagging the same packet more than
once, and if you do, the results could be ambiguous.
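The retag-after-copy rule can be sketched in userspace as follows. model_copy models the Tiger 10.4 behaviour described above (payload copied, tag dropped), and model_retag looks for an existing tag before creating one, so it stays correct if a later release starts copying tags. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TAG_IN  1
#define TAG_OUT 2

/* Hypothetical userspace stand-in for an mbuf plus its optional tag. */
struct model_pkt {
    char data[64];
    int  has_tag;
    int  tag_bits;
};

/* Models mbuf_copy/mbuf_pullup in 10.4: data is copied, the tag is not. */
static void model_copy(struct model_pkt *dst, const struct model_pkt *src)
{
    memcpy(dst->data, src->data, sizeof dst->data);
    dst->has_tag = 0;
    dst->tag_bits = 0;
}

/* Retag after a copy or pullup. Reuses an existing tag if one is present
   so the packet never carries two of our tags. Returns 1 if a new tag was
   created, 0 if an existing one was updated. */
static int model_retag(struct model_pkt *p, int tag_value)
{
    int created = 0;
    if (!p->has_tag) {
        p->has_tag = 1;
        p->tag_bits = 0;
        created = 1;
    }
    p->tag_bits = (p->tag_bits & ~3) | tag_value;
    return created;
}
```

Calling model_retag twice on the same copy creates the tag only once, which is the property the paragraph asks for.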
Somewhat more troubling is that if someone else pulls up or copies a
packet, your tag will be lost. In practice this should be rare since
most packets use an external buffer or only get pulled up once to make
the headers contiguous. [My code copies packets to bridge them to other
data links, and pulls up packets to examine the packet headers for NAT
and IP filtering.]
If your NKE, or any NKE that comes after yours, re-injects a packet, your
NKE will see it again. It is your responsibility to ensure that you don't loop
or try to re-acquire a mutex lock. [My code delays and re-injects TCP
Reset segments, and withholds and later re-injects Acks.]
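Because re-injection calls straight back into your NKE on the same thread, holding your lock across the inject call is a recursive lock attempt. The userspace sketch below uses a POSIX error-checking mutex, which reports EDEADLK where a kernel mutex would panic. The reinject_depth counter is a hypothetical stand-in for the tag check; all names are illustrative.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical single NKE lock; error-checking so a relock is reported. */
static pthread_mutex_t nke_lock;
static int reinject_depth = 0;  /* stands in for "already tagged" */

static void init_nke_lock(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&nke_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

static int filter_input(int release_before_reinject);

/* Re-injection restarts the chain, re-entering filter_input on this thread. */
static int model_reinject(int release_before_reinject)
{
    reinject_depth++;
    return filter_input(release_before_reinject);
}

/* Returns 0 on success, or the error from pthread_mutex_lock. */
static int filter_input(int release_before_reinject)
{
    int err = pthread_mutex_lock(&nke_lock);
    if (err != 0) return err;        /* EDEADLK: recursive lock attempt */
    if (reinject_depth > 0) {        /* second pass: already processed */
        pthread_mutex_unlock(&nke_lock);
        return 0;
    }
    /* ... update shared NKE state while holding the lock ... */
    if (release_before_reinject) {
        pthread_mutex_unlock(&nke_lock);
        return model_reinject(release_before_reinject);
    }
    err = model_reinject(release_before_reinject);  /* bug: lock still held */
    pthread_mutex_unlock(&nke_lock);
    return err;
}
```

With release_before_reinject set, the second pass acquires the lock normally; without it, the nested lock call fails with EDEADLK.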
Another issue is mbuf re-use. If your NKE is inserted in more than one
data link, it is quite possible for packets arriving on one interface to
be forwarded out another interface without modifying any previous tags.
Some network code in the stack also re-uses the received mbuf to send a
response. Thus you may need to distinguish which data link previously
saw a packet or which direction a packet arrived from by using different
tag values in order to recognize unprocessed packets previously tagged
by your NKE. The sample code above distinguishes inbound versus
outbound packets since any mbuf that is turned around is a new packet
with respect to the NKE injection mechanism. [My code supports one-arm
gateways that forward packets back out the same interface they arrived on.]
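The effect of separate direction tags in a one-arm gateway can be sketched with a small userspace model. A packet tagged TAG_IN on the way in is still new to the outbound filter when it is turned around, and because the tag value is replaced rather than combined, it is also new to the inbound filter if it comes back in again. The struct and function names are hypothetical.

```c
#include <assert.h>

#define TAG_IN  1
#define TAG_OUT 2

/* Hypothetical model of the direction tags used by PROJECT_mtag. */
struct model_pkt { int has_tag; int tag_bits; };

static int is_tagged(const struct model_pkt *p, int v)
{
    return (p->has_tag && (p->tag_bits & v)) ? 1 : 0;
}

static void tag(struct model_pkt *p, int v)
{
    if (!p->has_tag) { p->has_tag = 1; p->tag_bits = 0; }
    p->tag_bits = (p->tag_bits & ~3) | v;   /* replace, don't combine */
}

static int in_work = 0, out_work = 0;  /* real processing counts */

static void inbound_filter(struct model_pkt *p)
{
    if (is_tagged(p, TAG_IN)) return;
    tag(p, TAG_IN);
    in_work++;
}

static void outbound_filter(struct model_pkt *p)
{
    if (is_tagged(p, TAG_OUT)) return;
    tag(p, TAG_OUT);
    out_work++;
}
```

A typical sequence: inbound_filter(&p) on arrival, outbound_filter(&p) when the gateway sends it back out the same interface (processed, since TAG_IN does not match), and a second outbound_filter(&p) after a re-injection (skipped).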
If your NKE redirects an inbound packet by injecting it to a different
data link from the one it arrived on, you may need to turn off the
MBUF_PROMISC flag since promiscuous packets are normally deleted before
being sent to the IP layer. If you change the direction of a packet,
your code should retag it for the direction your NKE is sending it. [My
code enables promiscuous mode and redirects packets from one data link
to another to perform Ethernet bridging.]
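A userspace sketch of the redirect step follows: clear only the promiscuous bit, leave the other flags alone, and retag for the new direction. In the kernel this would presumably be a call like mbuf_setflags_mask(m, 0, MBUF_PROMISC); check <sys/kpi_mbuf.h> for whether that KPI lets you clear MBUF_PROMISC on your target release. MODEL_PROMISC and the other names below are hypothetical stand-ins.

```c
#include <assert.h>

#define TAG_IN  1
#define TAG_OUT 2
/* Hypothetical stand-in for MBUF_PROMISC; the real value is in kpi_mbuf.h. */
#define MODEL_PROMISC 0x0001u

/* Hypothetical model of an mbuf's flags, tag, and target data link. */
struct model_pkt {
    unsigned flags;
    int has_tag;
    int tag_bits;
    int ifindex;
};

/* Same replace-rather-than-combine rule as PROJECT_mtag. */
static void model_mtag(struct model_pkt *p, int tag_value)
{
    if (!p->has_tag) { p->has_tag = 1; p->tag_bits = 0; }
    p->tag_bits = (p->tag_bits & ~3) | tag_value;
}

/* Models set-flags-with-mask semantics: only bits in mask change, taking
   their new values from flags. */
static void model_setflags_mask(struct model_pkt *p, unsigned flags, unsigned mask)
{
    p->flags = (p->flags & ~mask) | (flags & mask);
}

/* Redirect a promiscuously received packet to another data link. */
static void model_redirect(struct model_pkt *p, int out_ifindex, int new_direction)
{
    model_setflags_mask(p, 0, MODEL_PROMISC);  /* keep IP from discarding it */
    p->ifindex = out_ifindex;
    model_mtag(p, new_direction);              /* tag for the new direction */
}
```

Using a mask rather than overwriting all flags keeps unrelated bits, such as broadcast or multicast markings, intact.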
Locking Model
-------------
Tiger introduces fine-grained locking for data and thread synchronization
such that you only need to protect your own NKE data structures. This
is a huge improvement over having to understand the kernel's locking
model or using a single "funnel" for network versus file I/O. The
advantage of using finer-grained locking is that more threads can execute
simultaneously, but there's also a trade-off in the overhead of
maintaining and acquiring additional locks. For a basic interface filter
NKE, it often makes sense to start with a single lock for all your
critical data structures, and then optimize by adding finer-grained locks
only if they improve performance. As an NKE designer you probably have
some idea about whether there are distinct processing phases in your NKE
that could benefit from parallel execution.
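The single-lock starting point can be sketched in userspace with POSIX threads: several packet threads and one timer thread all touch the same state, and one mutex guards every field. The struct, thread functions, and iteration counts are hypothetical; a real NKE would use the kernel's lock primitives instead of pthreads.

```c
#include <assert.h>
#include <pthread.h>

#define ITERATIONS 100000
#define MAX_PACKET_THREADS 8

/* Hypothetical NKE-wide state; every field is guarded by the one lock. */
struct nke_state {
    pthread_mutex_t lock;
    long packets_seen;   /* updated by packet-processing threads */
    long timer_ticks;    /* updated by the callback timer */
    long last_snapshot;  /* timer's copy of packets_seen */
};

static struct nke_state g_state = { PTHREAD_MUTEX_INITIALIZER, 0, 0, 0 };

static void *packet_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&g_state.lock);
        g_state.packets_seen++;
        pthread_mutex_unlock(&g_state.lock);
    }
    return NULL;
}

static void *timer_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&g_state.lock);
        g_state.timer_ticks++;
        g_state.last_snapshot = g_state.packets_seen;  /* consistent read */
        pthread_mutex_unlock(&g_state.lock);
    }
    return NULL;
}

/* Runs n packet threads plus one timer thread to completion. */
static int run_model(int n)
{
    pthread_t tids[MAX_PACKET_THREADS + 1];
    int started = 0, rc = 0;
    if (n < 1 || n > MAX_PACKET_THREADS) return -1;
    for (; started < n; started++)
        if (pthread_create(&tids[started], NULL, packet_thread, NULL) != 0) { rc = -1; break; }
    if (rc == 0) {
        if (pthread_create(&tids[started], NULL, timer_thread, NULL) != 0) rc = -1;
        else started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return rc;
}
```

If profiling later showed contention between, say, the timer and the packet path, the state could be split under separate locks, which is where the ordering rules in the next paragraph start to matter.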
With the power to define your own locking model comes certain
responsibilities. You need to ensure that a single thread never tries
to acquire the same mutex lock more than once as this will result in a
kernel panic (recursive lock attempt). If you have more than one lock,
you need to ensure that threads never deadlock, for example when one
thread holds lock A while waiting for lock B, and another thread holds
lock B while waiting for lock A.
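One common way to rule out that kind of deadlock is to give every lock a fixed rank and always acquire the lower-ranked lock first. The userspace sketch below shows this with two hypothetical locks; the two workers name the locks in opposite orders, but lock_pair normalises the order so no cycle of waiting threads can form. This is a general technique, not something Tiger requires.

```c
#include <assert.h>
#include <pthread.h>

#define ROUNDS 50000

/* Hypothetical: each NKE lock carries a fixed rank. */
struct ranked_lock { pthread_mutex_t m; int rank; };

static struct ranked_lock table_lock = { PTHREAD_MUTEX_INITIALIZER, 1 };
static struct ranked_lock stats_lock = { PTHREAD_MUTEX_INITIALIZER, 2 };
static long table_entries = 0, stats_updates = 0;

/* Always take the lower-ranked lock first, whatever order the caller used. */
static void lock_pair(struct ranked_lock *a, struct ranked_lock *b)
{
    if (a->rank > b->rank) { struct ranked_lock *t = a; a = b; b = t; }
    pthread_mutex_lock(&a->m);
    pthread_mutex_lock(&b->m);
}

static void unlock_pair(struct ranked_lock *a, struct ranked_lock *b)
{
    pthread_mutex_unlock(&a->m);
    pthread_mutex_unlock(&b->m);
}

static void *worker_ab(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        lock_pair(&table_lock, &stats_lock);
        table_entries++;
        stats_updates++;
        unlock_pair(&table_lock, &stats_lock);
    }
    return NULL;
}

static void *worker_ba(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        lock_pair(&stats_lock, &table_lock);  /* opposite order, same result */
        table_entries++;
        stats_updates++;
        unlock_pair(&stats_lock, &table_lock);
    }
    return NULL;
}

static int run_workers(void)
{
    pthread_t t1, t2;
    if (pthread_create(&t1, NULL, worker_ab, NULL) != 0) return -1;
    if (pthread_create(&t2, NULL, worker_ba, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Release order does not affect deadlock freedom; only the acquisition order does.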
In the interest of simplicity and correctness, my NKE uses a single
lock. Each thread that enters my NKE and needs to access a critical
data structure must acquire that lock. This includes packets to be
processed, callback timers, and command messages delivered from a
control process. If a command message only needs to check or update a
simple flag, I try to use atomic operations instead of waiting for a
lock. Any thread that exits my NKE by returning to a caller OR
injecting a packet will first release the lock (if any) it acquired.
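The atomic-flag idea can be sketched in userspace with C11 atomics standing in for whatever atomic primitives the kernel code would actually use. A command message flips filtering_enabled without touching the lock, and the packet path releases the lock before it returns. The flag, counter, and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical flag set by a command message from the control process. */
static atomic_int filtering_enabled = 1;

/* The single NKE lock and the state it guards. */
static pthread_mutex_t nke_lock = PTHREAD_MUTEX_INITIALIZER;
static long filtered_count = 0;

/* Command handler: updating a simple flag needs no lock. */
static void handle_set_filtering(int enabled)
{
    atomic_store(&filtering_enabled, enabled ? 1 : 0);
}

/* Packet path: returns 1 if the packet was processed, 0 if skipped.
   The lock is always released before leaving, as it must be before
   returning to the caller or injecting a packet. */
static int process_packet(void)
{
    if (!atomic_load(&filtering_enabled))
        return 0;                       /* no lock taken on this path */
    pthread_mutex_lock(&nke_lock);
    filtered_count++;
    pthread_mutex_unlock(&nke_lock);
    return 1;
}
```

Reading the flag before taking the lock lets disabled traffic skip the lock entirely; anything more complex than a single flag would still go through the lock.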
I hope other NKE developers find this useful and welcome your feedback.
Enjoy!
--
- Peter Sichel
Sustainable Softworks
<http://www.sustworks.com>