Re: Socket filters: limiting amount of swallowed mbufs
- Subject: Re: Socket filters: limiting amount of swallowed mbufs
- From: Terry Lambert <email@hidden>
- Date: Fri, 30 Apr 2010 11:48:45 -0700
On Apr 30, 2010, at 3:28 AM, Bogdan Harjoc wrote:
I am looking for a way to let the tcp_input machinery know how many
mbufs my socket filter has swallowed but not yet injected back into a
TCP stream. This is so both swallowed data and data already in the
sockbuf are taken into account when adjusting window sizes.
So my first question would be whether changing the sockbuf->sb_hiwat
and sb_cc fields is doable. From the setsockopt code, it seems that
zero values are not valid for SO_RCVBUF and SO_RCVLOWAT.
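A minimal sketch of the accounting being asked about, written in Python purely as arithmetic (it is not xnu code and does not touch a real sockbuf). It assumes the advertised window should be the buffer high-water mark minus both the bytes already queued (sb_cc) and the bytes the filter is holding; held_by_filter is a hypothetical counter the filter would have to maintain itself. Window scaling and silly-window avoidance are ignored.

```python
def advertised_window(sb_hiwat: int, sb_cc: int, held_by_filter: int) -> int:
    """Free receive space, counting bytes held by the filter as if they were
    already queued in the socket buffer. Never negative."""
    used = sb_cc + held_by_filter  # queued in sockbuf + swallowed by filter
    return max(0, sb_hiwat - used)
```

For example, a 65536-byte buffer with 16384 bytes queued and 8192 bytes held in the filter leaves 40960 bytes to advertise.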
I'm also considering pausing the receiving stream once a number of
mbufs are stuck inside my filter. So I'd like to know whether using
TRAFFIC_MGT_SO_BG_SUPPRESSED, or simply setting TF_RXWIN0SENT
in tcpcb->t_flags and clearing it later, would work reliably.
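One way to decide when to pause is a high/low watermark with hysteresis, so the stream is not paused and resumed on every packet. The sketch below only models that decision; how the pause is actually signalled to TCP (TF_RXWIN0SENT, a zero window, or something else) is the open question above and is not shown. FilterBackpressure, high and low are hypothetical names.

```python
class FilterBackpressure:
    """Hypothetical pause/resume policy for a filter that holds mbufs.

    Pauses once `high` or more mbufs are held and resumes only after the
    count drains to `low` or fewer, so the stream does not flap.
    """

    def __init__(self, high: int, low: int) -> None:
        if not 0 <= low < high:
            raise ValueError("need 0 <= low < high")
        self.high = high
        self.low = low
        self.held = 0
        self.paused = False

    def on_swallow(self) -> bool:
        """Record one more held mbuf; return True if the stream should be paused."""
        self.held += 1
        if not self.paused and self.held >= self.high:
            self.paused = True
        return self.paused

    def on_reinject(self) -> bool:
        """Record one mbuf handed back to the stream; return the paused state."""
        if self.held == 0:
            raise RuntimeError("reinject with nothing held")
        self.held -= 1
        if self.paused and self.held <= self.low:
            self.paused = False
        return self.paused
```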
Since the above would require modifying internal structs that NKEs
don't have access to, a hint towards something less intrusive would
of course be appreciated.
Thanks,
Bogdan
Output packets or input packets? It sounds like you are talking about
input packets because you are referencing tcp_input().
If you are talking about input packets, and your NKE is swallowing the
packets before they get to tcp_input(), then the thing that's going to
be taking the window size into account will be the sending system.
Once the send window is full on the sending system, until you let the
packets into your local tcp_input(), they aren't going to be ACK'ed,
and therefore there will be no further packets sent into the TCP
stream from the other end.
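To illustrate: if nothing the sender emits is ever ACKed, it can have at most one send window of data outstanding and then it stops. A simplified Python model, assuming full-sized segments and ignoring retransmission and congestion control:

```python
def bytes_sent_before_stall(send_window: int, segment_size: int, total: int) -> int:
    """Bytes a sender emits before stalling when no ACKs ever come back.

    Only full-sized segments are sent, except possibly the last piece of the
    data; a segment that would overflow the window is held back.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    sent = 0
    while sent < total:
        seg = min(segment_size, total - sent)
        if sent + seg > send_window:
            break  # window is full of unacknowledged data: wait for an ACK
        sent += seg
    return sent
```

With a 65535-byte window and 1460-byte segments, the sender stalls after 44 segments (64240 bytes) and sends nothing more until an ACK arrives.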
So hooking in before tcp_input() is the wrong place for you to be
trying to implement your NKE.
Basically, you have no control whatsoever over the sender's idea of
what the window size is via tcp_input(). Although there are some
traffic shaping paradigms that use the window size to control the
send buffering on the sending side of a remote server/router, they do
so by controlling the window size advertised in the TCP ACKs sent to
the other end, which tells the sender how much space is left in the
window.
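The sender-side effect of the advertised window can be written down directly. In the usual formulation (RFC 5681 calls the advertised window rwnd), the sender may transmit at most min(cwnd, rwnd) minus the data already in flight, so the window in outgoing ACKs is the receiver's only lever. A sketch, with snd_una and snd_nxt named after the RFC 793 send-sequence variables and sequence-number wraparound ignored:

```python
def usable_window(cwnd: int, rwnd: int, snd_una: int, snd_nxt: int) -> int:
    """How many new bytes the sender may transmit right now.

    The effective window is min(cwnd, rwnd); bytes already sent but not yet
    acknowledged (snd_nxt - snd_una) count against it.
    """
    in_flight = snd_nxt - snd_una
    return max(0, min(cwnd, rwnd) - in_flight)
```

Even with a large congestion window, advertising rwnd = 16384 while 8192 bytes are in flight lets the sender add only 8192 more; advertising 4096 stops it entirely.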
Note that these are much better at preventing head-of-line blocking
than, say, RED queueing or packet ordering such as in ALTQ, since you
can starve the sender of send buffers with unacknowledged data (unless
you use RTSP or some other end-to-end flow control mechanism).
However, they are probably not suitable for implementation as an NKE
unless you plan on rewriting ACK packets and are prepared for
continuous (and exponentially decaying) requests for acknowledgement
from the sender.
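The "exponentially decaying" requests are what a retransmission timer with exponential backoff produces: each unanswered retransmission doubles the timeout, up to some cap. A sketch in the style of RFC 6298; the initial RTO and the cap are parameters here, not values taken from any particular stack:

```python
from __future__ import annotations


def retransmit_times(initial_rto: float, max_rto: float, attempts: int) -> list[float]:
    """Times (seconds after the first send) at which a sender using
    exponential backoff retransmits a segment that is never acknowledged."""
    times = []
    t = 0.0
    rto = initial_rto
    for _ in range(attempts):
        t += rto
        times.append(t)
        rto = min(rto * 2, max_rto)  # back off: double the timer, up to the cap
    return times
```

With a 1 s initial RTO and a 64 s cap, retransmissions land at 1, 3, 7, 15 and 31 s, so the gaps widen quickly once the receiver stops ACKing.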
It sounds like you are trying to avoid the degenerate behaviour that
occurs when you have asymmetric link speeds over two interfaces, e.g.
if you are attempting to implement this on a border router between a
high speed LAN and a low speed link to another high speed network. A
good example of this is a small network on the other side of a modem,
cable modem, or DSLAM connection, where the router on the other end is
going to fill up with FTP transfer data for you from the next hop up,
and have no room for the "priority" traffic that's NOT the "shaped"
traffic (this is the primary fallacy of most "traffic shaping" by
packet prioritization).
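The arithmetic behind the priority-traffic problem: a packet arriving at a FIFO bottleneck waits for everything already queued ahead of it, however it was prioritized upstream. A sketch with illustrative numbers (not measurements):

```python
def queueing_delay_ms(queued_bytes: int, link_bps: int) -> float:
    """Milliseconds a new packet waits behind queued_bytes in a FIFO that
    drains at link_bps bits per second (its own transmission time excluded)."""
    if link_bps <= 0:
        raise ValueError("link_bps must be positive")
    return queued_bytes * 8 * 1000 / link_bps
```

For example, 64000 bytes of queued bulk data on a 256 kbit/s link delays an interactive packet behind it by 2000 ms.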
If that's your plan, you need to be aware that it leads to other
issues, depending on the congestion control algorithm(s) in use on the
sending side; for example, it would really mess with anything that
depends on a calculation of the bandwidth-delay product, such as TCP
rate halving, rate-based ACK, or a number of other algorithms that
you won't be able to negotiate with the sender.
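For reference, the bandwidth-delay product is link bandwidth times round-trip time, i.e. the amount of data that must be in flight to keep the path full. A small integer sketch; the hold-time example assumes, for illustration only, that time spent inside a filter simply adds to the RTT the sender measures:

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: bits/s times seconds, divided by 8."""
    return bandwidth_bps * rtt_ms // (8 * 1000)
```

At 10 Mbit/s, a 100 ms RTT gives 125000 bytes; if a filter holds data for an extra 50 ms, the sender sees 150 ms and an apparent BDP of 187500 bytes, which skews any algorithm built on that estimate.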
You should also be aware that you will need to handle a local delayed
ACK timer in your code. Otherwise you will get uneven flow: the remote
side backs off asking for ACKs, so when you are finally prepared to
send ACKs you find they are not being requested, and won't be until
some (decay function) time after you are ready.
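A sketch of the kind of local delayed-ACK policy meant here, assuming the conventional rule of acknowledging at least every second segment or when a timer expires (RFC 1122 requires the delay to be under 0.5 s; the 200 ms default below is a common choice, not something from this thread). DelayedAck and its methods are hypothetical; a filter would drive them as it reinjects held data, so ACKs go out promptly instead of waiting for the sender's next backed-off retransmission.

```python
class DelayedAck:
    """Hypothetical delayed-ACK policy: ACK after every second segment, or
    once the oldest unacknowledged segment has waited `timeout` seconds."""

    def __init__(self, timeout: float = 0.2) -> None:
        self.timeout = timeout
        self.pending = 0      # segments delivered but not yet ACKed
        self.oldest = None    # arrival time of the oldest un-ACKed segment

    def on_segment(self, now: float) -> bool:
        """Record a delivered segment; return True if an ACK should go out now."""
        if self.pending == 0:
            self.oldest = now
        self.pending += 1
        if self.pending >= 2:
            self._reset()
            return True
        return False

    def on_tick(self, now: float) -> bool:
        """Timer callback; return True if the delayed-ACK timer has expired."""
        if self.pending and now - self.oldest >= self.timeout:
            self._reset()
            return True
        return False

    def _reset(self) -> None:
        self.pending = 0
        self.oldest = None
```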
You should also be prepared for window size advertisement changes in
reaction to your interference being seen as additional local
bandwidth-delay product or congestion, since a drop in window size
below what you have "in flight" is going to shoot you in the foot
with a machine gun.
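The "in flight" problem can be stated as a check on the right edge of the window (ack + window). RFC 1122 says a receiver SHOULD NOT shrink the window, i.e. move that edge to the left, because data the sender has already transmitted may then lie beyond it. A sketch ignoring sequence-number wraparound; both helper names are hypothetical:

```python
def shrinks_window(prev_ack: int, prev_wnd: int, ack: int, wnd: int) -> bool:
    """True if a new advertisement moves the right edge of the receive window
    to the left of where the previous advertisement put it."""
    return ack + wnd < prev_ack + prev_wnd


def stranded_bytes(snd_nxt: int, ack: int, wnd: int) -> int:
    """Bytes already sent (up to snd_nxt) that now fall beyond the right edge."""
    return max(0, snd_nxt - (ack + wnd))
```

If the receiver last advertised ack 1000 / window 8000 and the sender has sent up to 9000, a new advertisement of ack 3000 / window 4000 retracts the edge from 9000 to 7000 and strands 2000 bytes.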
The Pittsburgh Supercomputing Center (PSC) at CMU has a number of
papers which would probably be useful to you, as well as some of the
Scala Server work out of Peter Druschel's group at Rice University.
-- Terry
Darwin-kernel mailing list (email@hidden)