Re: Socket filters: limiting amount of swallowed mbufs
On Apr 30, 2010, at 3:28 AM, Bogdan Harjoc wrote:

> I am looking for a way to let the tcp_input machinery know how many mbufs my socket filter has swallowed but not yet injected back into a TCP stream. This is so both swallowed data and data already in the sockbuf are taken into account when adjusting window sizes.
>
> So my first question would be whether changing the sockbuf->sb_hiwat and sb_cc fields is doable. From the setsockopt code, it seems that zero values are not valid for SO_RCVBUF and SO_LOWAT.
>
> I'm also considering pausing the receiving stream once a number of mbufs are stuck inside my filter. So I'd like to know whether using TRAFFIC_MGT_SO_BG_SUPPRESSED, or simply setting TF_RXWIN0SENT into tcpcp->t_flags and clearing it later would work reliably.
>
> Since the above would require modifying internal structs that NKEs don't have access to, a hint towards something less intrusive would of course be appreciated.
>
> Thanks,
> Bogdan

Output packets or input packets? It sounds like you are talking about input packets, because you are referencing tcp_input().

If you are talking about input packets, and your NKE is swallowing the packets before they get to tcp_input(), then the thing that takes the window size into account will be the sending system. Once the send window is full on the sending system, the packets aren't going to be ACK'ed until you let them into your local tcp_input(), and therefore no further packets will be sent into the TCP stream from the other end. So before tcp_input() is the wrong place for you to be trying to implement your NKE.
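The stall described above can be seen in a toy model. This is only a sketch in Python, not XNU code: the function name, the window of 4 segments and the tick loop are all assumptions made for illustration, and retransmission and timeouts are ignored.

```python
def simulate_stall(window_segments, total_segments, swallow):
    """Toy sliding-window sender (hypothetical model, not kernel code).

    Each tick the sender transmits while fewer than `window_segments`
    segments are unacknowledged. When `swallow` is True, a receiver-side
    filter holds every segment before the TCP input path, so no ACK is
    ever generated. Returns how many segments the sender put on the wire.
    """
    sent = 0
    acked = 0
    for _ in range(total_segments * 2):  # generous tick budget
        # Transmit as much as the send window allows.
        while sent < total_segments and sent - acked < window_segments:
            sent += 1
        if swallow:
            continue  # held segments never reach the ACK-generating code
        acked = sent  # receiver ACKs everything delivered this tick
        if acked == total_segments:
            break
    return sent


stalled = simulate_stall(window_segments=4, total_segments=20, swallow=True)
flowing = simulate_stall(window_segments=4, total_segments=20, swallow=False)
print(stalled, flowing)  # prints "4 20"
```

With the filter swallowing everything, the sender fills one window and then sits idle; with segments delivered and ACK'ed, the whole transfer completes.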
Basically, you have no control whatsoever over the sender's idea of what the window size is, via tcp_input().

Although there are some traffic shaping paradigms that use the window size to control the send buffering on the sending side of a remote server/router, they do so by controlling the size of the window in the TCP ACKs sent to the other end, to indicate how much space is left in the window. Note that these are much better at preventing head of line blocking than, say, RED queueing or packet ordering, such as in AltQ, since you can starve the sender of send buffers with unacknowledged data, unless you use RTSP or some other end-to-end flow control mechanism. But they are probably not suitable for implementation as an NKE, unless you plan on rewriting ACK packets, and are prepared for continuous (and exponentially decaying) requests for acknowledgement from the sender.

It sounds like you are trying to avoid the degenerate behaviour you get with asymmetric link speeds over two interfaces, e.g. if you are attempting to implement this on a border router between a high speed LAN and a low speed link to another high speed network. A good example of this is a small network on the other side of a modem, cable modem, or DSLAM connection, where the router on the other end is going to fill up with the ftp transfer data for you from the next hop up, and have no room for the "priority" traffic that's NOT the "shaped" traffic (this is the primary fallacy of most "traffic shaping" by packet prioritization).

If that's your plan, you need to be aware that it leads to other issues, depending on the congestion control algorithm(s) in use at the sending side; for example, it would really mess with anything that depends on a calculation of the bandwidth delay product, such as TCP rate halving, or rate based ACK, or a number of other algorithms that you won't be able to negotiate with the sender.
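To make "how much space is left in the window" concrete, here is a rough sketch of the arithmetic a window-based shaper would have to do if it also counted data held by a filter. It is Python rather than kernel code, the function and its parameters are hypothetical, sequence numbers are plain integers with no wraparound, and the rule of never pulling the right edge of the window back follows the general TCP recommendation against shrinking an already advertised window.

```python
def advertised_window(hiwat, sb_cc, swallowed, rcv_nxt, prev_right_edge):
    """Return (window, right_edge) to advertise in the next ACK.

    hiwat           -- receive buffer limit in bytes (cf. sb_hiwat)
    sb_cc           -- bytes already queued in the socket buffer
    swallowed       -- bytes held by the filter, not yet re-injected
                       (hypothetical counter, not an XNU field)
    rcv_nxt         -- next sequence number expected from the sender
    prev_right_edge -- rcv_nxt + window from the previous ACK
    """
    # Space left once both queued and swallowed data are counted.
    free = max(0, hiwat - sb_cc - swallowed)
    # Do not move the right edge backwards: the sender may already have
    # data in flight up to the previously advertised edge.
    right_edge = max(rcv_nxt + free, prev_right_edge)
    return right_edge - rcv_nxt, right_edge


# 16 KB queued, 32 KB swallowed, 64 KB limit: 16 KB left to advertise.
w1, edge1 = advertised_window(65536, 16384, 32768, 100000, 100000)
# The filter swallows more; no space is left, but the edge cannot retreat.
w2, edge2 = advertised_window(65536, 16384, 49152, 100000, edge1)
# The sender uses up the old window; only now can zero be advertised.
w3, edge3 = advertised_window(65536, 16384, 65536, 116384, edge2)
print(w1, w2, w3)  # prints "16384 16384 0"
```

The second call shows why the window cannot be closed instantly: the receiver may have to absorb up to one previously advertised window beyond its nominal limit.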
You should also be aware that you will need to deal with a local delayed ACK timer in your code, or you will get uneven flow as the remote side backs off asking for ACKs: you become prepared to send ACKs when requested, but find that they are not being requested when you are finally ready to send them, until some (decay function) time after you are ready.

You should also be prepared for window size advertisement changes in reaction to your interference being seen as additional local bandwidth delay product or congestion, since a drop in window size below what you have "in flight" is going to shoot you in the foot with a machine gun.

The Pittsburgh Supercomputing Center (PSC) at CMU has a number of papers which would probably be useful to you, as does some of the Scala Server work out of Peter Druschel's group at Rice University.

-- Terry
participants (1)
-
Terry Lambert