RE: Asynchronous sock_sendmbuf
- Subject: RE: Asynchronous sock_sendmbuf
- From: "Eddy Quicksall" <email@hidden>
- Date: Fri, 23 May 2008 17:00:23 -0400
Thanks for your input.
I have it working for BSD using the so routines as outlined in TCP/IP
Illustrated. I'm making a change now to use threads on an Apple system. If
BSD ever changes that then I'll just use the slower Apple method.
Eddy
-----Original Message-----
From: Josh Graessley [mailto:email@hidden]
Sent: Friday, May 23, 2008 4:22 PM
To: Eddy Quicksall
Cc: darwinKernel Dev
Subject: Re: Asynchronous sock_sendmbuf
If you're using so_send on Mac OS X, you're setting yourself up for a
world of hurt. That's not part of the official KPIs and is liable to
break at any time.
-josh
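For reference, the supported alternative is the socket KPI in <sys/kpi_socket.h>.
A minimal sketch of a send through sock_send() -- the wrapper and its names are
made up here, only the KPI call itself is real:

    #include <sys/kpi_socket.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send a plain buffer on a KPI socket_t; returns 0 or an errno. */
    static errno_t
    kpi_send_buf(socket_t so, void *buf, size_t len, size_t *sentlen)
    {
        struct iovec  iov = { .iov_base = buf, .iov_len = len };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };

        /* sock_send() copies from the iovec into the socket's send buffer;
         * with no MSG_DONTWAIT it may block while that buffer is full. */
        return sock_send(so, &msg, 0, sentlen);
    }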
On May 23, 2008, at 11:25 AM, Eddy Quicksall wrote:
> Thanks for the input.
>
> I'm not using sock_sendmbuf, I'm using so_send.
>
> I already invoke the callback if so_send shows that everything was
> copied.
> But if everything was not copied then I can't invoke the callback. I
> have
> found that I get an upcall when the TCP ACKs arrive. At that time I
> check
> to see if the ACK belongs to the most recent outstanding send. If so
> I go
> back to my main thread where I invoke the callback.
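A rough sketch of the wake-the-main-thread part of that approach, using the
upcall argument that the sock_socket() KPI accepts rather than the raw socket
hooks Eddy is using; the conn structure and all names are invented for
illustration:

    #include <sys/kpi_socket.h>
    #include <sys/systm.h>      /* msleep(), wakeup() */
    #include <sys/param.h>      /* PSOCK */
    #include <kern/locks.h>

    struct conn {
        socket_t    so;
        lck_mtx_t  *mtx;        /* from lck_mtx_alloc_init() at setup time */
        int         io_pending;
    };

    /* Runs when the socket has activity (e.g. ACKs freeing send-buffer space).
     * Do as little as possible here: flag the connection and wake the thread. */
    static void
    conn_upcall(socket_t so, void *cookie, int waitf)
    {
        struct conn *c = cookie;

        lck_mtx_lock(c->mtx);
        c->io_pending = 1;
        wakeup(c);
        lck_mtx_unlock(c->mtx);
    }

    /* Main-loop fragment: sleep until the upcall fires, then check whether the
     * outstanding send has completed and invoke the emulated WSA callback. */
    static void
    conn_wait_for_io(struct conn *c)
    {
        lck_mtx_lock(c->mtx);
        while (!c->io_pending)
            msleep(c, c->mtx, PSOCK, "iowait", NULL);   /* drops/retakes mtx */
        c->io_pending = 0;
        lck_mtx_unlock(c->mtx);
    }

    /* Socket creation registers the upcall:
     *   sock_socket(PF_INET, SOCK_STREAM, IPPROTO_TCP, conn_upcall, c, &c->so);
     */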
>
> Eddy
>
> -----Original Message-----
> From: Vincent Lubet [mailto:email@hidden]
> Sent: Friday, May 23, 2008 12:22 PM
> To: Eddy Quicksall
> Cc: darwinKernel Dev
> Subject: Re: Asynchronous sock_sendmbuf
>
> Eddy,
>
> There is no such upcall for when data has been copied into the
> internal buffer. If the call to sock_sendmbuf() -- or sock_send() --
> succeeds then you can be sure the data has been copied. If you really
> need to emulate this WSA callback, why not simply invoke the callback
> from the point sock_sendmbuf() returns?
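A minimal sketch of that suggestion -- the completion-callback type and wrapper
name are invented; only sock_sendmbuf() itself is the KPI:

    #include <sys/kpi_socket.h>
    #include <sys/kpi_mbuf.h>

    typedef void (*wsa_complete_fn)(void *ctx, size_t bytes, errno_t error);

    /* Emulated WSASend: "completion" only means the data has been copied into
     * the socket's send buffer, which is true once sock_sendmbuf() succeeds. */
    static errno_t
    send_and_complete(socket_t so, mbuf_t chain, wsa_complete_fn done, void *ctx)
    {
        size_t  sent = 0;
        errno_t err  = sock_sendmbuf(so, NULL, chain, 0, &sent);

        if (err == 0)
            done(ctx, sent, 0);     /* fire the callback right here */
        return err;
    }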
>
> Vincent
>
> On May 23, 2008, at 6:51 AM, Eddy Quicksall wrote:
>
>> Sorry, Terry ... I never really answered your question regarding my
>> problem.
>>
>> I'm porting some code that was written for Windows. That code uses
>> WSA
>> Sockets. With WSASend, you specify a callback that will occur "when
>> the send
>> operation has been completed" ... by "completed" it really only
>> needs to
>> mean everything has been copied to internal buffers.
>>
>> The upper level code is totally system and transport independent so
>> I can't
>> modify it. So I simulate WSA Sockets completely using so_send/
>> so_recv and
>> upcalls. Each operating system will use a different simulation
>> routine.
>>
>> Everything runs in a single thread. That thread will sleep when
>> there is no
>> I/O and the upcall will wake that up if it needs to. Technically I
>> guess the
>> upcall may be a thread but it is not one of my threads. I understand
>> the
>> technique of making other worker threads but that has its limits due
>> to a
>> high number of potential connections.
>>
>> I have this mostly working now but still need to understand more
>> about the
>> so_send/so_recv/up_call stuff to be sure. If there is a system
>> upgrade or
>> difference in other BSD implementations then that is acceptable
>> because all
>> of the changes will be isolated into one source file (the file that
>> does the
>> actual socket calls).
>>
>> I know that Apple enthusiasts don't want to recognize Microsoft so I
>> don't
>> blame you if you don't want to help ... but this software must run
>> on an
>> Apple (I'm starting with BSD at this moment, however).
>>
>> Eddy
>>
>> -----Original Message-----
>> From: Terry Lambert [mailto:email@hidden]
>> Sent: Friday, May 23, 2008 6:18 AM
>> To: Eddy Quicksall
>> Cc: Igor Mikushkin; email@hidden
>> Subject: Re: Asynchronous sock_sendmbuf
>>
>> On May 22, 2008, at 7:10 PM, Eddy Quicksall wrote:
>>> Maybe the efficiency would still be fairly good but one thing is
>>> for sure.
>>> It would take lots more memory to have 1000 or so threads just to
>>> handle
>>> what can be handled with a single thread.
>>>
>>> Regarding upper level recovery ... any networking software must be
>>> aware
>>> that the connection can be lost at any time. It is not the
>>> responsibility
>>> of the lower layers to recover from a lost connection. For good
>>> upper layer
>>> protocols, this is built into them. For example, iSCSI has lots of
>>> mechanisms to deal with this.
>>>
>>> I'll look into sototcpcb(so). Thanks for that tip.
>>>
>>> Regarding "not published API", if I don't want to use extra threads
>>> and I
>>> don't want to poll and I don't want to check every few ms, how would
>>> you
>>> suggest that I implement non-blocking socket calls?
>>
>> Hard to answer specifically, given you haven't shared much about the
>> problem you are trying to solve.
>>
>> One way would be to queue the requests to a pool of one or more
>> worker
>> threads that make blocking calls.
>>
>> You realize that blocking only happens if your send queue on a socket
>> exceeds the amount of data allowed to be pending send on a given
>> socket at a time, right? It's not like "can block" means "might
>> block
>> for no reason at all just to teach you to not use blocking calls".
>> Blocking happens for a reason, and if you avoid giving it the reason,
>> it simply does what you tell it without blocking.
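One concrete way of not giving it the reason, assuming you can bound your
largest burst: reserve enough send-buffer headroom up front. The wrapper and
the 256 KiB figure below are arbitrary; sock_setsockopt() is the KPI call:

    #include <sys/kpi_socket.h>
    #include <sys/socket.h>

    /* Make the send buffer large enough that a bounded burst never waits
     * for ACKs to free space. */
    static errno_t
    reserve_send_space(socket_t so, int bytes)
    {
        return sock_setsockopt(so, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes));
    }

    /* e.g. reserve_send_space(so, 256 * 1024); */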
>>
>> If you have an upper level recovery protocol, you can use the acks
>> there to
>> pace your output to the sockets, and thereby guarantee that you will
>> never fill a given socket's send queue. One technique that works well
>> here is called "rate halving".
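The simplest form of that ack-paced idea is just bookkeeping: track
application-level bytes in flight and only hand more to the socket while you
are under a self-imposed window. This is not the rate-halving algorithm
itself, and every name and number here is illustrative:

    /* Per-connection pacing state; pick a window well under SO_SNDBUF. */
    struct pacer {
        size_t window;        /* max bytes in flight at the protocol level */
        size_t outstanding;   /* handed to the socket, not yet acked by the peer */
    };

    static int
    pacer_can_send(const struct pacer *p, size_t len)
    {
        return p->outstanding + len <= p->window;
    }

    static void
    pacer_on_send(struct pacer *p, size_t len)   { p->outstanding += len; }

    static void
    pacer_on_ack(struct pacer *p, size_t acked)  { p->outstanding -= acked; }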
>>
>> If you think you will occasionally fill a send queue because your
>> upper level protocol isn't that smart, you can estimate how many of
>> the 1000 or so connections are likely to hit that at once, and add one.
>>
>> At that point, you marshal your writes using queues to a pool of
>> that
>> many work-to-do threads, and they do the blocking writes on your
>> behalf. Because there are only ever N of these, with N+1 threads, you
>> always have one available to do work, so there's no starvation.
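A compressed sketch of that hand-off. kernel_thread_start(), msleep()/wakeup(),
the TAILQ macros and sock_sendmbuf() are the real interfaces; the request
structure, lock setup, error handling and teardown are invented or omitted:

    #include <sys/kpi_socket.h>
    #include <sys/kpi_mbuf.h>
    #include <sys/queue.h>
    #include <sys/systm.h>      /* msleep(), wakeup() */
    #include <sys/param.h>      /* PSOCK */
    #include <kern/locks.h>
    #include <kern/thread.h>    /* kernel_thread_start() */

    struct send_req {
        TAILQ_ENTRY(send_req) link;
        socket_t              so;
        mbuf_t                chain;
    };

    static TAILQ_HEAD(, send_req) send_q = TAILQ_HEAD_INITIALIZER(send_q);
    static lck_mtx_t *send_q_mtx;   /* from lck_mtx_alloc_init() at init time */

    /* Worker body: dequeue one request and do the (possibly blocking) send.
     * Start N+1 of these with kernel_thread_start(send_worker, NULL, &t). */
    static void
    send_worker(void *arg, wait_result_t wr)
    {
        for (;;) {
            lck_mtx_lock(send_q_mtx);
            while (TAILQ_EMPTY(&send_q))
                msleep(&send_q, send_q_mtx, PSOCK, "sendq", NULL);
            struct send_req *r = TAILQ_FIRST(&send_q);
            TAILQ_REMOVE(&send_q, r, link);
            lck_mtx_unlock(send_q_mtx);

            size_t sent = 0;
            (void)sock_sendmbuf(r->so, NULL, r->chain, 0, &sent);
            /* free r here */
        }
    }

    /* Producer side: queue the request and wake a worker. */
    static void
    queue_send(struct send_req *r)
    {
        lck_mtx_lock(send_q_mtx);
        TAILQ_INSERT_TAIL(&send_q, r, link);
        lck_mtx_unlock(send_q_mtx);
        wakeup(&send_q);
    }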
>>
>> You could allocate worker threads on demand; this assumes you know
>> for
>> a fact N won't get large; you'd probably want it administratively
>> bound
>> at an upper limit anyway.
>>
>> Another alternative is to use timers to interrupt blocked sends after
>> a while.
>>
>> Realize, though, that if full send queues are the rule rather than
>> the
>> exception, your protocol design is probably flawed.
>>
>> If you can tolerate losses/gaps, you should consider RED queuing.
>> Random Early Drop lets you avoid doing work that will ultimately be
>> unable to complete, instead of pushing it all the way through an
>> expensive-to-run stack of software before finding that out. The
>> cheapest work is work you don't do.
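A toy version of that admission decision, using instantaneous queue depth
rather than the averaged queue length real RED maintains; the thresholds are
placeholders and the random value is supplied by the caller:

    /* Drop probability ramps linearly from 0 at Q_MIN to 1 at Q_MAX. */
    #define Q_MIN 64
    #define Q_MAX 256

    static int
    red_should_drop(unsigned depth, unsigned random_value)
    {
        if (depth <= Q_MIN)
            return 0;
        if (depth >= Q_MAX)
            return 1;
        return (random_value % (Q_MAX - Q_MIN)) < (depth - Q_MIN);
    }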
>>
>> In the limit, using upcalls to retry pending sends that are pending
>> because the last attempt got EWOULDBLOCK is just a poor man's
>> implementation of polling using opportunistic timers, rather than
>> explicit timers. This has the disadvantage of you taking "lbolt"s to
>> run what is, in effect, a soft interrupt (netisr-style) polling for
>> available unsent data. This happens, even if you have no work to do,
>> and so it's just useless overhead. If your steady-state is not "full
>> send queues with even more data pending enqueuing", then these will
>> mostly have no work to do. If so, you are better off using a non-
>> periodic oneshot timer, which has the advantage of you being able to
>> only turn it on if you know you have data in this state. A more
>> sophisticated program would order a timer list in ascending
>> expiration
>> order, and include expected latency in calculating the expire time
>> (e.g. by keeping a per connection moving average and using that to
>> calculate a "retry this send after I expect the send to not fail,
>> based on this client's past history").
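A sketch of the non-periodic retry using the kernel's thread_call interface;
the deadline math is the usual clock_interval_to_deadline() pattern, while the
retry function and its bookkeeping are placeholders:

    #include <kern/thread_call.h>
    #include <kern/clock.h>

    static thread_call_t retry_call;    /* thread_call_allocate(retry_send, conn)
                                         * once at connection setup */

    /* Fires once at the deadline; re-attempt the send that got EWOULDBLOCK. */
    static void
    retry_send(thread_call_param_t p0, thread_call_param_t p1)
    {
        /* ... retry the pending send for the connection in p0 ... */
    }

    /* Arm the one-shot only when there is actually blocked data pending. */
    static void
    schedule_retry(uint32_t delay_ms)
    {
        uint64_t deadline;

        clock_interval_to_deadline(delay_ms, 1000 * 1000 /* ns per ms */, &deadline);
        thread_call_enter_delayed(retry_call, deadline);
    }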
>>
>> Also, since no one has bothered mentioning it, doing large data
>> transfers in your upcalls adds latency. So does doing large data
>> transfers in what is supposed to be your hot path. So a quick
>> hand-off of the data transfer to another thread avoids that latency (a
>> send is essentially a request to address a message, then copy the
>> data
>> from your buffers to mbufs, and stuff the mbuf chain on a socket send
>> queue). So more threads don't equal bad performance, just like they
>> don't mean good performance, either: they are just a tool, best if
>> used appropriately (like the timers, in the previous example).
>>
>> Ultimately, it comes down to knowing your problem space and knowing
>> appropriate algorithms for mapping code to that space effectively.
>>
>> Finally, you could also do what you want in user space, where there
>> are a lot of useful tools, like kqueue, that already report the type
>> of events you are interested in, but which are simply not available
>> in
>> kernel space. Being in the kernel doesn't automatically mean faster;
>> you will get a much bigger win for anything doing sustained
>> communications from reducing latency and amortizing fixed costs over
>> as many units as possible, rather than taking the fixed cost as a
>> per-unit hit.
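For the user-space route, the writability notification in question is kqueue's
EVFILT_WRITE. A minimal fragment (socket setup omitted, fd assumed to be a
non-blocking socket) looks roughly like:

    #include <sys/event.h>
    #include <sys/time.h>

    /* Block until 'fd' has send-buffer space again, so the caller can retry
     * the write that previously returned EWOULDBLOCK. */
    static int
    wait_writable(int kq, int fd)
    {
        struct kevent ev;

        EV_SET(&ev, fd, EVFILT_WRITE, EV_ADD | EV_ONESHOT, 0, 0, NULL);
        if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)    /* register */
            return -1;
        return kevent(kq, NULL, 0, &ev, 1, NULL);       /* wait for the event */
    }

    /* usage: int kq = kqueue(); ... if (wait_writable(kq, fd) > 0) retry(fd); */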
>>
>> -- Terry
>>