Issue solved. I was mistakenly assuming that the re-connected socket would appear to come from the original sender. That was wrong, and it led to the NKE deadlocking right after the call to sock_connect(). I know… I shouldn't have held a lock across a call to sock_connect() in the first place...
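The fix described above (not holding a lock across sock_connect()) can be sketched in user space. This is only an illustration under stated assumptions: the pthread mutex stands in for whatever kernel lock the NKE uses, fake_connect() is a hypothetical stand-in for sock_connect() that fails if the caller still holds the lock, and struct pending_conn, hold_connection() and resume_all() are made-up names, not the original code.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical record for one held connection. */
struct pending_conn {
    int connection_id;
    TAILQ_ENTRY(pending_conn) link;
};
TAILQ_HEAD(pending_list, pending_conn);

struct pending_list g_pending = TAILQ_HEAD_INITIALIZER(g_pending);
pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for sock_connect(): succeeds only if g_lock is free, as a
 * re-entered filter callback that takes g_lock would require. */
int fake_connect(const struct pending_conn *pc)
{
    (void)pc;
    if (pthread_mutex_trylock(&g_lock) != 0)
        return -1;              /* the caller still holds the lock */
    pthread_mutex_unlock(&g_lock);
    return 0;
}

/* Queue a connection to be resumed later. Returns 0 on success. */
int hold_connection(int id)
{
    struct pending_conn *pc = calloc(1, sizeof(*pc));
    if (pc == NULL)
        return -1;
    pc->connection_id = id;
    pthread_mutex_lock(&g_lock);
    TAILQ_INSERT_TAIL(&g_pending, pc, link);
    pthread_mutex_unlock(&g_lock);
    return 0;
}

/* Resume every held connection; returns how many connects succeeded. */
int resume_all(void)
{
    int resumed = 0;
    for (;;) {
        struct pending_conn *pc;

        /* Take the entry off the list while locked... */
        pthread_mutex_lock(&g_lock);
        pc = TAILQ_FIRST(&g_pending);
        if (pc != NULL)
            TAILQ_REMOVE(&g_pending, pc, link);
        pthread_mutex_unlock(&g_lock);

        if (pc == NULL)
            break;
        /* ...and only connect once the lock has been dropped. */
        if (fake_connect(pc) == 0)
            resumed++;
        free(pc);
    }
    return resumed;
}
```

The point of the pattern is that the list entry is detached under the lock, but the potentially re-entrant call happens with no lock held.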
Cheers,
Jean

On 11 Sep 2012, at 00:11, Jean Suisse wrote:

Hi,
I tried the approach we discussed previously to hold socket connections for a few seconds (returning EJUSTRETURN in sf_connect_out and resuming the connection later using the sock_connect function). To this end, I store in a TAILQ list:
- the (socket_t so) variable given by the sf_connect_out function
- a copy of the structure pointed to by (sockaddr *to).
Code:
errno_t sf_connect_out(void *cookie, socket_t so, const struct sockaddr *to) {
...
sock_wait_info->connection_id = connection_id;
sock_wait_info->so = sock_info->so;
sock_wait_info->to = *to; // copy of the sockaddr structure, since I don't know what will become of the pointed-to data upon return
...
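One caveat about the assignment sock_wait_info->to = *to: if the field is declared as a plain struct sockaddr, as the assignment suggests, only sizeof(struct sockaddr) bytes are copied, which is smaller than an IPv6 address structure. A minimal user-space sketch of a full copy into a struct sockaddr_storage follows; copy_sockaddr() is a hypothetical helper, and it sizes the copy from the address family for portability (on BSD-derived systems the sa_len field could be used instead).

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Copy a socket address into storage large enough for any family.
 * Returns the number of bytes copied, or 0 for an unsupported family. */
size_t copy_sockaddr(struct sockaddr_storage *dst, const struct sockaddr *src)
{
    size_t len;

    switch (src->sa_family) {
    case AF_INET:
        len = sizeof(struct sockaddr_in);
        break;
    case AF_INET6:
        len = sizeof(struct sockaddr_in6);
        break;
    default:
        return 0;   /* only IPv4 and IPv6 are handled in this sketch */
    }

    assert(len <= sizeof(*dst));
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, len);   /* copies the whole address, not just the header */
    return len;
}
```

The stored sockaddr_storage can then be cast back to const struct sockaddr * when the connection is resumed.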
A few seconds later (long enough for me to see that the targeted connection is indeed delayed), I resume the connection by calling:
sock_connect(sock_wait_info->so, &sock_wait_info->to, MSG_DONTWAIT);
The issue is that, despite the MSG_DONTWAIT flag, the call blocks. It blocks my NKE, preventing it from exiting. As a side effect, some parts of the kernel as well as the user interface also seem to be blocked. I can't launch any new application from Spotlight, for instance: any attempt causes Spotlight, the Dock and the menu bar to hang. Already launched apps do just fine for the moment, but 5 to 10 minutes later I start to get the "Spinning Beach Ball of Death" from one of them. This quickly propagates to the others within a few minutes, and by the time I get a terminal open on another computer, the machine can no longer accept an incoming ssh connection. Then it is too late to reboot. The console still works properly, and the last message from my NKE is "Connecting the socket #connection_id", which is the last line of code before the call to sock_connect()...
If, however, I act within the first few minutes, I am able to reboot. The system just takes ages to do so, attempting to unload the kext as many times as it sees fit before eventually giving up.
Is there a bug in the documentation, such that the call to sock_connect always blocks, or did I do something wrong?
Best regards,
Jean

On 28 Aug 2012, at 18:42, Vincent Lubet wrote:

On Aug 28, 2012, at 1:59 AM, Jean Suisse <email@hidden> wrote:

You have found a bug in the documentation. In fact, when a socket filter's sf_connect_out function returns EJUSTRETURN, the error code is converted to 0 (zero) before connect(2) returns to the caller.
So, I guess that from the calling process's point of view, the connection attempt was successful, and it can feel free to send data. But on the kernel side, nothing is over yet. What happens when the calling process starts sending data on this not-yet-completed connection attempt? Or maybe the process is blocked if it is a sync connection, and no callback occurs (yet) if it is an async request?
When filtering, if I return EJUSTRETURN, I am then holding a handle to the socket itself and to the sockaddr structure. How do I resume the connection process later? Should I use sock_connect with the socket_t so and struct sockaddr *to that I got from sf_connect_out?
Yes, exactly.
Yes, you should free the mbufs you have stashed away.
I found 4 functions that can do that: mbuf_free, mbuf_freecluster, mbuf_freem and mbuf_freem_list.
Should I use mbuf_free on each swallowed mbuf, or should I call one of the other three on any mbuf I hold? In the latter case, how do I know which function I should call?
You should use mbuf_freem because mbufs are complex data structures that are often a linked list of mbufs. For a description please see:
You cannot close the socket from the kernel, but you can disconnect the socket by using the sock_shutdown() KPI function.
Thanks. I found the function in kpi_socket.h: errno_t sock_shutdown(socket_t so, int how);
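On the mbuf_free versus mbuf_freem question, the difference can be illustrated with a toy model. This is not the kernel's mbuf structure: struct toy_buf, toy_alloc_chain(), toy_free() and toy_freem() are hypothetical user-space stand-ins, written on the understanding that mbuf_free() releases a single mbuf and hands back the next one in the chain, while mbuf_freem() releases the whole chain.

```c
#include <stdlib.h>

/* Toy stand-in for an mbuf: NOT the kernel structure. */
struct toy_buf {
    struct toy_buf *next;   /* next buffer belonging to the same packet */
};

int toy_live = 0;           /* number of buffers currently allocated */

/* Analogue of mbuf_free(): release one buffer, return the next in the chain. */
struct toy_buf *toy_free(struct toy_buf *b)
{
    struct toy_buf *next = b->next;
    free(b);
    toy_live--;
    return next;
}

/* Analogue of mbuf_freem(): release every buffer in the chain. */
void toy_freem(struct toy_buf *b)
{
    while (b != NULL)
        b = toy_free(b);
}

/* Build a chain of n buffers; returns NULL if n <= 0 or allocation fails. */
struct toy_buf *toy_alloc_chain(int n)
{
    struct toy_buf *head = NULL;

    while (n-- > 0) {
        struct toy_buf *b = malloc(sizeof(*b));
        if (b == NULL) {
            toy_freem(head);    /* do not leak the partial chain */
            return NULL;
        }
        b->next = head;
        head = b;
        toy_live++;
    }
    return head;
}
```

Calling only the single-buffer free on the head of a stashed packet would leave the rest of the chain allocated, which is why the chain-aware call is the one recommended in the reply.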
Is it safe to call it at any time?
Yes, it is safe.
Vincent
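For readers unfamiliar with the "disconnect without closing" idea, here is a user-space analogue using shutdown(2), assuming that the how argument of sock_shutdown() takes the same SHUT_RD, SHUT_WR and SHUT_RDWR values. It does not call the kernel KPI; peer_read_after_shutdown() is a hypothetical helper that shows the peer seeing end-of-file while both descriptors are still open.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Shut down the write side of one end of a socket pair and report what
 * the peer's read() returns: 0 means end-of-file, -1 means failure. */
long peer_read_after_shutdown(void)
{
    int fds[2];
    char byte;
    long n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* Disconnect the sending direction; the descriptor itself stays open. */
    if (shutdown(fds[0], SHUT_WR) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    n = (long)read(fds[1], &byte, 1);   /* peer observes the shutdown */

    close(fds[0]);
    close(fds[1]);
    return n;
}
```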