-- Terry

Prerequisites
=============
1. Mac OS X 10.4.1
2. test_nke NKE (see the attachment)

test_nke outputs this number to the system log (sock_count).

All these steps are repeated several times, with port_number_1 = port_number_1 + 1 each time, at intervals of 3 seconds.

How to reproduce
================
1. Log in with an administrative account.
3. Load the test_nke NKE (sudo kextload test_nke.kext).
4. Start the insockcnt utility (sudo ./insockcnt).
5. Wait about 30 seconds.
6. Stop the insockcnt utility by pressing 'Control+C'.
7. The number of sockets test_nke is attached to is dumped to the system log. This number may be considerably greater than 0, but it should be equal to 0, because all the sockets the NKE had been attached to were closed.

If the dumped socket_count is not equal to 0, then a subsequent kextunload may result in a system crash.

Thanks,
alexklg

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (Darwin-kernel@lists.apple.com)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/darwin-kernel/site_archiver%40lists.a...

On Jun 7, 2005, at 1:27 AM, alexklg wrote:

We developed a small NKE and a sequence of steps that can make this NKE crash even under Tiger 10.4.1 (it crashes more reliably under 10.4). However, even if this NKE doesn't crash (doesn't cause the system to panic), the number of sockets it is attached to grows uncontrollably, and this seems to be a Darwin issue.

Do you reference count entry and exit, and use a mutex to protect the entry and exit counts, so that any in-progress operations can drain out before you actually permit the unload to complete? If not, that would most likely explain the crashing.
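The entry/exit counting asked about above might look roughly like the following. This is only a user-space model using pthreads, not code from test_nke or from Darwin; the names `unload_gate`, `gate_enter`, `gate_exit` and `gate_begin_unload_and_drain` are hypothetical. It also includes a flag that refuses new entries once unload has started. In a real kext the pthread calls would presumably be replaced by kernel locking and sleep/wakeup primitives; that mapping is an assumption and is not shown here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical "unload gate": counts callers currently inside the filter. */
struct unload_gate {
    pthread_mutex_t lock;       /* protects in_flight and unloading */
    pthread_cond_t  drained;    /* signalled when in_flight drops to 0 */
    unsigned        in_flight;  /* callbacks currently executing */
    bool            unloading;  /* set once the unload path has started */
};

void gate_init(struct unload_gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->drained, NULL);
    g->in_flight = 0;
    g->unloading = false;
}

/* Call at the top of every filter callback.
 * Returns false if unload has begun and the caller must back out. */
bool gate_enter(struct unload_gate *g)
{
    bool ok;

    pthread_mutex_lock(&g->lock);
    ok = !g->unloading;
    if (ok)
        g->in_flight++;
    pthread_mutex_unlock(&g->lock);
    return ok;
}

/* Call on every return path of a callback that passed gate_enter(). */
void gate_exit(struct unload_gate *g)
{
    pthread_mutex_lock(&g->lock);
    assert(g->in_flight > 0);
    g->in_flight--;
    if (g->in_flight == 0)
        pthread_cond_broadcast(&g->drained);
    pthread_mutex_unlock(&g->lock);
}

/* Call from the unload path: deny new entries, then wait until every
 * in-progress callback has left. Only after this returns is it safe
 * to let the code be removed. */
void gate_begin_unload_and_drain(struct unload_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->unloading = true;
    while (g->in_flight > 0)
        pthread_cond_wait(&g->drained, &g->lock);
    pthread_mutex_unlock(&g->lock);
}
```

Each callback would wrap its body in `if (!gate_enter(&g)) return;` ... `gate_exit(&g);`, and the stop routine would call `gate_begin_unload_and_drain(&g)` before reporting success.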
The socket problem concerns the uncontrolled growth in the number of sockets that were created in response to incoming connections but were never accept()ed.

For this, you will have to wait for the 2 MSL timer to expire on connections being shut down. If these are straight TCP/IP connections, be aware that if the client connection is not shut down properly (e.g. because you are using a "WebAvalanche" or similar test box that establishes connections through SYN-flooding) then the sockets created can persist for a very long time in FIN-WAIT-2 state, at least until the "backup" timeout that most OS vendors (like us) add to the TCP stack to avoid it taking forever.

Basically, you should set a flag to deny additional entries once you start the unload process, and wait for everything in progress to drain out before closing, or you will end up with "orphaned stacks" that eventually wake up and try to return through your NKE, whose code is no longer in the kernel address space.

Maybe one of the networking guys can point you to example code that does the necessary counting and locking.

Problem description
===================
The very basic socket-filter NKE crashes on unload (kextunload). If it doesn't crash, the number of sockets the socket filter is attached to grows uncontrollably.

test_nke is a basic global socket-filter NKE that simply counts the number of sockets it is attached to.

3. insockcnt utility. The utility does the following:
- creates a listening TCP socket (server side) in a separate process. The second argument of listen() is some large number (256). The listening port is an arbitrary port_number_1;
- from a separate process (client side), connects to the listening socket 100 times;
- on the server side, accepts the first connection from the client side and then calls sleep(2), but the server-side process is killed after one second by an alarm;
- closes the connections from the client side (exit(0)).
This results in a large number of sockets that test_nke is attached to but is not detached from within a reasonable amount of time (about 2 minutes).

Remark. From our point of view, the sockets that the filter is not detached from are the sockets in the accept queue. These sockets are created in response to incoming connections. Apparently, kernel structures are allocated for them, but the sockets are not completely initialized. We believe this happens because accept() was never called for these sockets and, meanwhile, the client closed the connection (e.g. was terminated) for some reason.

Remark. Even if the insockcnt utility is started without sudo (that is, without super-user privileges) and is terminated after about 2-3 minutes (instead of 30 seconds), the kernel starts running short of resources and panics.

<testnke-20050606-03.tgz>
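The accept-queue situation described in the first Remark can be illustrated with a small user-space sketch. It is not the insockcnt utility from the attachment. It only shows, in a single process, that client connections complete while the server never calls accept(), so the kernel has to create the server-side sockets on its own. The function name `count_unaccepted_connects` is hypothetical, and the sketch assumes `n` does not exceed `backlog` (otherwise connect() may block).

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Opens a loopback listener, makes n client connections to it, and never
 * calls accept(). Returns the number of connect() calls that succeeded,
 * or -1 if the listener could not be set up.
 */
int count_unaccepted_connects(int n, int backlog)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int ok = 0;
    int srv;

    assert(n >= 0 && backlog > 0);

    srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a free port */

    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(srv, backlog) < 0 ||
        getsockname(srv, (struct sockaddr *)&addr, &len) < 0) {
        close(srv);
        return -1;
    }

    for (int i = 0; i < n; i++) {
        int c = socket(AF_INET, SOCK_STREAM, 0);
        if (c < 0)
            break;
        /* The handshake is completed by the kernel; no accept() is needed
         * for connect() to return successfully. */
        if (connect(c, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            ok++;
        close(c);   /* the client goes away before the server accepts it */
    }

    close(srv);     /* listener closed with connections still queued */
    return ok;
}
```

Calling `count_unaccepted_connects(100, 256)` mirrors the numbers used in the report: every connection that succeeds here is one that a global socket filter would be attached to without the server ever having seen it.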