Re: an issue with KPI implementation
- Subject: Re: an issue with KPI implementation
- From: Terry Lambert <email@hidden>
- Date: Tue, 21 Jun 2005 15:58:29 -0700
On Jun 7, 2005, at 1:27 AM, alexklg wrote:
We developed a small NKE and a sequence of steps that can make
this NKE crash even under Tiger 10.4.1 (it crashes more reliably
under 10.4). However, even when this NKE doesn't crash (doesn't cause
the system to panic), the number of sockets it is attached to grows
uncontrollably, and this seems to be a Darwin issue.
Do you reference count entry and exit, and use a mutex to protect the
entry and exit counts, so that any in-progress operations can drain
out before you actually permit the unload to complete? If not, that
would most likely explain the crashing.
The socket problem concerns the uncontrolled growth in the number
of sockets that were created in response to incoming connections
but were never accept'ed.
For this, you will have to wait for the 2 MSL timer to expire on
connections being shut down. If these are straight TCP/IP
connections, be aware that if the client connection is not shutdown
properly (e.g. because you are using a "WebAvalanche" or similar test
box that establishes connections through SYN-flooding) then the
sockets created can persist for a very long time in FIN-WAIT-2 state,
at least until the "backup" timeout that most OS vendors (like us)
add to the TCP stack to avoid it taking forever.
Basically, you should set a flag to deny additional entries once you
start the unload process, and wait for everything in progress to
drain out before closing, or you will end up with "orphaned stacks"
trying to return through code that's no longer in the kernel address
space after they eventually wake up and try to return through your NKE.
Maybe one of the networking guys can point you to example code that
does the necessary counting and locking.
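
As a rough illustration of the counting and locking described above,
here is a user-space model using pthreads. It is only a sketch, not
kernel code and not taken from any existing NKE: the names drain_gate,
gate_enter, gate_exit and gate_close_and_drain are hypothetical, and a
real kext would need the kernel's own lock and sleep/wakeup primitives
in place of the pthread calls.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model of the "count entries, drain on unload"
 * pattern. All names here are illustrative assumptions. */
struct drain_gate {
    pthread_mutex_t lock;      /* protects the two fields below */
    pthread_cond_t  drained;   /* signalled when in_flight reaches 0 */
    int             in_flight; /* callers currently inside the filter */
    bool            unloading; /* set once the unload has started */
};

void gate_init(struct drain_gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->drained, NULL);
    g->in_flight = 0;
    g->unloading = false;
}

/* Call at the top of every entry point. Returns false once the unload
 * has started, in which case the caller must back out immediately. */
bool gate_enter(struct drain_gate *g)
{
    bool ok;

    pthread_mutex_lock(&g->lock);
    ok = !g->unloading;
    if (ok)
        g->in_flight++;
    pthread_mutex_unlock(&g->lock);
    return ok;
}

/* Call on every exit path that follows a successful gate_enter(). */
void gate_exit(struct drain_gate *g)
{
    pthread_mutex_lock(&g->lock);
    if (--g->in_flight == 0 && g->unloading)
        pthread_cond_broadcast(&g->drained);
    pthread_mutex_unlock(&g->lock);
}

/* Call from the unload path: deny new entries, then wait until every
 * operation already in progress has left. */
void gate_close_and_drain(struct drain_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->unloading = true;
    while (g->in_flight > 0)
        pthread_cond_wait(&g->drained, &g->lock);
    pthread_mutex_unlock(&g->lock);
}
```

Each filter callback would wrap its body in gate_enter()/gate_exit(),
and the unload routine would return only after gate_close_and_drain()
has come back, so that no thread can later return into unmapped code.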
-- Terry
Problem description
===================
The very basic socket-filter NKE crashes on unload (kextunload).
If it doesn't crash, the number of sockets the socket filter is
attached to grows uncontrollably.
Prerequisites
=============
1. Mac OS X 10.4.1
2. test_nke NKE (see the attachment)
test_nke is a basic global socket-filter NKE that simply counts the
number of sockets it is attached to.
test_nke outputs this number to the system log (sock_count).
3. insockcnt utility.
The utility does the following:
- creates a listening TCP socket (server side) in a separate process.
  The second argument of listen() is a large number (256). The
  listening port is some arbitrary port_number_1,
- from a separate process (client side) connects to the listening
  socket 100 times,
- on the server side accepts the first connection from the client
  and then calls sleep(2), but the server-side process is killed
  after one second by an alarm,
- closes the connections from the client side (exit(0)).
All these steps are repeated several times, with port_number_1
incremented by 1 each time, at intervals of 3 seconds.
This results in a large number of sockets that test_nke is attached
to but is not detached from within a reasonable amount of time
(about 2 minutes).
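
For readers without the attachment, the core of the steps above can be
approximated as follows. This is a simplified sketch and not the
original utility's source: the function name leave_unaccepted is
hypothetical, everything runs in one process over the loopback
interface, and the fork, alarm and port-increment logic are left out.
It relies on the fact that a blocking connect() can complete while the
connection still sits in the listener's accept queue.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not the original utility): open a loopback
 * listener with the given backlog, make nconn client connections,
 * accept only the first one, then close everything.
 * Returns the number of connections that were never accepted,
 * or -1 on error. */
int leave_unaccepted(int backlog, int nconn)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int clients[256];
    int made = 0, result = -1;
    int lfd, afd, i;

    if (nconn < 1 || nconn > 256)
        return -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, backlog) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    /* Client side: each connect() leaves one more socket queued on
     * the listener until the server calls accept(). */
    for (made = 0; made < nconn; made++) {
        clients[made] = socket(AF_INET, SOCK_STREAM, 0);
        if (clients[made] < 0)
            goto out;
        if (connect(clients[made], (struct sockaddr *)&addr,
                    sizeof(addr)) < 0) {
            close(clients[made]);
            goto out;
        }
    }

    /* Server side: take only the first connection off the queue. */
    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;
    close(afd);
    result = nconn - 1;

out:
    /* Closing the clients and then the listener abandons the
     * connections that are still waiting in the accept queue. */
    for (i = 0; i < made; i++)
        close(clients[i]);
    close(lfd);
    return result;
}
```

Calling leave_unaccepted(256, 100) mirrors the backlog and connection
count given above.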
Remark. From our point of view, the sockets that the filter is not
detached from are the sockets in the accept queue. These sockets are
created in response to incoming connections. Apparently, the kernel
structures are allocated for them, but the sockets are not completely
initialized. We believe that this happens because accept() was not
called for these sockets and meanwhile the client closed the
connection (e.g. was terminated) for some reason.
How to reproduce
================
1. Log in with an administrative account.
3. Load the test_nke NKE (sudo kextload test_nke.kext).
4. Start the insockcnt utility (sudo ./insockcnt).
5. Wait about 30 seconds.
6. Stop the insockcnt utility by pressing 'Control+C'.
7. The number of sockets test_nke is attached to is dumped
to the system log. This number may be considerably more than 0,
but it should be equal to 0, because all the sockets the NKE had been
attached to were closed. If the dumped socket_count is not equal to 0,
then the subsequent kextunload may result in a system crash.
Remark. Even if the insockcnt utility is started without sudo (i.e.
without becoming super-user) and is terminated after about 2-3 minutes
(instead of 30 seconds), the kernel runs out of resources and panics.
Thanks,
alexklg
<testnke-20050606-03.tgz>
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden