Ignore: Racy DMESG connect
- Subject: Ignore: Racy DMESG connect
- From: Godfrey van der Linden <email@hidden>
- Date: Mon, 18 Jan 2010 15:59:59 +1100
On 2010-01-18, at 3:32 PM, Godfrey van der Linden wrote:
> This isn't a race, see bottom of email!
>
> On 2010-01-18, at 3:16 PM, Godfrey van der Linden wrote:
>
>> G'day, all networking types.
>>
>> I've recently tripped over a kernel race when attempting to connect with a DGRAM socket. This is the code fragment that is causing problems:
>>
>> struct sockaddr_un sa2;
>> struct sockaddr_storage saLocal;
>> socklen_t saLocalLen;
>>
>> sa2.sun_family = AF_UNIX;
>> strcpy(sa2.sun_path, kDefSocketName2);
>> (void) unlink(kDefSocketName2);
>>
>> sys_test(fd_skt = socket(AF_UNIX, SOCK_DGRAM, 0),
>> "Can't open DGRAM socket %d", child_pid);
>> sys_test(bind(fd_skt, (struct sockaddr *) &sa2, sizeof(sa2)),
>> "Can't bind DGRAM socket %d", child_pid);
>>
>> // Wait for first client message for connecting
>> sys_test(recvfrom(fd_skt, msg, sizeof(msg), /* flags */ 0,
>> (struct sockaddr *) &saLocal, &saLocalLen),
>> "Error reading first message");
>> #if NOCRASH
>> saP = (struct sockaddr_un *) &saLocal;
>> #endif // NOCRASH
>> sys_test(connect(fd_skt, (struct sockaddr *) &saLocal, saLocalLen),
>> "Can't connect to client %d", child_pid);
>>
>> notes:
>> 1> sys_test is a macro that checks the result for -1 and crashes with the error if necessary.
>> 2> sys_test is implemented within a do { ... } while (0) (i.e. the macros don't interact).
>> 3> the server is single threaded and runs in its own process after a fork().
>> 4> The code is compiled with -O0 (no optimisation)
>> 5> I'm running on an '06 intel iMac, which is dual core.
>>
>> When I compile and run with -DNOCRASH=1 connect never seems to fail.
>>
>> However if the connect directly follows the recvfrom then I get a
>> "nwmark(main): Can't connect to client 53673 - Invalid argument(22)"
>> about 50% of the time. The process is single-threaded (see note <3>), so the saLocal variable is not being scribbled on between the recvfrom and the connect.
>>
>> Has anybody seen the kernel race in this way before? Should I submit a bug?
>
> It is not a race!
>
> I tried recvfrom()/usleep(100)/connect(); it does not fix the problem!
>
> Is there any way that the sockaddr_storage returned from the recvfrom could be in an odd state? If I refer to it, even by just assigning the address, this problem doesn't reproduce.
RTFM, mea culpa. The address_len argument to recvfrom() is both input and output: on entry it must hold the size of the address buffer, and I didn't initialise it. Oops.
Interesting that the problem was not deterministic: it only failed occasionally. Some sort of weird initialisation race between the two legs of the fork() perhaps, or some other init-time signal?
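
For reference, a minimal sketch of the corrected pattern. The helper name recv_and_connect is hypothetical and not part of the original program; it assumes the socket is an already-bound AF_UNIX SOCK_DGRAM socket and that the sender has bound its own socket, so that there is a return address to connect to. The only change that matters is that the length variable is set to the size of the address buffer before recvfrom() is called.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not from the original program): receive the first
 * datagram on a bound datagram socket and connect() back to its sender.
 * Returns the byte count from recvfrom(), or -1 with errno set. */
ssize_t recv_and_connect(int fd, void *buf, size_t buflen)
{
    struct sockaddr_storage peer;
    /* Value-result argument: on entry, the capacity of 'peer';
     * on return, the number of address bytes actually stored. */
    socklen_t peer_len = sizeof(peer);
    ssize_t n;

    assert(buf != NULL && buflen > 0);
    memset(&peer, 0, sizeof(peer));

    n = recvfrom(fd, buf, buflen, /* flags */ 0,
                 (struct sockaddr *) &peer, &peer_len);
    if (n == -1)
        return -1;

    /* Pass the length recvfrom() reported, not sizeof(peer). */
    if (connect(fd, (struct sockaddr *) &peer, peer_len) == -1)
        return -1;

    return n;
}
```

With the length initialised, connect() receives the address length that recvfrom() reported instead of whatever happened to be on the stack.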
Godfrey
_______________________________________________
Darwin-kernel mailing list (email@hidden)