Re: What if process crashes while holding a locked semaphore?
- Subject: Re: What if process crashes while holding a locked semaphore?
- From: Jerry Krinock <email@hidden>
- Date: Tue, 31 Mar 2009 06:30:30 -0700
On 2009 Mar 31, at 04:24, Terry Lambert wrote:
> This is what sem_trywait() is for.
Hi, Terry. What do you mean by that? In my actual project I use
sem_trywait(), random backoff, timeout, etc. Here's another test
tool that uses sem_trywait() instead of sem_wait() -- [1]. But it
still has exactly the same problem as the code I posted yesterday.
> A typical trick that's often used is to give the semaphore the value
> of the pid holding the semaphore, then use sem_getvalue() to get the
> pid and then call kill(pid, 0); if the process is alive, you will get
> a success for the kill or you will get an EPERM, indicating that the
> process exists but you don't have permission to send it signals. If
> the process doesn't exist, then you'll get ESRCH and you can cons up
> the necessary ops to take over as the new owner.
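The liveness probe described in the quoted paragraph can be sketched roughly as follows. This is only an illustration of the kill(pid, 0) check, not code from the thread; the function name process_exists is hypothetical, and how the pid gets stored and read back is left out.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: probe whether a process with this pid exists.
 * Signal 0 asks the kernel for the existence and permission checks
 * only; no signal is actually delivered.
 * Returns 1 if the process exists, 0 if it does not, -1 otherwise. */
int process_exists(pid_t pid) {
    if (pid <= 0) {
        /* 0 and negative values address process groups, not one process. */
        return -1;
    }
    if (kill(pid, 0) == 0) {
        return 1;   /* exists, and we may signal it */
    }
    if (errno == EPERM) {
        return 1;   /* exists, but we lack permission to signal it */
    }
    if (errno == ESRCH) {
        return 0;   /* no such process */
    }
    return -1;      /* unexpected error */
}
```

A result of 0 would be the cue to take over ownership. Note that a pid can be reused by an unrelated process, so a result of 1 does not prove that the original holder is still the one running.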
Eeek! In Mac OS 10.5, pid values can exceed SEM_VALUE_MAX, which is
32767. I'd have a bug that would be triggered only after a system had
been running for a day or more, depending on usage. No, thank you.
But this "trick" seems like a workaround. Should I submit my code to
Apple's Bug Reporter, stating that the system should clear any
retained semaphore values after a process crashes?
> An easy thing to do would be to switch to using System V semaphores
> instead, and set the SEM_UNDO flag on a semop() to tell it what to
> subtract from the value should the process crash (the semadj
> value). That will resource track it without you needing to guess.
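A minimal sketch of the System V approach from the quoted paragraph might look like the following. The function names (lock_open, lock_try_acquire, lock_release, simulate_crashed_holder) are hypothetical, and the union semun guard is an assumption about which platforms declare that type in their headers. The sketch also ignores the race between creating the set and giving it its initial value.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__linux__)
/* Assumption: Linux requires the caller to define union semun,
 * while Mac OS X and the BSDs declare it in <sys/sem.h>. */
union semun { int val; struct semid_ds *buf; unsigned short *array; };
#endif

/* Create a one-semaphore set with value 1, or open an existing one.
 * Returns the set id, or -1 on error. */
int lock_open(key_t key) {
    int id = semget(key, 1, IPC_CREAT | IPC_EXCL | 0600);
    if (id >= 0) {
        union semun arg;
        arg.val = 1;                       /* 1 means "unlocked" */
        return semctl(id, 0, SETVAL, arg) == 0 ? id : -1;
    }
    return errno == EEXIST ? semget(key, 1, 0600) : -1;
}

/* Non-blocking acquire. SEM_UNDO records a semadj of +1, which the
 * kernel applies if this process dies before releasing.
 * Returns 0 on success, -1 with errno == EAGAIN if the lock is held. */
int lock_try_acquire(int id) {
    struct sembuf op = { .sem_num = 0, .sem_op = -1,
                         .sem_flg = IPC_NOWAIT | SEM_UNDO };
    return semop(id, &op, 1);
}

/* Release; SEM_UNDO here cancels the adjustment recorded on acquire. */
int lock_release(int id) {
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = SEM_UNDO };
    return semop(id, &op, 1);
}

/* Fork a child that takes the lock and is killed while holding it.
 * Returns 0 once the child has been reaped, -1 on error. */
int simulate_crashed_holder(int id) {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        if (lock_try_acquire(id) != 0) _exit(2);
        raise(SIGKILL);                    /* die without releasing */
        _exit(3);                          /* not reached */
    }
    int status;
    return waitpid(child, &status, 0) == child ? 0 : -1;
}
```

After simulate_crashed_holder() returns, a lock_try_acquire() in the parent should succeed, because the kernel has already added the child's semadj back to the semaphore value. A real tool would derive the key with something like ftok() rather than IPC_PRIVATE, and would remove the set with semctl(id, 0, IPC_RMID) when it is no longer needed.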
OK, but -- sigh!! -- then I'd have to start up a new learning curve
all over again. I'm going to look at Chris Suter's idea first.
NSMachPortNameServer is documented as being thread-safe, so it should
be multi-process safe. Maybe I'd get into less trouble by sticking
with [[them square] brackets].
> Another easy fix would be to not crash. 8-).
Oh, of course, my tool will never crash in real life. Until it does.
Jerry
[1] New demo using sem_trywait() instead of sem_wait()
Steps to Reproduce:
1. Compile this code as a Standard "C" tool:
#include <stdio.h>
#include <unistd.h>
#include <semaphore.h>
#include <errno.h>
#include <fcntl.h>      // O_CREAT
#include <sys/stat.h>   // S_IRWXU

#define TRYWAIT

#ifdef TRYWAIT
int main (int argc, const char * argv[]) {
    sem_t* descriptor = sem_open("MySem11",  // See (*)
                                 (O_CREAT),
                                 S_IRWXU,
                                 1) ;
    // (*) name must be < 31 characters or
    //     errno = ENAMETOOLONG

    while (1) {
        if (descriptor != SEM_FAILED) {
            printf("Waiting for semaphore\n") ;
            int failed = sem_trywait(descriptor) ;
            if (!failed) {
                printf("Acquired semaphore. Working.\n") ;
                // Simulated work:
                usleep(1000000) ;
                // Uncomment the next line to crash
                // *(char*)0 = 0 ;
                printf("Done\n") ;
                sem_post(descriptor) ;
                sem_close(descriptor) ;
                break ;
            }
            else if (errno == EAGAIN) {
                // Semaphore is currently held; back off and retry
                printf("Semaphore unavailable. Retry in 1.7 sec.\n") ;
                usleep (1700000) ;
            }
            else {
                printf("Unexpected error, errno=%d\n", errno) ;
                break ;
            }
        }
        else {
            // sem_open() failed
            printf("Unexpected error, errno=%d\n", errno) ;
            break ;
        }
    }

    printf("Exiting.\n") ;
    return 0 ;
}
#endif  // TRYWAIT
2. Run it a few times in Mac OS 10.5.6 and verify that it works.
3. Uncomment the *(char*)0=0.
4. Run it again, so it crashes.
5. Comment out the *(char*)0=0.
6. Run it again.
Expected result:
"Acquired semaphore...", since the system should have cleared the
semaphore value when the tool crashed.
Actual result:
"Waiting for semaphore", forever.
7. Change the semaphore name to "MySema2"
8. Run it again. It works again.
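Steps 7 and 8 suggest that the stale value is tied to the semaphore's name. One possible way to recover the original name, sketched below, is to remove it with sem_unlink() and create it again. The helper name open_fresh_semaphore is hypothetical. The sketch assumes the previous holder is known to be dead: if a live process still had the old semaphore open, the two processes would end up using different semaphores.

```c
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>
#include <sys/stat.h>

/* Hypothetical helper: discard whatever named semaphore currently
 * exists under this name and create a new one with value 1.
 * Returns SEM_FAILED on error, like sem_open(). */
sem_t *open_fresh_semaphore(const char *name) {
    /* ENOENT just means there was nothing to remove. */
    if (sem_unlink(name) != 0 && errno != ENOENT) {
        return SEM_FAILED;
    }
    /* O_EXCL makes sure we really create a new semaphore, so the
     * initial value of 1 is the one that takes effect. */
    return sem_open(name, O_CREAT | O_EXCL, S_IRUSR | S_IWUSR, 1);
}
```

This does not make the lock crash-safe by itself; it only gives a way to reset a name that an earlier crash left locked.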
Why the unexpected result?
Thanks,
Jerry
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden