Re: What if process crashes while holding a locked semaphore?
- Subject: Re: What if process crashes while holding a locked semaphore?
- From: Terry Lambert <email@hidden>
- Date: Tue, 31 Mar 2009 10:55:45 -0700
On Mar 31, 2009, at 6:30 AM, Jerry Krinock wrote:
On 2009 Mar 31, at 04:24, Terry Lambert wrote:
This is what sem_trywait() is for.
Hi, Terry. What do you mean by that? In my actual project I use
sem_trywait(), random back off, time out, etc. Here's another test
tool that uses sem_trywait() instead of sem_wait() -- [1]. But it
still has exactly the same problem as the code I posted yesterday.
You're still using it wrong.
The name of a named semaphore is a rendezvous; once the processes
involved have all arrived at the rendezvous, the protocol is to make
it unavailable for further use.
Everyone who's going to be using the semaphore gets the handle on it
up front. Then it gets unlinked so that the name is available for
reuse. Internally, POSIX semaphores are implemented as fds, which are
resource tracked in the per process open file table. When the process
dies, the close is called on the fd as part of exit1(), and that ends
up calling psem_close() on behalf of the semaphore, thus decrementing
the use count. On the transition from 1->0 uses, the semaphore is
cleaned up via psem_delete().
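
A minimal sketch of that open-then-unlink protocol, for illustration only. The helper names rendezvous_open() and rendezvous_retire() are hypothetical, and the initial value of 1 and mode 0600 are assumptions, not anything from the code posted in this thread:

```c
#include <assert.h>
#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical helper: every participant calls this up front to get its
 * own handle on the named semaphore (created with value 1 if absent).
 * Returns NULL on failure, with errno set by sem_open(). */
sem_t *rendezvous_open(const char *name)
{
    assert(name != NULL);
    sem_t *sem = sem_open(name, O_CREAT, 0600, 1);
    return (sem == SEM_FAILED) ? NULL : sem;
}

/* Hypothetical helper: call once all participants hold a handle.
 * The name becomes available for reuse; handles that are already open
 * keep working until their owners close them or exit. */
int rendezvous_retire(const char *name)
{
    assert(name != NULL);
    return sem_unlink(name);
}
```

Once the name has been retired, a crashed holder no longer leaves a stale value behind for the next run: its handle is closed at exit, and a later sem_open() of the same name gets a fresh semaphore.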
A typical trick that's often used is to give the semaphore the
value of the pid holding the semaphore, then use sem_getvalue() to
get the pid and then call kill(pid, 0); if the process is alive, you
will get a success for the kill or you will get an EPERM,
indicating that the process exists but you don't have permission to
send it signals. If the process doesn't exist, then you'll get
ESRCH and you can cons up the necessary ops to take over as the new
owner.
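
The liveness test described above can be sketched as follows. process_exists() is a hypothetical name; only the kill(pid, 0) probe and the EPERM/ESRCH interpretation come from the text:

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if a process with this pid exists,
 * 0 if it does not, and -1 on any other error. Signal 0 performs the
 * existence and permission checks without delivering a signal. */
int process_exists(pid_t pid)
{
    assert(pid > 0);
    if (kill(pid, 0) == 0)
        return 1;           /* exists, and we may signal it */
    if (errno == EPERM)
        return 1;           /* exists, but we lack permission */
    if (errno == ESRCH)
        return 0;           /* no such process */
    return -1;
}
```

Note that a pid can be recycled by an unrelated process, so a result of 1 only says that some process currently has that pid.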
Eeek! In Mac OS 10.5 pid values can exceed SEM_VALUE_MAX which is
32767. I'd have a bug that would be triggered after a system had
been running for a day or more, depending on usage. No, thank you.
But this "trick" seems like a workaround. Should I submit my code
to Apple's Bug Reporter, stating that the system should clear any
retained semaphore values after a process crashes?
POSIX semaphores are technically not fully supported in any case - see
<unistd.h>:
#define _POSIX_SEMAPHORES (-1) /* [SEM] */
Which is why I've been pointing you at System V semaphores, which are,
plus you can get and set their values (although you're right - they
are short values, so the pid trick wouldn't work; ah, the cross of
binary backward compatibility...).
For what you are doing, you'd actually probably be better off just
using O_EXLOCK on some data file associated with your installer; if your
installer is dead, then the next person to try is allowed to get the
open for exclusive use; if it's not dead, then they block on the open
(unless they also specify O_NONBLOCK).
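
A sketch of the lock-file approach, under the assumption that flock(2)-style advisory locks are acceptable. That is the kind of lock the BSD/Mac OS X open(2) flag O_EXLOCK takes at open time; flock() is called separately here so the sketch also builds on systems without that flag. The name acquire_instance_lock() is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>   /* flock(), LOCK_EX, LOCK_NB */
#include <unistd.h>

/* Hypothetical helper: try to take an exclusive advisory lock on `path`
 * without blocking. Returns an fd that holds the lock, or -1 with errno
 * set (EWOULDBLOCK when another open file holds the lock).
 * The kernel drops the lock when the fd is closed, including when the
 * holding process dies, so a crash cannot leave it stuck. */
int acquire_instance_lock(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        int saved = errno;  /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Dropping LOCK_NB makes the caller block until the current holder releases the lock or exits.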
An easy thing to do would be to switch to using System V semaphores
instead, and set the SEM_UNDO flag on a semop() to tell it what to
subtract from the value should the process crash (the semadj
value). That will resource track it without you needing to guess.
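
A minimal sketch of the SEM_UNDO approach. The lock_*() names are hypothetical, and IPC_PRIVATE is an assumption made to keep the example short; unrelated processes would instead agree on a key, for example via ftok():

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* semctl() takes a union as its fourth argument. Some platforms declare
 * `union semun` in <sys/sem.h> and some do not, so this sketch uses a
 * private name with the same layout. */
union demo_semun { int val; struct semid_ds *buf; unsigned short *array; };

/* Create a one-semaphore set initialised to 1 (unlocked). */
int lock_create(void)
{
    int id = semget(IPC_PRIVATE, 1, 0600);
    if (id < 0)
        return -1;
    union demo_semun arg;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) < 0) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* Decrement with SEM_UNDO: the kernel records a semadj of +1 for this
 * process and applies it automatically if the process exits or crashes. */
int lock_acquire(int id)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
    return semop(id, &op, 1);
}

/* Increment with SEM_UNDO so the recorded adjustment returns to zero. */
int lock_release(int id)
{
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = SEM_UNDO };
    return semop(id, &op, 1);
}

int lock_value(int id)   { return semctl(id, 0, GETVAL); }
int lock_destroy(int id) { return semctl(id, 0, IPC_RMID); }
```

If a process calls lock_acquire() and then dies without calling lock_release(), lock_value() seen from another process goes back to 1.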
OK, but -- sigh!! -- then I'd have to start up a new learning curve
all over again. I'm going to look at Chris Suter's idea first.
NSMachPortNameServer is documented as being thread-safe, so it
should be multi-process safe. Maybe I'd get into less trouble by
sticking with [[them square] brackets].
Another easy fix would be to not crash. 8-).
Oh, of course, my tool will never crash in real life. Until it does.
As David pointed out, you could always set up a signal handler to
clean up after yourself:
<http://www.opengroup.org/onlinepubs/009695399/functions/sem_post.html>
"The sem_post() function shall be reentrant with respect to signals
and may be invoked from a signal-catching function."
[ ... ]
2. Run it a few times in Mac OS 10.5.6 and verify that it works.
3. Uncomment the *(char*)0=0.
4. Run it again, so it crashes.
5. Comment out the *(char*)0=0.
6. Run it again.
Expected result:
"Acquired semaphore...", since the system should have cleared the
semaphore value when the tool crashed.
Actual result:
"Waiting for semaphore", forever.
7. Change the semaphore name to "MySema2"
8. Run it again. It works again.
Why the unexpected result?
<http://www.opengroup.org/onlinepubs/009695399/functions/sem_unlink.html>
"If one or more processes have the semaphore open when sem_unlink()
is called, destruction of the semaphore is postponed until all
references to the semaphore have been destroyed by calls to
sem_close(), _exit(),
or exec. Calls to sem_open() to recreate or reconnect to the semaphore
refer to a new semaphore after sem_unlink() is called."
-- Terry