Re: Thread safety question
- Subject: Re: Thread safety question
- From: Eric Gouriou <email@hidden>
- Date: Sat, 19 Feb 2005 15:46:41 -0800
[Apologies for boring subscribers with more nitpicking which might
not be directly relevant to the Darwin platform, but threading
and parallel programming pitfalls are just too much fun to pass up.]
On Feb 19, 2005, at 14:28, Matthew T. Russotto wrote:
On Feb 19, 2005, at 12:40 PM, Eric Gouriou wrote:
Apologies for using examples not immediately relevant to Darwin,
but I don't want this idiom to become more prevalent. It is broken,
even if in not so obvious ways.
Doesn't the use of a compare-and-swap or similar pattern (such as
test-and-set) solve this problem without a lock at all?
e.g.
void * myVariable = 0;
volatile UInt32 sentinel = 0;

void* instance(void)
{
    /* OSCompareAndSwap returns true for the one thread that swaps 0 -> -1 */
    if ( OSCompareAndSwap( 0, -1, &sentinel))
    {
        /* initialize variables */
        if ( ! myVariable )
            myVariable = allocateObject();
        /* etc */
        sentinel = 1;
    }
    else {
        while (sentinel != 1)
            /* yield, whatever that function is...pthread_yield()? */;
    }
    return myVariable;
}
This idiom is a custom spinlock (or yielding spinlock) variant
of the trivial (and correct) initialization idiom that always grabs a
lock.
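For reference, the trivial always-lock idiom could look like the sketch
below. It assumes POSIX threads; initLock is a name invented for this
illustration, and allocateObject() is a hypothetical stand-in for whatever
really builds the object.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the allocateObject() used in this thread. */
static void *allocateObject(void)
{
    return calloc(1, 64);
}

/* initLock is an assumed name; any statically initialized mutex would do. */
static pthread_mutex_t initLock = PTHREAD_MUTEX_INITIALIZER;
static void *myVariable = NULL;

/* Always take the lock: every read and write of myVariable happens
 * while holding initLock, so no further ordering assumptions are needed. */
void *instance(void)
{
    void *result;

    pthread_mutex_lock(&initLock);
    if (myVariable == NULL)
        myVariable = allocateObject();
    result = myVariable;
    pthread_mutex_unlock(&initLock);

    return result;
}
```

Every call pays for one lock/unlock pair, which is exactly the cost the
double-checked variants try to avoid.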
The reason double-checked initialization is used is the perception
(real or not) that grabbing a lock is too expensive to afford.
Grabbing an uncontended lock shouldn't be too expensive compared to
invoking a successful OSCompareAndSwap. So someone objecting to the
trivial idiom on the basis of performance might not like this version
much. Note also that the compare&swap version causes two writes to the
sentinel, while the proposed double-checked idiom attempts to perform
a single read once initialization has occurred (very cache and
SMP/cc-NUMA friendly). So the idiom above won't please performance seekers.
The version above also relies on the compiler/processor honoring the
ordering properties of the write "sentinel = 1". If one is willing to
assume that this property holds, then the idiom that doesn't use
Compare&Swap is valid.
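One way to state that ordering property explicitly rather than assume it
is sketched below, using C11 atomics. These postdate this thread, so this
is an illustration and not what anyone here proposed. initLock is an
assumed name and allocateObject() is a hypothetical stand-in.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the allocateObject() used in this thread. */
static void *allocateObject(void)
{
    return calloc(1, 64);
}

/* initLock is an assumed name for the lock guarding the slow path. */
static pthread_mutex_t initLock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic(void *) myVariable = NULL;

void *instance(void)
{
    /* Fast path: a single acquire load once initialization has occurred.
     * It pairs with the release store below, so a caller that sees a
     * non-NULL pointer also sees the object's initialized contents. */
    void *p = atomic_load_explicit(&myVariable, memory_order_acquire);

    if (p == NULL) {
        pthread_mutex_lock(&initLock);
        /* Second check under the lock; the mutex already orders this read. */
        p = atomic_load_explicit(&myVariable, memory_order_relaxed);
        if (p == NULL) {
            p = allocateObject();
            /* Publish only after the object is fully constructed. */
            atomic_store_explicit(&myVariable, p, memory_order_release);
        }
        pthread_mutex_unlock(&initLock);
    }
    return p;
}
```

The release/acquire pair plays the role that the ordered write to the
sentinel is assumed to play in the version quoted above.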
You could use myVariable as the sentinel also but that introduces
other dependencies, and besides you probably don't want to declare it
volatile.
I have a hard time constructing a scenario where making myVariable
volatile would be a concern for performance. If one uses a function
instance(), then presumably no direct access to myVariable should
occur outside of this function. The price of thread-safety has to be
paid somewhere.
You might be thinking of users of this global using an idiom such
as:
void* myLocal = myVariable;
if (NULL == myLocal) {
    myLocal = instance();
}
foo = myLocal->field1;
The problem is that this would not be thread-safe, at least on
some systems with relaxed memory ordering. Reading a non-NULL
myVariable is not a guarantee that reading myLocal->field1 will
return the expected value. To guarantee this, one needs to have
an ordering constraint between the writes that can occur
in instance() and the reads occurring in this snippet.
For those into formal correctness proofs, the following document
explains why this is the case, in Table 9 on page 23:
<URL:http://www.intel.com/design/itanium/downloads/25142901.pdf>
instance() would perform the operations in the left column (p),
while the snippet above performs the operations in the right column
(q), with x being the address of myVariable->field1 and y being the
address of myVariable.
There might be cases for which using a separate sentinel makes sense
(e.g., for variables where no value can be used to denote the
uninitialized state). However one cannot ever bypass the sentinel
(and the price of 'volatile') and access the global directly.
If one is concerned about the cost of volatile, then the only viable
approach is to avoid accessing the global too often, by invoking
instance() rarely and caching the result in a local variable when possible.
Since accessing a global has its cost (GOT/PLT access), it is often
a good idea anyway if performance is paramount, even when volatile
isn't used. YMMV.
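A minimal sketch of that caching pattern follows. struct Widget, theWidget
and sumField() are names made up for this example, and the instance() shown
here is only a trivial stand-in for a real thread-safe accessor.

```c
#include <stddef.h>

/* Hypothetical object type; field1 echoes the snippet earlier in this mail. */
struct Widget {
    int field1;
};

static struct Widget theWidget = { 42 };

/* Trivial stand-in for the real (thread-safe) accessor. */
static struct Widget *instance(void)
{
    return &theWidget;
}

/* Call instance() once, then work through the cached local pointer so the
 * loop never touches the global (or its sentinel) again. */
long sumField(size_t iterations)
{
    struct Widget *w = instance();
    long total = 0;
    size_t i;

    for (i = 0; i < iterations; i++)
        total += w->field1;

    return total;
}
```

The cost of the synchronized access is paid once per call to sumField()
instead of once per loop iteration.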
If you were to do that, to avoid the inline-ordering problem you'd
assign the output of allocateObject to a local variable, then to the
global.
I do not understand the difference this could make. A modern compiler
can (legally) optimize this intermediate step away. Or am I missing
something?
Regards - Eric
--
Eric Gouriou
email@hidden