[Apologies for boring subscribers with more nitpicking which might not be directly relevant to the Darwin platform; threading and parallel programming pitfalls are just too much fun to pass on]

Apologies for using examples not immediately relevant to Darwin, but I don't want this idiom to become more prevalent. It is broken, even if in not so obvious ways.

e.g.

    void * myVariable = 0;
    volatile UInt32 sentinel = 0;

    void* instance(void)
    {
        if ( OSCompareAndSwap( 0, -1, &sentinel)) {
            /* initialize variables */
            if ( ! myVariable )
                myVariable = allocateObject();
            /* etc */
            sentinel = 1;
        } else {
            while (sentinel != 1)
                /* yield, whatever that function is...pthread_yield()? */;
        }
        return myVariable;
    }

Grabbing an uncontended lock shouldn't be too expensive compared to invoking a successful OSCompareAndSwap. So someone objecting to the trivial idiom on the basis of performance might not like this version much. Note also that the compare&swap version causes two writes to the sentinel, while the proposed double-checked idiom attempts to perform a single read once initialization has occurred (very cache and SMP/cc-NUMA friendly). So the idiom above won't please performance seekers.

I have a hard time constructing a scenario where making myVariable volatile would be a concern for performance. If one uses a function instance(), then presumably no direct access to myVariable should occur outside of this function. The price of thread-safety has to be paid somewhere. You might be thinking of users of this global using an idiom such as:

    void* myLocal = myVariable;
    if (NULL == myLocal) {
        myLocal = instance();
    }
    foo = myLocal->field1;

The problem is that this would not be thread-safe, at least on some systems with relaxed memory ordering. Reading a non-NULL myVariable is not a guarantee that reading myLocal->field1 will return the expected value.
To guarantee this, one needs to have an ordering constraint between the writes that can occur in instance() and the reads occurring in this snippet. For those into formal correctness proofs, the following document explains why this is the case, in Table 9 on page 23:
<URL:http://www.intel.com/design/itanium/downloads/25142901.pdf>
instance() would perform the operations in the left column (p), while the snippet above performs the operations in the right column (q), with x being the address of myVariable->field1 and y being the address of myVariable.

There might be cases for which using a separate sentinel makes sense (e.g., for variables where no value can be used to denote the uninitialized state). However, one cannot ever bypass the sentinel (and the price of 'volatile') and access the global directly.

I do not understand the difference this could make. Modern compilers can (legally) optimize this intermediate step away. Or am I missing something?

Regards - Eric

On Feb 19, 2005, at 14:28, Matthew T. Russotto wrote:

On Feb 19, 2005, at 12:40 PM, Eric Gouriou wrote:

Doesn't the use of a compare-and-swap or similar pattern (such as test-and-set) solve this problem without a lock at all?

This idiom is a custom spinlock (or yielding spinlock) variant of the trivial (and correct) initialization idiom that always grabs a lock. The reason double-checked initialization is used is the perception (real or not) that grabbing a lock is too expensive to afford.

The version above also relies on the compiler/processor honoring the ordering properties on the write "sentinel = 1". If one is willing to assume that this property stands, then the idiom that doesn't use Compare&Swap is valid.

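
To make the ordering property on the sentinel write concrete, here is a minimal sketch of the sentinel idiom with the ordering stated explicitly instead of assumed. It relies on a C11 compiler with <stdatomic.h> and on POSIX sched_yield(), neither of which appears in the discussion above, so treat both as assumptions. struct object, field1 and the body of allocateObject() are hypothetical stand-ins. The release store on the sentinel pairs with the acquire loads, so a thread that observes sentinel == 1 should also observe the initialized object.

```c
#include <sched.h>      /* sched_yield(), POSIX (assumed available) */
#include <stdatomic.h>  /* C11 atomics (assumed available) */
#include <stdlib.h>

/* Hypothetical object type standing in for whatever instance() returns. */
struct object { int field1; };

static void *myVariable = NULL;
static atomic_int sentinel = 0;  /* 0: untouched, -1: in progress, 1: ready */

/* Hypothetical allocator; the thread never shows its body. */
static void *allocateObject(void)
{
    struct object *o = malloc(sizeof *o);
    if (o != NULL)
        o->field1 = 42;
    return o;
}

void *instance(void)
{
    /* Fast path: a single acquire load once initialization is done. */
    if (atomic_load_explicit(&sentinel, memory_order_acquire) == 1)
        return myVariable;

    int expected = 0;
    if (atomic_compare_exchange_strong(&sentinel, &expected, -1)) {
        /* This thread won the race: initialize, then publish with a
           release store so that any thread seeing sentinel == 1 also
           sees the fully initialized object. */
        myVariable = allocateObject();
        atomic_store_explicit(&sentinel, 1, memory_order_release);
    } else {
        /* Another thread is initializing (or just finished): wait. */
        while (atomic_load_explicit(&sentinel, memory_order_acquire) != 1)
            sched_yield();
    }
    return myVariable;
}
```

Callers would go through instance() only and never read myVariable directly, in line with the point that the sentinel cannot be bypassed.
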
You could use myVariable as the sentinel also but that introduces other dependencies, and besides you probably don't want to declare it volatile.

If one is concerned about the cost of volatile, then the only viable approach is to avoid accessing the global too often, by invoking instance() rarely and caching the result in a local variable when possible. Since accessing a global has its cost (GOT/PLT access), it is often a good idea anyway if performance is paramount, even when volatile isn't used. YMMV.

If you were to do that, to avoid the inline-ordering problem you'd assign the output of allocateObject to a local variable, then to the global.

-- 
Eric Gouriou eric.gouriou@pobox.com