Re: (dead)locking of EC and OSC
- Subject: Re: (dead)locking of EC and OSC
- From: Marc Guenther <email@hidden>
- Date: Mon, 26 Jul 2010 17:31:21 +0200
Hi,
On 23.07.2010, at 16:55, Chuck Hill wrote:
On Jul 23, 2010, at 5:08 AM, Marc Guenther wrote:
>
>> Hi,
>>
>> We recently had some deadlock problems involving EC and OSC locking, and I would like to validate my thoughts with the list.
>>
>> As a very basic rule with everything multithreaded: whenever two threads want access to the same resources, they have to lock them in the same predefined order. Otherwise you have the classic deadlock possibility, where T1 has A and wants B, and T2 has B and wants A.
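A minimal sketch of the "same predefined order" rule, using plain java.util.concurrent locks rather than anything from EOF. The class name RankedLock and its rank field are hypothetical: each lock gets a fixed rank, and lockBoth always takes the lower rank first, so two threads that name the locks in opposite orders still acquire them in the same order.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: each lock carries a fixed rank that defines the global order.
final class RankedLock {
    final int rank; // must be unique across all locks that can be held together
    final ReentrantLock lock = new ReentrantLock();

    RankedLock(int rank) { this.rank = rank; }

    /** Acquire both locks lower rank first, no matter which order the caller names them in. */
    static void lockBoth(RankedLock a, RankedLock b) {
        RankedLock first = a.rank <= b.rank ? a : b;
        RankedLock second = (first == a) ? b : a;
        first.lock.lock();
        second.lock.lock();
    }

    static void unlockBoth(RankedLock a, RankedLock b) {
        a.lock.unlock();
        b.lock.unlock();
    }

    /** Two threads name the locks in opposite orders; returns true if neither got stuck. */
    static boolean runOppositeOrders(int iterations) {
        RankedLock a = new RankedLock(1);
        RankedLock b = new RankedLock(2);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) { lockBoth(a, b); unlockBoth(a, b); }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) { lockBoth(b, a); unlockBoth(b, a); }
        });
        t1.start();
        t2.start();
        try {
            t1.join(10_000);
            t2.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t1.isAlive() && !t2.isAlive();
    }
}
```

Without the sorting in lockBoth, the same two loops could end in exactly the T1/T2 deadlock described above.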
>>
>> This, for EOF, means the following:
>>
>> Whenever you hold an OSC lock, you are NOT allowed to lock any EC anymore!
>
> I think of it as taking escalating locks: EC -> DB Context -> OSC
To ensure this (at least the EC -> OSC stuff), I added this assertion to our EC subclass:
    @Override
    public void lock() {
        // The OSC must not be locked by this thread, unless we already hold this
        // EC's lock (re-entrant locking is fine). Only checked when the JVM runs
        // with assertions enabled (-ea).
        assert !isOSCLocked() || this._lock.isHeldByCurrentThread();
        super.lock();
    }

    /** Returns true if the current thread holds the lock of our root OSC. */
    private boolean isOSCLocked() {
        EOObjectStore rootOS = rootObjectStore();
        if (rootOS instanceof EOObjectStoreCoordinator) {
            // ReentrantLock is java.util.concurrent.locks.ReentrantLock
            ReentrantLock lck = ((EOObjectStoreCoordinator) rootOS)._lock();
            return lck != null && lck.isHeldByCurrentThread();
        }
        return false;
    }
This immediately caught our problem, and so far it has neither found nor caused any others. I don't know yet whether it will cause any trouble.
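The same idea can be generalised to the whole EC -> DB Context -> OSC escalation. The sketch below is a hypothetical lock wrapper, not an EOF class: each lock has a level (EC = 0, database context = 1, OSC = 2 are assumptions for illustration), a ThreadLocal remembers which locks the current thread holds, and asking for a lower-level lock while holding a higher one throws immediately, much like the assertion above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock-hierarchy checker: a thread may only take a new lock whose
// level is not lower than any lock it already holds (re-entry is always allowed).
final class HierarchicalLock {
    // Locks held by the current thread, most recently acquired first.
    private static final ThreadLocal<Deque<HierarchicalLock>> HELD =
            ThreadLocal.withInitial(ArrayDeque::new);

    final String name;
    final int level; // assumed levels: EC = 0, database context = 1, OSC = 2
    private final ReentrantLock lock = new ReentrantLock();

    HierarchicalLock(String name, int level) {
        this.name = name;
        this.level = level;
    }

    void lock() {
        // Re-entering a lock we already hold is fine, as in the EC assertion.
        if (!lock.isHeldByCurrentThread()) {
            for (HierarchicalLock held : HELD.get()) {
                if (held.level > level) {
                    throw new IllegalStateException(
                            "lock order violation: " + name + " requested while holding " + held.name);
                }
            }
        }
        lock.lock();
        HELD.get().push(this);
    }

    void unlock() {
        lock.unlock();
        HELD.get().remove(this); // drops the most recent entry for this lock
    }
}
```

Throwing instead of asserting means the check also runs without -ea; whether that is acceptable in production is a judgment call.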
>> Why not? Because the normal order is the other way round: first you lock the EC, then, when doing something with it, you lock the OSC.
>>
>> For example, our specific deadlock occurred because we had a Thread1 that would:
>> - lock the OSC
>> - in a loop, do some operations involving locking of lots of ECs
>> (we have some stuff in there, which posts notifications to all ECs)
>> - unlock the OSC
>
> That seems like a long time to have the OSC locked. Why are you locking the OSC?
Actually, we were locking the database context (EODatabaseContext). We are reloading some EOModels, and it seemed necessary to invalidate objects:
dbc.invalidateObjectsWithGlobalIDs(dbc.database().snapshots().allKeys());
Unfortunately, the actual removeModel() and addModel() calls were also inside that try block. We moved them further down, out of it, and now the deadlock seems to be gone.
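The fix boils down to keeping the locked section as small as possible. Here is a hedged sketch of that shape, with a plain ReentrantLock standing in for the database context lock; invalidateSnapshots() and swapModels() are hypothetical placeholders for the invalidation call and the removeModel()/addModel() calls, and the log only records whether each step ran with the lock held.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: only the work that needs the lock runs inside it.
final class ModelReloader {
    final ReentrantLock dbContextLock = new ReentrantLock(); // stand-in for the EODatabaseContext lock
    final List<String> log = new ArrayList<>();

    void reload() {
        dbContextLock.lock();
        try {
            invalidateSnapshots(); // the only step that needs the lock in this sketch
        } finally {
            dbContextLock.unlock();
        }
        swapModels(); // runs with no lock held, so it may lock ECs without risking the deadlock
    }

    private void invalidateSnapshots() {
        log.add("invalidate, locked=" + dbContextLock.isHeldByCurrentThread());
    }

    private void swapModels() {
        log.add("swap, locked=" + dbContextLock.isHeldByCurrentThread());
    }
}
```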
>> Now, while T1 was busy in its loop, some T2 was fetching a single EO:
>> - lock the EC
>> - fetch EO causing OSC to lock
>> -> which waits for T1 to finish
>>
>> And, as soon as the T1 loop reaches the locked EC of T2
>>
>> -> Deadlock
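The T1/T2 scenario can be reproduced with two plain locks. This is a hedged demo, not EOF code: osc and ec are ReentrantLocks standing in for the OSC lock and T2's EC lock. A real deadlock would hang forever, so each thread uses tryLock with a 200 ms timeout for its second lock, and the threads keep their first lock until both have tried; run() then reports 2, meaning both threads were blocked by the other.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class DeadlockDemo {
    /** Returns how many of the two threads could not get their second lock (2 = deadlock shape). */
    static int run() {
        ReentrantLock osc = new ReentrantLock(); // stands in for the OSC lock held by T1
        ReentrantLock ec = new ReentrantLock();  // stands in for the EC locked by T2
        CountDownLatch ready = new CountDownLatch(2);
        CountDownLatch done = new CountDownLatch(2);
        AtomicInteger timeouts = new AtomicInteger();

        Thread t1 = new Thread(() -> grab(osc, ec, ready, done, timeouts)); // OSC, then EC
        Thread t2 = new Thread(() -> grab(ec, osc, ready, done, timeouts)); // EC, then OSC
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return timeouts.get();
    }

    private static void grab(ReentrantLock first, ReentrantLock second,
                             CountDownLatch ready, CountDownLatch done, AtomicInteger timeouts) {
        first.lock();
        try {
            ready.countDown();
            ready.await(); // both threads now hold their first lock
            // A plain second.lock() would block forever here; the timeout lets the demo end.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                timeouts.incrementAndGet();
            }
            done.countDown();
            done.await(); // keep holding 'first' until the other thread has tried as well
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```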
>>
>>
>> Now, the questions I have:
>> - Is this analysis correct?
>
> Yes.
>
>
>> - What do we do about it?
>
> Lock in the correct order. Don't lock the OSC and keep it locked. Use different EOF stacks (hence different OSCs). Don't touch the same EC in more than one thread concurrently.
At least the last one isn't really in our control, is it? All ECs are automatically touched outside of their own threads whenever this ObjectsChanged notification is broadcast.
I remember a very hard-to-track-down deadlock several years ago (WO 5.0) that was caused by this notification.
>> If I understand all this correctly, my above rule means:
>>
>> _Whenever you need to explicitly lock an OSC, you can't do any operation that would cause an EC you have not already locked yourself to be locked._
>>
>> Which might be more than you think. IIRC, you are not allowed to make any changes to any EO, because that causes EOObjectsChangedInStoreNotifications to be posted to all ECs.
>
> That happens when saveChanges() is called. saveChanges() locks the OSC.
Ah. And while it has the OSC lock, it locks all ECs in turn? Isn't that exactly the case I described above? Why does this ever work? Or does it lock the ECs after it has released the OSC again?
I guess it uses some kind of magic. It definitely didn't trigger my assertion above.
>> Which, in turn, causes all these ECs to be locked, which violates the rule.
>
> If they are already locked, it will only queue the notification, not wait for a lock.
Which explains the magic :) This stuff is definitely not for the faint hearted.
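The "queue instead of wait" behaviour can be sketched like this. It is an illustration under assumptions, not EOF's actual implementation: NotifiedContext is a hypothetical stand-in for an EC, post() never blocks (it enqueues and only drains if the lock is free), and the lock owner drains the queue when it unlocks, re-checking afterwards so nothing posted in between gets stranded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for an EC that receives change notifications from other threads.
final class NotifiedContext {
    private final ReentrantLock lock = new ReentrantLock();
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final List<String> processed = new ArrayList<>(); // only touched while holding 'lock'

    void lock() { lock.lock(); }

    void unlock() {
        drain();        // handle anything queued while we held the lock
        lock.unlock();
        drainIfFree();  // catch notifications queued between drain() and unlock()
    }

    /** Called from any thread; never waits for the lock. */
    void post(String notification) {
        pending.add(notification);
        drainIfFree();
    }

    private void drainIfFree() {
        while (!pending.isEmpty() && lock.tryLock()) {
            try { drain(); } finally { lock.unlock(); }
        }
    }

    private void drain() {
        String n;
        while ((n = pending.poll()) != null) processed.add(n);
    }

    /** Posts once while unlocked and once from another thread while locked. */
    static List<String> demo() {
        NotifiedContext ec = new NotifiedContext();
        ec.post("first");   // nobody holds the lock: handled immediately
        ec.lock();
        Thread poster = new Thread(() -> ec.post("second")); // queued, does not block
        poster.start();
        try {
            poster.join(1_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (poster.isAlive()) throw new IllegalStateException("poster blocked on the lock");
        ec.unlock();        // "second" is processed here
        ec.lock();
        try { return new ArrayList<>(ec.processed); } finally { ec.unlock(); }
    }
}
```

Because the posting thread never waits for the EC's lock, it cannot become one half of the lock cycle, which is presumably why the assertion above never fired during saveChanges().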
>> Now, if you really need to do this (you NEED to lock an OSC, and you NEED to use ECs while holding that lock), you might think, "Oh, it's OK, I'll just lock all of them first." But then again, in what order do you lock the ECs? If you do it in random order, you again have the possibility of deadlock.
>
> That sounds like you may be sharing ECs between threads in a dangerous pattern.
Yes. I guess it all boils down to: just don't do that :)
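If you really did have to lock several ECs up front, the usual answer to "in what order?" is to sort them by some unique, stable key. A hedged sketch with a hypothetical OrderedContext class: the id comes from a global counter (an assumption; identityHashCode is not guaranteed unique, so it is not used), locks are taken in ascending id order and released in reverse.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical context with a unique, never-changing id used as the global lock order.
final class OrderedContext {
    private static final AtomicLong NEXT_ID = new AtomicLong();
    final long id = NEXT_ID.getAndIncrement();
    final ReentrantLock lock = new ReentrantLock();

    /** Lock every context in ascending id order; returns that order for unlockAll. */
    static List<OrderedContext> lockAll(List<OrderedContext> contexts) {
        List<OrderedContext> sorted = new ArrayList<>(contexts);
        sorted.sort(Comparator.comparingLong((OrderedContext c) -> c.id));
        for (OrderedContext c : sorted) {
            c.lock.lock();
        }
        return sorted;
    }

    /** Unlock in reverse acquisition order. */
    static void unlockAll(List<OrderedContext> lockedInOrder) {
        for (int i = lockedInOrder.size() - 1; i >= 0; i--) {
            lockedInOrder.get(i).lock.unlock();
        }
    }
}
```

This only helps if every thread that locks more than one EC goes through lockAll, which is hard to guarantee once EOF itself touches ECs from other threads; hence "just don't do that".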
>> So, what is everybody doing?
>> - I don't lock the OSC myself. Why would I want to?
>
>> - I do, but I am VERY careful about what I do with it being locked.
>> - I do, and so far I have been lucky.
>
> Never any problems, provided you lock it at the appropriate time.
OK. Thanks for the help. It seems to be working fine now.
Now I'm back to investigating this missing snapshot issue that's still haunting us.
Marc
_______________________________________________
Webobjects-dev mailing list (email@hidden)