
On Aug 27, 2008, at 4:18 PM, Dov Rosenberg wrote:

We have a WO (5.4.1)

I'd very seriously think about moving to 5.4.2 to see if this helps.

app that is deployed as a servlet in Tomcat 5.5 (Java 1.5). We do not use Project Wonder or multiple ObjectStoreCoordinators. We have experienced intermittent hanging issues under load. When we look at the thread dumps I always see things like

"http-10042-Processor111" nid=60350 state=WAITING
    - waiting on <0xcb1792> (a com.webobjects.foundation.NSRecursiveLock)
    - locked <0xcb1792> (a com.webobjects.foundation.NSRecursiveLock)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Unknown Source)
    at com.webobjects.foundation.NSRecursiveLock.lock(NSRecursiveLock.java:72)
    at com.webobjects.eocontrol.EOObjectStoreCoordinator.lock(EOObjectStoreCoordinator.java:466)
    at com.webobjects.eocontrol.EOEditingContext.lockObjectStore(EOEditingContext.java:4735)
    at com.webobjects.eocontrol.EOEditingContext.objectsWithFetchSpecification(EOEditingContext.java:4112)
    at com.webobjects.eocontrol.EOEditingContext.objectsWithFetchSpecification(EOEditingContext.java:4500)
    at com.webobjects.eoaccess.EOUtilities.objectsMatchingValues(EOUtilities.java:193)
    at com.webobjects.eoaccess.EOUtilities.objectsMatchingKeyAndValue(EOUtilities.java:168)
...

Those threads are symptoms, not the problem. The problem is that there is a hanging lock on EOObjectStoreCoordinator.

It seems the root of every thread always has NSRecursiveLock.lock() as part of the thread dump. It doesn’t seem to matter if the call was via EOUtilities or through a fetch.

The real key line in the trace is

at com.webobjects.eocontrol.EOObjectStoreCoordinator.lock(EOObjectStoreCoordinator.java:466)

In a recent thread dump there were 286 threads listed of which 251 (all HTTP threads) had references to NSRecursiveLock.lock().

You might want to reduce the number of threads and listeners you create so you find the problem sooner. The impact on your users may be less or at least less frustrating.

A more interesting question is: what did the other threads show?

All of the threads were marked as state=WAITING (none were RUNNABLE). It seems that all of the threads were waiting for something and thus could not do anything, but no Java deadlocks were reported.

Yes, they were waiting for some thread to unlock the OSC.
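
To illustrate why every thread can sit in WAITING without the JVM reporting a deadlock, here is a minimal sketch. SimpleRecursiveLock is a hypothetical stand-in written for this example, not the real NSRecursiveLock; it only assumes a wait/notify design, which is what the Object.wait frames in the trace suggest. One thread takes the lock and exits without releasing it, a second thread then waits forever, and the JVM's monitor deadlock check finds nothing because there is no cycle of threads holding each other's monitors.

```java
import java.lang.management.ManagementFactory;

public class LeakedLockDemo {

    // Hypothetical wait/notify based recursive lock (NOT the WebObjects class).
    static class SimpleRecursiveLock {
        private Thread owner;
        private int depth;

        synchronized void lock() throws InterruptedException {
            Thread me = Thread.currentThread();
            // Wait until nobody owns the lock, or we already own it.
            while (owner != null && owner != me) {
                wait();
            }
            owner = me;
            depth++;
        }

        synchronized void unlock() {
            if (owner != Thread.currentThread()) {
                throw new IllegalMonitorStateException("not the owner");
            }
            if (--depth == 0) {
                owner = null;
                notifyAll();
            }
        }
    }

    // Leaks the lock from one thread, then reports what a second thread looks like.
    static String observe() {
        final SimpleRecursiveLock lock = new SimpleRecursiveLock();
        try {
            // This thread locks and terminates without ever calling unlock().
            Thread leaker = new Thread(() -> {
                try { lock.lock(); } catch (InterruptedException ignored) { }
            });
            leaker.start();
            leaker.join();

            // This thread plays the role of an HTTP worker trying to fetch.
            Thread waiter = new Thread(() -> {
                try { lock.lock(); } catch (InterruptedException ignored) { }
            });
            waiter.setDaemon(true);
            waiter.start();

            // Give the waiter up to five seconds to park inside wait().
            long deadline = System.currentTimeMillis() + 5000;
            while (waiter.getState() != Thread.State.WAITING
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            Thread.State state = waiter.getState();

            // Returns null when the JVM sees no monitor deadlock cycle.
            long[] deadlocked =
                ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads();

            waiter.interrupt(); // let the demo thread finish
            return "waiter=" + state + ", jvmDeadlockReported=" + (deadlocked != null);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(observe());
    }
}
```

In a real dump the thing to look for is therefore not another waiter but the one thread (possibly already gone) that took the lock and never gave it back.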

Questions:

Does this indicate that we have an issue with multiple threads within the same JVM? We have ConcurrentRequest handling turned on.

No.

Should we investigate using multiple ObjectStoreCoordinators?

This might increase the time before the entire instance goes dead, but it won't address the root problem.

Is this a red herring and should I look elsewhere for the problem?

You have found the symptom, now you need to find the problem:

* app running out of memory and not unlocking the OSC

* your code locking the OSC and not unlocking it in a finally block

* bug in WO 5.4.1 (hypothetical, I don't know of one) locking the OSC and not unlocking it in a finally block

* deadlock or very long running transaction at the database level that is preventing an EOF operation from completing in a reasonable amount of time.
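
The "not unlocking it in a finally block" cases above can be sketched as follows. This is only an illustration: java.util.concurrent.locks.ReentrantLock stands in for the OSC lock so the example runs without WebObjects, and the names OscLockSketch, doWork, leaky and safe are made up for this sketch. In application code the same shape would apply to whatever lock()/unlock() pair you call on the coordinator or editing context.

```java
import java.util.concurrent.locks.ReentrantLock;

public class OscLockSketch {

    // Hypothetical unit of work that can fail while the lock is held.
    static void doWork(boolean fail) {
        if (fail) {
            throw new IllegalStateException("simulated failure while holding the lock");
        }
    }

    // Leaky pattern: if doWork throws, unlock() is never reached.
    static void leaky(ReentrantLock lock, boolean fail) {
        lock.lock();
        doWork(fail);
        lock.unlock();
    }

    // Safe pattern: the finally block releases the lock on every exit path.
    static void safe(ReentrantLock lock, boolean fail) {
        lock.lock();
        try {
            doWork(fail);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        ReentrantLock a = new ReentrantLock();
        try { safe(a, true); } catch (IllegalStateException expected) { }
        System.out.println("after safe:  held=" + a.isLocked());

        ReentrantLock b = new ReentrantLock();
        try { leaky(b, true); } catch (IllegalStateException expected) { }
        System.out.println("after leaky: held=" + b.isLocked());
    }
}
```

After the leaky call the lock is still held even though the method has returned, which is the state every other request thread then queues up behind.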

Chuck

Chuck Hill Senior Consultant / VP Development

Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems.

http://www.global-village.net/products/practical_webobjects

References:
	>NSRecursiveLock.lock() causing deadlock?? (From: Dov Rosenberg <email@hidden>)