Re: Deadlocks in one of our apps
Re: Deadlocks in one of our apps
- Subject: Re: Deadlocks in one of our apps
- From: Pascal Robert <email@hidden>
- Date: Tue, 1 Jun 2010 09:34:44 -0400
Hum... And after I started using ERXWOLongResponsePage, I still got a
deadlock, but this time, it says that it's a EODatabaseContext lock :
Thread t@92163: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Interpreted frame)
- java.lang.Object.wait() @bci=2, line=474 (Interpreted frame)
- com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72
(Interpreted frame)
- com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56,
line=1973 (Interpreted frame)
-
com
.webobjects
.eocontrol
.EOObjectStoreCoordinator
.addCooperatingObjectStore
(com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130
(Interpreted frame)
-
com
.webobjects
.eoaccess
.EODatabaseChannel
.setCurrentEditingContext(com.webobjects.eocontrol.EOEditingContext)
@bci=34, line=166 (Interpreted frame)
...
We don't "manual" (eg , in code) locking at the EODatabaseContext level.
Another thing to note if this is a long request to a database housed
in an ESX vm. We had similar problems with long requests timing out
between two systems, with one hosted by esx 4.x. Such long requests
were caught by some low level interface muxing issue and my whole
EOF stack was frozen when the underlying db connection was lost mid-
transaction. I resolved it by moving this application off of a vm.
On May 31, 2010, at 5:33 PM, Pascal Robert <email@hidden> wrote:
Ok, will try with ERXWOLongResponsePage since it look like it's
locking and unlocking all ECs in the thread.
There's a bunch of stuff wrong here. First, the only actually
locked thread is:
-
com
.webobjects
.eocontrol
.EOObjectStoreCoordinator
.addCooperatingObjectStore
(com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5,
line=130 (Interpreted frame)
-
com
.webobjects
.eoaccess
.EODatabaseChannel
.setCurrentEditingContext
(com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166
(Interpreted frame)
-
com
.webobjects
.eoaccess
.EODatabaseChannel
._selectWithFetchSpecificationEditingContext
(com.webobjects.eocontrol.EOFetchSpecification,
com.webobjects.eocontrol.EOEditingContext) @bci=158, line=788
(Interpreted frame)
-
com
.webobjects
.eoaccess
.EODatabaseChannel
.selectObjectsWithFetchSpecification
(com.webobjects.eocontrol.EOFetchSpecification,
com.webobjects.eocontrol.EOEditingContext) @bci=64, line=215
(Interpreted frame)
-
com
.webobjects
.eoaccess
.EODatabaseContext
._objectsWithFetchSpecificationEditingContext
(com.webobjects.eocontrol.EOFetchSpecification,
com.webobjects.eocontrol.EOEditingContext) @bci=219, line=3205
(Interpreted frame)
-
com
.webobjects
.eoaccess
.EODatabaseContext
.objectsWithFetchSpecification
(com.webobjects.eocontrol.EOFetchSpecification,
com.webobjects.eocontrol.EOEditingContext) @bci=34, line=3346
(Interpreted frame)
-
com
.webobjects
.eocontrol
.EOObjectStoreCoordinator
.objectsWithFetchSpecification
(com.webobjects.eocontrol.EOFetchSpecification,
com.webobjects.eocontrol.EOEditingContext) @bci=97, line=539
(Interpreted frame)
-
com
.webobjects
.eocontrol
.EOEditingContext
.objectsWithFetchSpecification
(com.webobjects.eocontrol.EOFetchSpecification,
com.webobjects.eocontrol.EOEditingContext) @bci=79, line=4114
(Interpreted frame)
-
er
.extensions
.eof
.ERXEC
.objectsWithFetchSpecification
(com.webobjects.eocontrol.EOFetchSpecification,
com.webobjects.eocontrol.EOEditingContext) @bci=72, line=1211
(Interpreted frame)
-
com
.webobjects
.eocontrol
.EOEditingContext
.objectsWithFetchSpecification
(com.webobjects.eocontrol.EOFetchSpecification) @bci=3, line=4500
(Interpreted frame)
-
com
.acaiq
.fondation
.acaiqCore
._Licence.fetchLicences(com.webobjects.eocontrol.EOEditingContext,
com.webobjects.eocontrol.EOQualifier,
com.webobjects.foundation.NSArray) @bci=19, line=1062 (Interpreted
frame)
-
com
.acaiq
.fondation
.acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier,
com.webobjects.foundation.NSArray, boolean) @bci=77, line=8920
(Interpreted frame)
-
com
.acaiq
.fondation
.acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier,
boolean) @bci=4, line=8893 (Interpreted frame)
-
com
.acaiq
.fondation
.acaiqCore
.Membre
.licencesParEtats(com.acaiq.fondation.acaiqCore.EtatMembre[])
@bci=100, line=980 (Interpreted frame)
- com.acaiq.fondation.acaiqCore.Membre.licencesValides() @bci=11,
line=996 (Interpreted frame)
- com.acaiq.fondation.acaiqCore.Membre.estCourtier() @bci=5,
line=1035 (Interpreted frame)
- sun.reflect.GeneratedMethodAccessor87.invoke(java.lang.Object,
java.lang.Object[]) @bci=40 (Interpreted frame)
Which reminds me of an unlocked EC/OSC. Second:
java.lang.IllegalArgumentException: Attribute noCommandeOracle
can't receive a null parameter :
at
com
.acaiq
.fondation
.depot
.lbaArticle
._CommandesEcom.setNoCommandeOracle(_CommandesEcom.java:419)
This is a *template* that throws on null?? You sure that's such a
bright idea? Isn't this what validation is for? And third:
at
com
.acaiq
.depot
.component.TransactionAchat.performAction(TransactionAchat.java:63)
at
com
.webobjects
.woextensions.WOLongResponsePage.run(WOLongResponsePage.java:119)
As you're throwing from inside a normal
com.webobjects.woextensions.WOLongResponsePage, I seriously hope
you're doing your part of try{} finally{} and EC unlocking.
Cheers, Anjo
Am 31.05.2010 um 20:02 schrieb Pascal Robert:
One of our apps have deadlocked 5 times over 3 days, strangely
enough it started when we moved our Oracle Database 10gR2 DB to
our VMWare ESX 4.0 cluster. e didn't re-install Oracle, I simply
did a P2V (Physical to VM) conversion, so it's the exact same
version of Oracle DB as before.
What's happenning is that we store some information on our Oracle
database, save it, and we built a copy of some of the data to a
new EO (different entity) in a SQL Server 2005 db so the
accounting system take care of billing.
The exception that cause the deadlock (or at least the last thing
written to the log before the deadlock) :
java.lang.IllegalArgumentException: Attribute noCommandeOracle
can't receive a null parameter :
at
com
.acaiq
.fondation
.depot
.lbaArticle
._CommandesEcom.setNoCommandeOracle(_CommandesEcom.java:419)
at com.acaiq.fondation.depot.Caissier.copiePourLBA(Caissier.java:
267)
at com.acaiq.fondation.depot.Caissier.paye(Caissier.java:137)
at
com
.acaiq
.depot
.component.TransactionAchat.performAction(TransactionAchat.java:63)
at
com
.webobjects
.woextensions.WOLongResponsePage.run(WOLongResponsePage.java:119)
at java.lang.Thread.run(Thread.java:613)
And it happens here :
commandeEcom.setNoCommandeOracle(((Integer)
_commande.clefsPrimaire()));
_commande.clefsPrimaire() is a method that simply do :
return ERXEOControlUtilities.primaryKeyObjectForObject(this);
So clefsPrimaire() returns null, even if the data was stored in
the Oracle DB (and it's really there) and a primary key was
generated (EOF did it, it's not a "human generated" PK).
The whole block :
try {
_commande.setEstPaye(true);
_commande.editingContext().saveChanges(); // This is when we
save the EO in Oracle
} catch (Exception e) {
NSLog.err.appendln(e.getMessage());
remboursementPayflow
((Transaction)pfd.valueForKey("transaction"),_commande, (String)
pfd.valueForKey("PNREF"));
pfd = new NSDictionary<Object, String>(CODE_RESUTAT,
"REMBOURSEMENT");
} finally {
try {
copiePourLBA(_commande); // This is the method where we copy
some data to SQL Server
} catch (Exception e) {
NSLog.err.appendln(e.getMessage());
}
_commande.editingContext().saveChanges();
}
This problem didn't happen in the past (but we also had less
requests coming him) and it doesn't always happen :-/ jstack give
me tons of this :
Thread t@71683: (state = BLOCKED)
- java.net.PlainSocketImpl.accept(java.net.SocketImpl) @bci=0,
line=382 (Interpreted frame)
- java.net.ServerSocket.implAccept(java.net.Socket) @bci=50,
line=450 (Interpreted frame)
- java.net.ServerSocket.accept() @bci=48, line=421 (Interpreted
frame)
- com.webobjects.appserver._private.WOWorkerThread.run() @bci=26,
line=238 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=613 (Interpreted frame)
The only non-blocked thread is :
Thread t@78083: (state = IN_NATIVE)
- java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl)
@bci=0 (Interpreted frame)
- java.net.PlainSocketImpl.accept(java.net.SocketImpl) @bci=7,
line=384 (Interpreted frame)
- java.net.ServerSocket.implAccept(java.net.Socket) @bci=50,
line=450 (Interpreted frame)
- java.net.ServerSocket.accept() @bci=48, line=421 (Interpreted
frame)
- com.webobjects.appserver._private.WOWorkerThread.run() @bci=26,
line=238 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=613 (Interpreted frame)
Beside going back to our Oracle physical server, I have no idea
of why I'm getting this. Java 1.5 on OS X 10.4.11 Server, WO
5.3.3. W
<pid23991.txt>
----
Pascal Robert
email@hidden
AIM: MacTICanada
Twitter : MacTICanada
LinkedIn : http://www.linkedin.com/in/macti
WO Community profile : http://wocommunity.org/page/member?name=probert
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden