• Open Menu Close Menu
  • Apple
  • Shopping Bag
  • Apple
  • Mac
  • iPad
  • iPhone
  • Watch
  • TV
  • Music
  • Support
  • Search apple.com
  • Shopping Bag

Lists

Open Menu Close Menu
  • Terms and Conditions
  • Lists hosted on this site
  • Email the Postmaster
  • Tips for posting to public mailing lists
Re: Deadlocks in one of our apps
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Deadlocks in one of our apps


  • Subject: Re: Deadlocks in one of our apps
  • From: Chuck Hill <email@hidden>
  • Date: Wed, 2 Jun 2010 07:30:08 -0700

That makes your code look guilty then. :-) Check your long response page implementation again. Are there any exceptions in the log that might be related?

I'd also reduce the Maximum Adaptor threads (JavaMonitor -> Application configuration -> Application settings). 6 or 8 is probably more than enough for this app. That will at least reduce the size of the thread dumps. I'd also trim down the listen queue size to 2 or 4, might as well catch this as soon as possible.

Chuck



On Jun 2, 2010, at 5:19 AM, Pascal Robert wrote:

... And going back to the physical server didn't solve anything, I got the same deadlock this morning.

Ok, so I will move back the DB to the physical server to see if the problem goes away.


On Jun 1, 2010, at 6:34 AM, Pascal Robert wrote:

Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock :

Thread t@92163: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Interpreted frame)
- java.lang.Object.wait() @bci=2, line=474 (Interpreted frame)
- com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame)
- com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame)
- com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame)
- com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame)
...


We don't "manual" (eg , in code) locking at the EODatabaseContext level.

It is possible that an odd exception in EOAccess or below is resulting in this not getting unlocked. Joe's reply below might be what is happening to you.


Chuck


Another thing to note if this is a long request to a database housed in an ESX vm. We had similar problems with long requests timing out between two systems, with one hosted by esx 4.x. Such long requests were caught by some low level interface muxing issue and my whole EOF stack was frozen when the underlying db connection was lost mid-transaction. I resolved it by moving this application off of a vm.



On May 31, 2010, at 5:33 PM, Pascal Robert <email@hidden> wrote:


Ok, will try with ERXWOLongResponsePage since it look like it's locking and unlocking all ECs in the thread.

There's a bunch of stuff wrong here. First, the only actually locked thread is:

- com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame)
- com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame)
- com .webobjects .eoaccess .EODatabaseChannel ._selectWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=158, line=788 (Interpreted frame)
- com .webobjects .eoaccess .EODatabaseChannel .selectObjectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=64, line=215 (Interpreted frame)
- com .webobjects .eoaccess .EODatabaseContext ._objectsWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=219, line=3205 (Interpreted frame)
- com .webobjects .eoaccess .EODatabaseContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=34, line=3346 (Interpreted frame)
- com .webobjects .eocontrol .EOObjectStoreCoordinator .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=97, line=539 (Interpreted frame)
- com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=79, line=4114 (Interpreted frame)
- er .extensions .eof .ERXEC .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=72, line=1211 (Interpreted frame)
- com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification) @bci=3, line=4500 (Interpreted frame)
- com .acaiq .fondation .acaiqCore ._Licence .fetchLicences(com.webobjects.eocontrol.EOEditingContext, com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray) @bci=19, line=1062 (Interpreted frame)
- com .acaiq .fondation .acaiqCore ._Membre.licences(com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray, boolean) @bci=77, line=8920 (Interpreted frame)
- com .acaiq .fondation .acaiqCore ._Membre.licences(com.webobjects.eocontrol.EOQualifier, boolean) @bci=4, line=8893 (Interpreted frame)
- com .acaiq .fondation .acaiqCore .Membre .licencesParEtats(com.acaiq.fondation.acaiqCore.EtatMembre[]) @bci=100, line=980 (Interpreted frame)
- com.acaiq.fondation.acaiqCore.Membre.licencesValides() @bci=11, line=996 (Interpreted frame)
- com.acaiq.fondation.acaiqCore.Membre.estCourtier() @bci=5, line=1035 (Interpreted frame)
- sun.reflect.GeneratedMethodAccessor87.invoke(java.lang.Object, java.lang.Object[]) @bci=40 (Interpreted frame)


Which reminds me of an unlocked EC/OSC. Second:

java.lang.IllegalArgumentException: Attribute noCommandeOracle can't receive a null parameter :
at com .acaiq .fondation .depot .lbaArticle ._CommandesEcom.setNoCommandeOracle(_CommandesEcom.java:419)

This is a *template* that throws on null?? You sure that's such a bright idea? Isn't this what validation is for? And third:


at com .acaiq .depot .component .TransactionAchat.performAction(TransactionAchat.java:63)
at com .webobjects .woextensions.WOLongResponsePage.run(WOLongResponsePage.java: 119)

As you're throwing from inside a normal com.webobjects.woextensions.WOLongResponsePage, I seriously hope you're doing your part of try{} finally{} and EC unlocking.



Cheers, Anjo



Am 31.05.2010 um 20:02 schrieb Pascal Robert:

One of our apps have deadlocked 5 times over 3 days, strangely enough it started when we moved our Oracle Database 10gR2 DB to our VMWare ESX 4.0 cluster. e didn't re-install Oracle, I simply did a P2V (Physical to VM) conversion, so it's the exact same version of Oracle DB as before.

What's happenning is that we store some information on our Oracle database, save it, and we built a copy of some of the data to a new EO (different entity) in a SQL Server 2005 db so the accounting system take care of billing.

The exception that cause the deadlock (or at least the last thing written to the log before the deadlock) :

java.lang.IllegalArgumentException: Attribute noCommandeOracle can't receive a null parameter :
at com .acaiq .fondation .depot .lbaArticle ._CommandesEcom.setNoCommandeOracle(_CommandesEcom.java:419)
at com.acaiq.fondation.depot.Caissier.copiePourLBA(Caissier.java: 267)
at com.acaiq.fondation.depot.Caissier.paye(Caissier.java:137)
at com .acaiq .depot .component .TransactionAchat.performAction(TransactionAchat.java:63)
at com .webobjects .woextensions.WOLongResponsePage.run(WOLongResponsePage.java: 119)
at java.lang.Thread.run(Thread.java:613)


And it happens here :

commandeEcom.setNoCommandeOracle(((Integer) _commande.clefsPrimaire()));

_commande.clefsPrimaire() is a method that simply do :

	return ERXEOControlUtilities.primaryKeyObjectForObject(this);

So clefsPrimaire() returns null, even if the data was stored in the Oracle DB (and it's really there) and a primary key was generated (EOF did it, it's not a "human generated" PK).

The whole block :

try {
_commande.setEstPaye(true);
_commande.editingContext().saveChanges(); // This is when we save the EO in Oracle
} catch (Exception e) {
NSLog.err.appendln(e.getMessage());
remboursementPayflow ((Transaction)pfd.valueForKey("transaction"),_commande, (String) pfd.valueForKey("PNREF"));
pfd = new NSDictionary<Object, String>(CODE_RESUTAT, "REMBOURSEMENT");
} finally {
try {
copiePourLBA(_commande); // This is the method where we copy some data to SQL Server
} catch (Exception e) {
NSLog.err.appendln(e.getMessage());
}
_commande.editingContext().saveChanges();
}


This problem didn't happen in the past (but we also had less requests coming him) and it doesn't always happen :-/ jstack give me tons of this :

Thread t@71683: (state = BLOCKED)
- java.net.PlainSocketImpl.accept(java.net.SocketImpl) @bci=0, line=382 (Interpreted frame)
- java.net.ServerSocket.implAccept(java.net.Socket) @bci=50, line=450 (Interpreted frame)
- java.net.ServerSocket.accept() @bci=48, line=421 (Interpreted frame)
- com.webobjects.appserver._private.WOWorkerThread.run() @bci=26, line=238 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=613 (Interpreted frame)


The only non-blocked thread is :

Thread t@78083: (state = IN_NATIVE)
- java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl) @bci=0 (Interpreted frame)
- java.net.PlainSocketImpl.accept(java.net.SocketImpl) @bci=7, line=384 (Interpreted frame)
- java.net.ServerSocket.implAccept(java.net.Socket) @bci=50, line=450 (Interpreted frame)
- java.net.ServerSocket.accept() @bci=48, line=421 (Interpreted frame)
- com.webobjects.appserver._private.WOWorkerThread.run() @bci=26, line=238 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=613 (Interpreted frame)


Beside going back to our Oracle physical server, I have no idea of why I'm getting this. Java 1.5 on OS X 10.4.11 Server, WO 5.3.3. W

<pid23991.txt>


---- Pascal Robert email@hidden

AIM: MacTICanada
Twitter : MacTICanada
LinkedIn : http://www.linkedin.com/in/macti
WO Community profile : http://wocommunity.org/page/member?name=probert

-- Chuck Hill Senior Consultant / VP Development

Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems.
http://www.global-village.net/products/practical_webobjects








_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden


  • Follow-Ups:
    • Re: Deadlocks in one of our apps
      • From: Pascal Robert <email@hidden>
References: 
 >Re: Deadlocks in one of our apps (From: Pascal Robert <email@hidden>)
 >Re: Deadlocks in one of our apps (From: Chuck Hill <email@hidden>)
 >Re: Deadlocks in one of our apps (From: Pascal Robert <email@hidden>)
 >Re: Deadlocks in one of our apps (From: Pascal Robert <email@hidden>)

  • Prev by Date: Re: EOQualifier to qualify NSTimestamp by month
  • Next by Date: Re: Deadlocks in one of our apps
  • Previous by thread: Re: Deadlocks in one of our apps
  • Next by thread: Re: Deadlocks in one of our apps
  • Index(es):
    • Date
    • Thread