Help! RE: Deadlock on Apache 2.2 Adaptor under high load
- Subject: Help! RE: Deadlock on Apache 2.2 Adaptor under high load
- From: "Brook, James" <email@hidden>
- Date: Fri, 29 Jun 2012 16:35:28 +0000
- Thread-topic: Help! RE: Deadlock on Apache 2.2 Adaptor under high load
It's probably bad form to keep answering my own mails but no-one had anything to say about this. Are there still people on the list who are familiar with the adaptor internals? This problem is causing us a lot of pain in production.
Does anyone use the worker MPM with Apache, or are we all still on prefork? I don't think we could live without the performance gains of worker. Perhaps the choice of MPM doesn't matter here.
I haven't quite proven this, but I am pretty certain that my problem is with fcntl, which is what the adaptor uses to lock the shared memory file. It's apparently an outdated way of doing this: APR now has better abstractions for these sorts of mutexes. The code that does the locking even sits in a retry loop with up to 50 attempts! I started trying to rewrite the locking code, but I am out of my depth.
It strikes me that in general this would not be a bad bit of code for the community to have updated. Can anyone help me with that please?
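As a starting point for anyone who wants to pick this up, here is a minimal sketch of the pthreads approach: a process-shared mutex stored inside the shared region it protects. It is not the adaptor's code. The names shared_region, region_create, region_add and demo_two_processes are made up for illustration, and the anonymous mapping is a simplification (the adaptor maps a file, which would need the mutex initialised exactly once by whoever creates it).

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict glibc modes */
#include <assert.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout: the mutex lives in the shared memory it guards. */
typedef struct {
    pthread_mutex_t lock;
    long counter;
} shared_region;

/* Map a shared region and initialise a PTHREAD_PROCESS_SHARED mutex in it. */
static shared_region *region_create(void)
{
    shared_region *r = mmap(NULL, sizeof *r, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (r == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(r, sizeof *r);
        return NULL;
    }
    int rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(&r->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(r, sizeof *r);
        return NULL;
    }
    r->counter = 0;
    return r;
}

/* Increment the shared counter n times, taking the mutex each time. */
static void region_add(shared_region *r, int n)
{
    assert(r != NULL);
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&r->lock);
        r->counter++;
        pthread_mutex_unlock(&r->lock);
    }
}

/* Parent and child both add n; returns the final count, or -1 on error. */
static long demo_two_processes(int n)
{
    shared_region *r = region_create();
    if (r == NULL)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(r, sizeof *r);
        return -1;
    }
    if (pid == 0) {             /* child: contend for the same mutex */
        region_add(r, n);
        _exit(0);
    }
    region_add(r, n);
    waitpid(pid, NULL, 0);

    long total = r->counter;
    pthread_mutex_destroy(&r->lock);
    munmap(r, sizeof *r);
    return total;
}
```

Calling demo_two_processes(10000) should return 20000 if the mutex really excludes across processes. One thing this sketch does not handle is a process dying while it holds the lock; a real replacement would need to deal with that case.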
James
________________________________________
From: Brook, James
Sent: 13 June 2012 18:48
To: <email@hidden>
Subject: Re: Deadlock on Apache 2.2 Adaptor under high load - Solaris 10 - Worker MPM
Now I have some detailed adaptor logging from a time close to the deadlock. Here is an example of an error with a lock:
Debug: thread 37 locking WOShmem_lock from ../Adaptor/shmem.c:375
Debug: thread 37 unlocking WOShmem_lock from ../Adaptor/shmem.c:379
Error: lock_file_section(): failed to lock (1 attempts): Deadlock situation detected/avoided
Debug: thread 37 locking str_lock from ../Adaptor/wastring.c:93
Debug: thread 37 unlocking str_lock from ../Adaptor/wastring.c:100
Debug: thread 37 locking str_lock from ../Adaptor/wastring.c:152
Debug: thread 37 unlocking str_lock from ../Adaptor/wastring.c:158
Debug: thread 37 locking WOShmem_lock from ../Adaptor/shmem.c:391
Debug: thread 37 unlocking WOShmem_lock from ../Adaptor/shmem.c:394
Error: ac_readConfiguration: WOShmem_lock() failed. Skipping reading config.
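The "Deadlock situation detected/avoided" text above looks like the system message for EDEADLK coming back from fcntl. One property that may be relevant with a threaded MPM: POSIX record locks belong to the process, not to the thread, so a second request from the same process is simply granted while another process is refused. The sketch below shows a bounded retry on EDEADLK and then probes a locked range from both places. It is an illustration under assumptions, not the adaptor's lock_file_section(); lock_section_retry, try_lock_section and demo_fcntl_scope are hypothetical names.

```c
#define _DEFAULT_SOURCE /* exposes mkstemp/ftruncate under strict glibc modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: blocking write lock on [start, start + len), retried
 * a bounded number of times on EDEADLK or EINTR. */
static int lock_section_retry(int fd, off_t start, off_t len, int max_attempts)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (fcntl(fd, F_SETLKW, &fl) == 0)
            return 0;
        if (errno != EDEADLK && errno != EINTR)
            return -1;          /* unexpected error: give up */
        poll(NULL, 0, attempt); /* back off for `attempt` milliseconds */
    }
    return -1;
}

/* Non-blocking probe: returns 1 if a write lock on the range was granted. */
static int try_lock_section(int fd, off_t start, off_t len)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl) == 0;
}

/* Lock a range, then probe it from the same process (via a second
 * descriptor) and from a forked child. Returns 0 on success. */
static int demo_fcntl_scope(int *same_process_granted, int *child_granted)
{
    char path[] = "/tmp/fcntl_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    int fd2 = open(path, O_RDWR);
    if (fd2 < 0 || ftruncate(fd, 64) != 0 ||
        lock_section_retry(fd, 0, 16, 50) != 0) {
        if (fd2 >= 0)
            close(fd2);
        close(fd);
        unlink(path);
        return -1;
    }

    /* Same process: the lock is already ours, so this is granted. */
    *same_process_granted = try_lock_section(fd2, 0, 16);

    pid_t pid = fork();
    if (pid == 0) {             /* child: a different lock owner */
        int cfd = open(path, O_RDWR);
        _exit(cfd >= 0 && try_lock_section(cfd, 0, 16) ? 1 : 0);
    }
    int status = 0;
    int ok = pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status);
    *child_granted = ok ? WEXITSTATUS(status) : -1;

    close(fd2);
    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

With the range locked by the parent, the same-process probe should be granted and the child's probe refused. If that holds on Solaris too, fcntl alone cannot serialise the threads inside one worker process, and any deadlock detection the kernel does is in terms of processes rather than threads.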
On Jun 13, 2012, at 5:30 PM, James Brook wrote:
> We have a big problem with the Apache 2.2 WebObjects adaptor on our Solaris 10 web servers. We are using the 'worker' MPM but when the sites get busy nearly every Apache thread is waiting for a shared memory lock to call the function that reads the adaptor config. The remaining threads are in the fcntl function trying to lock a section of shared memory. See below for a couple of example thread stacks.
>
> I read in several posts that fcntl on Solaris 10 causes deadlocks under high load and that the problem is worse with the 'worker' MPM. The recommended locking mechanism for Solaris seems to be pthreads.
>
> I know that at least a few list members are running with the Solaris adaptor. My questions:
> * Has anyone experienced this problem and found a solution?
> * Is anyone using the 'worker' MPM, or do people still use prefork? (I don't think this is a thread safety problem.)
> * Any help or suggestions? Especially, any tips on rewriting to use pthreads?
>
> --
> James
>
>
> feec5638 fcntl (d8, 7, 2abe588)
> feeb8258 fcntl (d8, 1, fefcc200, 4d6880, 1580, 20a58) + 84
> febe8570 lock_file_section (d8, 4d6880, 14, 2abe588, 147c, 2) + 58
> febe8e14 WOShmem_lock (2abe588, 14, 1, 4d6880, 1580, 1400) + d4
> febef410 ac_readConfiguration (1, fffee980, 11400, fec08f74, 1d84, 1c00) + 40
> febe71cc _runRequest (fc9fb9c4, 0, 2d77168, 2d18b40, 5, 0) + 260
> febe6a0c tr_handleRequest (2d18b40, 27226f0, fc9fbc50, 0, 5, 2) + 30c
> febf42a8 WebObjects_handler (2721208, 0, 10000, 0, 2d18b40, fec08f74) + 48c
> 00041484 ap_run_handler (2721208, febf3e1c, 7b578, 6b5a10, 2, 8) + 40
> 00041ab4 ap_invoke_handler (2721208, 0, 2721208, 0, 6b58bc, 79c00) + ec
> 0005132c ap_process_request (2721208, 79400, 4, 1, 0, 2721208) + 54
> 0004d9a4 ap_process_http_connection (26b61c0, 7c000, 0, 1, 79548, 5) + 78
> 00049654 ap_process_connection (26b61c0, 26b5f10, 6b5d90, 0, 7bd98, 6b5d78) + d4
> 00057558 worker_thread (14d888, ad7, fc9fbf98, 7c24c, 2b, 17) + 280
> feec5238 _lwp_start (0, 0, 0, 0, 0, 0)
>
>
> feec52d8 lwp_park (0, 0, 0)
> feebf350 cond_wait_queue (ef50a8, ef5090, 0, 0, 1c00, 0) + 28
> feebf874 cond_wait (ef50a8, ef5090, ef50a8, 0, fec0a8f8, 3) + 10
> feebf8b0 pthread_cond_wait (ef50a8, ef5090, ef5090, 0, 1c00, 3a) + 8
> febf2730 _WA_lock (ef5088, febf5974, ef50a8, 0, fec0a8f8, 3) + 90
> febe9494 sha_lock (100, 4, fffeca64, fec08f74, ef3230, 13400) + 5c
> febedd84 ac_findApplication (fe0fb54c, 4, fec0acfc, fec08f74, 0, fec0a474) + 70
> febe6794 tr_handleRequest (2402c38, 30bbec0, fe0fb7d8, 798f0, ffffffff, 14400) + 94
> febf42a8 WebObjects_handler (30baf40, 0, 10000, 0, 2402c38, fec08f74) + 48c
> 00041484 ap_run_handler (30baf40, febf3e1c, 7b578, 6b5a10, 2, 8) + 40
> 00041ab4 ap_invoke_handler (30baf40, 0, 2ba5f10, 2ba5348, 30baf40, 2b824d8) + ec
> 0003f080 ap_run_sub_req (ffffffff, 30bb0e8, 20, 0, 30bc370, 30baf40) + 3c
> fed336d8 handle_include (2ba4d20, 10800, 2ba5f10, 2ba5348, 30baf40, 2b824d8) + 334
> fed378f8 send_parsed_content (11a8, 7c021, 2ba4d20, 2c01898, 2ba5f14, 2ba5f10) + 1080
> 0003afb0 default_handler (0, 2c01898, 2b91e10, 2b7c748, 2b7e598, 2ba5328) + 4a8
> 00041484 ap_run_handler (2c01898, 3ab08, 7b578, 6b5a74, 7, 8) + 40
> 00041ab4 ap_invoke_handler (2c01898, 0, 2c01898, 2b9eb80, ffb1b6a0, 4e4960) + ec
> 00051a58 ap_internal_redirect (0, 2c01898, fe0fbd10, fe0fbcac, 1, 2c01898) + 44
> febab53c handler_redirect (2b9eb80, ffffffff, febbd238, 2c01560, fffefd64, 10000) + 90
> 00041484 ap_run_handler (2b9eb80, febab4ac, 7b578, 6b5a4c, 5, 8) + 40
> 00041ab4 ap_invoke_handler (2b9eb80, 0, 2b9eb80, 0, 6b58bc, 79c00) + ec
> 0005132c ap_process_request (2b9eb80, 79400, 4, 1, 0, 2b9eb80) + 54
> 0004d9a4 ap_process_http_connection (2b7c748, 7c000, 0, 1, 79548, 5) + 78
> 00049654 ap_process_connection (2b7c748, 2b7c498, 6b5d90, 0, 7bd98, 6b5d78) + d4
> 00057558 worker_thread (14d5a8, a00, fe0fbf98, 7c24c, 28, 0) + 280
> feec5238 _lwp_start (0, 0, 0, 0, 0, 0)
>
>
>
>
>