Re: Help! RE: Deadlock on Apache 2.2 Adaptor under high load
- Subject: Re: Help! RE: Deadlock on Apache 2.2 Adaptor under high load
- From: Michael Kondratov <email@hidden>
- Date: Thu, 05 Jul 2012 13:30:50 -0400
I was getting Apache segfaults at a load of around 80-100 requests per second. I moved to prefork to fix it.
Michael
On Jul 5, 2012, at 4:41 AM, "Brook, James" <email@hidden> wrote:
> Chuck, thanks for the information. The skill set is a bit of a rare one.
>
> Michael, interesting to hear about a deadlock on Linux. Do you mean you were using the 'worker' MPM? You don't have a stack trace by any chance? We may have hit the same bottleneck or deadlock. I have been wondering whether this is a SPARC/Solaris-specific issue.
>
> Not being able to use the worker MPM with WO seems like a serious scalability issue for anyone who isn't 'akamazed'. If I can narrow the problem down to something specific enough we may look to pay someone to fix this.
>
> On 5 Jul 2012, at 00:10, "Chuck Hill" <email@hidden> wrote:
>
>> Hi James,
>>
>>
>> On 2012-06-29, at 9:35 AM, Brook, James wrote:
>>
>>> It's probably bad form to keep answering my own mails but no-one had anything to say about this. Are there still people on the list who are familiar with the adaptor internals? This problem is causing us a lot of pain in production.
>>
>> At this point in time, you are probably the world's authority on this.
>>
>>
>>> Does anyone use the worker MPM with Apache, or are we all still on prefork? I don't think we could live without the performance gains. Perhaps it doesn't matter.
>>
>> I would guess that very few of us are using Apache on Solaris.
>>
>>
>>> I haven't quite proven this, but I am pretty certain that my problem is with fcntl. That's what the adaptor uses to lock the shared memory file. It's apparently an outdated way of doing this; APR now has better abstractions for these sorts of mutexes. Even the code that does the locking is in a retry loop with up to 50 attempts! I started trying to rewrite the locking stuff but I am out of my depth.
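For readers who have not met it, here is a rough sketch of `fcntl`-based section locking with a bounded retry loop: a `struct flock` names a byte range of the file, and `F_SETLKW` sleeps until that range can be locked. This is not the adaptor's `lock_file_section()`. The name `lock_section`, the `MAX_LOCK_ATTEMPTS` constant, the 1 ms back-off and the choice to retry only on `EINTR` and `EDEADLK` are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical limit, mirroring the "up to 50 attempts" described above. */
#define MAX_LOCK_ATTEMPTS 50

/* Apply a record lock of the given type (F_RDLCK, F_WRLCK or F_UNLCK) to
 * len bytes starting at offset. Returns 0 on success, or -1 with errno set
 * once a non-retryable error occurs or the attempts are used up. */
static int lock_section(int fd, off_t offset, off_t len, short type)
{
    struct flock fl;
    int attempt, err = 0;
    const struct timespec backoff = { .tv_sec = 0, .tv_nsec = 1000000L }; /* 1 ms */

    assert(offset >= 0 && len >= 0);
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = len;                      /* 0 would mean "to end of file" */

    for (attempt = 1; attempt <= MAX_LOCK_ATTEMPTS; attempt++) {
        /* F_SETLKW sleeps until the range is free. The kernel returns
         * EDEADLK instead if waiting would complete a cycle of processes,
         * each waiting on a lock held by the next. */
        if (fcntl(fd, F_SETLKW, &fl) == 0)
            return 0;
        err = errno;
        if (err != EINTR && err != EDEADLK)
            break;                       /* e.g. EBADF or ENOLCK: give up */
        if (err == EDEADLK)
            nanosleep(&backoff, NULL);   /* give the other lock owner a chance */
    }
    fprintf(stderr, "lock_section(): failed to lock (%d attempts): %s\n",
            attempt > MAX_LOCK_ATTEMPTS ? MAX_LOCK_ATTEMPTS : attempt,
            strerror(err));
    errno = err;
    return -1;
}
```

Backing off and retrying cannot break a genuine cycle while the caller still holds the lock the other process is waiting for. The usual remedies are to release held locks before retrying, or to always acquire sections in a fixed order.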
>>
>> There are probably a few people here with current C skills, I am not one of them. And then you probably need Apache and Solaris API knowledge too.
>>
>>
>>> It strikes me that in general this would not be a bad bit of code for the community to have updated. Can anyone help me with that please?
>>
>> I would, but I can't. I was trying to help one company that had a deployment problem on Solaris that sounds somewhat similar to yours. So yes, it would be good to get this updated. But finding someone else with the skill set is unlikely.
>>
>>
>> Chuck
>>
>>
>>> ________________________________________
>>> From: Brook, James
>>> Sent: 13 June 2012 18:48
>>> To: <email@hidden>
>>> Subject: Re: Deadlock on Apache 2.2 Adaptor under high load - Solaris 10 - Worker MPM
>>>
>>> Now I have some detailed adaptor logging from a time close to the deadlock. Here is an example of an error with a lock:
>>>
>>> Debug: thread 37 locking WOShmem_lock from ../Adaptor/shmem.c:375
>>> Debug: thread 37 unlocking WOShmem_lock from ../Adaptor/shmem.c:379
>>> Error: lock_file_section(): failed to lock (1 attempts): Deadlock situation detected/avoided
>>> Debug: thread 37 locking str_lock from ../Adaptor/wastring.c:93
>>> Debug: thread 37 unlocking str_lock from ../Adaptor/wastring.c:100
>>> Debug: thread 37 locking str_lock from ../Adaptor/wastring.c:152
>>> Debug: thread 37 unlocking str_lock from ../Adaptor/wastring.c:158
>>> Debug: thread 37 locking WOShmem_lock from ../Adaptor/shmem.c:391
>>> Debug: thread 37 unlocking WOShmem_lock from ../Adaptor/shmem.c:394
>>> Error: ac_readConfiguration: WOShmem_lock() failed. Skipping reading config.
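The text `Deadlock situation detected/avoided` appears to be the Solaris `strerror()` message for `EDEADLK`, which `fcntl(F_SETLKW)` returns when the kernel concludes that waiting would complete a cycle of lock owners. With the worker MPM it matters that POSIX record locks are owned by the process, not the thread: a second thread in the same process is simply granted a range its sibling already holds, and deadlock detection reasons about whole processes rather than individual threads. The standalone demonstration below is not adaptor code; `fcntl_owner_demo` and the 20-byte range are arbitrary choices. It shows both sides: a sibling thread's `F_SETLK` on the held range succeeds, while a forked child's attempt fails.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Non-blocking attempt at an exclusive lock on bytes [0, 20) of fd. */
static int try_write_lock(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 20;
    return fcntl(fd, F_SETLK, &fl);  /* -1 (EACCES or EAGAIN) if another owner holds it */
}

struct probe { int fd; int ok; };

static void *sibling_thread(void *arg)
{
    struct probe *p = arg;
    p->ok = (try_write_lock(p->fd) == 0);
    return NULL;
}

/* Hypothetical demo: the calling thread locks the range, then reports
 * whether a sibling thread and a forked child could lock it too.
 * Returns 0 if the demo ran, -1 on a setup failure. */
static int fcntl_owner_demo(int fd, int *thread_ok, int *child_ok)
{
    struct probe p;
    pthread_t tid;
    pid_t pid;
    int status;

    assert(thread_ok != NULL && child_ok != NULL);
    if (try_write_lock(fd) != 0)
        return -1;

    /* Same process: the lock already belongs to us, so this succeeds. */
    p.fd = fd;
    p.ok = 0;
    if (pthread_create(&tid, NULL, sibling_thread, &p) != 0)
        return -1;
    pthread_join(tid, NULL);
    *thread_ok = p.ok;

    /* Different process: record locks are not inherited across fork(). */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(try_write_lock(fd) == 0 ? 0 : 1);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    *child_ok = (WEXITSTATUS(status) == 0);
    return 0;
}
```

A related trap: closing any descriptor for the file releases all of the process's locks on it, including ones another thread believes it still holds. This is why threaded code usually pairs fcntl locks with an in-process mutex.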
>>>
>>> On Jun 13, 2012, at 5:30 PM, James Brook wrote:
>>>
>>>> We have a big problem with the Apache 2.2 WebObjects adaptor on our Solaris 10 web servers. We are using the 'worker' MPM but when the sites get busy nearly every Apache thread is waiting for a shared memory lock to call the function that reads the adaptor config. The remaining threads are in the fcntl function trying to lock a section of shared memory. See below for a couple of example thread stacks.
>>>>
>>>> I read in several posts that fcntl on Solaris 10 causes deadlocks under high load and that the problem is worse with the 'worker' MPM. The recommended locking mechanism for Solaris seems to be pthreads.
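Switching to pthreads here would usually mean placing a mutex inside the shared memory segment itself and initialising it with the `PTHREAD_PROCESS_SHARED` attribute, so the same object serializes both threads within one process and separate processes. The following is a minimal sketch, assuming an anonymous `MAP_SHARED` mapping stands in for the adaptor's shared-memory file. `shared_counter_demo` is a hypothetical name, and recovery when a process dies while holding the mutex (`pthread_mutexattr_setrobust()`) is deliberately left out. APR's `apr_global_mutex_*` functions offer a portable wrapper around the same idea.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The mutex must live inside the shared mapping, next to the data it guards. */
struct shared_region {
    pthread_mutex_t lock;
    long counter;
};

/* Hypothetical demo: the caller and one forked child each increment a
 * shared counter `iterations` times under a process-shared mutex.
 * Returns the final counter (2 * iterations if locking works), or -1. */
static long shared_counter_demo(long iterations)
{
    struct shared_region *r;
    pthread_mutexattr_t attr;
    pid_t pid;
    long i, result;
    int ok, status;

    assert(iterations >= 0);
    r = mmap(NULL, sizeof *r, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (r == MAP_FAILED)
        return -1;

    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(r, sizeof *r);
        return -1;
    }
    /* PTHREAD_PROCESS_SHARED lets threads in different processes use it. */
    ok = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) == 0 &&
         pthread_mutex_init(&r->lock, &attr) == 0;
    pthread_mutexattr_destroy(&attr);
    if (!ok) {
        munmap(r, sizeof *r);
        return -1;
    }
    r->counter = 0;

    pid = fork();
    if (pid < 0) {
        pthread_mutex_destroy(&r->lock);
        munmap(r, sizeof *r);
        return -1;
    }

    for (i = 0; i < iterations; i++) {      /* runs in both processes */
        pthread_mutex_lock(&r->lock);
        r->counter++;
        pthread_mutex_unlock(&r->lock);
    }
    if (pid == 0)
        _exit(0);                            /* child has finished its share */

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        result = -1;
    else
        result = r->counter;                 /* child's updates are visible here */
    pthread_mutex_destroy(&r->lock);
    munmap(r, sizeof *r);
    return result;
}
```

Called with 100000 iterations, a correctly shared mutex should give a final count of 200000; without the lock, lost updates would typically leave it lower.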
>>>>
>>>> I know that at least a few list members are running with the Solaris adaptor. My questions:
>>>> * Has anyone experienced this problem and found a solution?
>>>> * Is anyone using the 'worker' MPM, or do people still use prefork? (I don't think this is a thread-safety problem.)
>>>> * Any help or suggestions? Especially, any tips on rewriting to use pthreads?
>>>>
>>>> --
>>>> James
>>>>
>>>>
>>>> feec5638 fcntl (d8, 7, 2abe588)
>>>> feeb8258 fcntl (d8, 1, fefcc200, 4d6880, 1580, 20a58) + 84
>>>> febe8570 lock_file_section (d8, 4d6880, 14, 2abe588, 147c, 2) + 58
>>>> febe8e14 WOShmem_lock (2abe588, 14, 1, 4d6880, 1580, 1400) + d4
>>>> febef410 ac_readConfiguration (1, fffee980, 11400, fec08f74, 1d84, 1c00) + 40
>>>> febe71cc _runRequest (fc9fb9c4, 0, 2d77168, 2d18b40, 5, 0) + 260
>>>> febe6a0c tr_handleRequest (2d18b40, 27226f0, fc9fbc50, 0, 5, 2) + 30c
>>>> febf42a8 WebObjects_handler (2721208, 0, 10000, 0, 2d18b40, fec08f74) + 48c
>>>> 00041484 ap_run_handler (2721208, febf3e1c, 7b578, 6b5a10, 2, 8) + 40
>>>> 00041ab4 ap_invoke_handler (2721208, 0, 2721208, 0, 6b58bc, 79c00) + ec
>>>> 0005132c ap_process_request (2721208, 79400, 4, 1, 0, 2721208) + 54
>>>> 0004d9a4 ap_process_http_connection (26b61c0, 7c000, 0, 1, 79548, 5) + 78
>>>> 00049654 ap_process_connection (26b61c0, 26b5f10, 6b5d90, 0, 7bd98, 6b5d78) + d4
>>>> 00057558 worker_thread (14d888, ad7, fc9fbf98, 7c24c, 2b, 17) + 280
>>>> feec5238 _lwp_start (0, 0, 0, 0, 0, 0)
>>>>
>>>>
>>>> feec52d8 lwp_park (0, 0, 0)
>>>> feebf350 cond_wait_queue (ef50a8, ef5090, 0, 0, 1c00, 0) + 28
>>>> feebf874 cond_wait (ef50a8, ef5090, ef50a8, 0, fec0a8f8, 3) + 10
>>>> feebf8b0 pthread_cond_wait (ef50a8, ef5090, ef5090, 0, 1c00, 3a) + 8
>>>> febf2730 _WA_lock (ef5088, febf5974, ef50a8, 0, fec0a8f8, 3) + 90
>>>> febe9494 sha_lock (100, 4, fffeca64, fec08f74, ef3230, 13400) + 5c
>>>> febedd84 ac_findApplication (fe0fb54c, 4, fec0acfc, fec08f74, 0, fec0a474) + 70
>>>> febe6794 tr_handleRequest (2402c38, 30bbec0, fe0fb7d8, 798f0, ffffffff, 14400) + 94
>>>> febf42a8 WebObjects_handler (30baf40, 0, 10000, 0, 2402c38, fec08f74) + 48c
>>>> 00041484 ap_run_handler (30baf40, febf3e1c, 7b578, 6b5a10, 2, 8) + 40
>>>> 00041ab4 ap_invoke_handler (30baf40, 0, 2ba5f10, 2ba5348, 30baf40, 2b824d8) + ec
>>>> 0003f080 ap_run_sub_req (ffffffff, 30bb0e8, 20, 0, 30bc370, 30baf40) + 3c
>>>> fed336d8 handle_include (2ba4d20, 10800, 2ba5f10, 2ba5348, 30baf40, 2b824d8) + 334
>>>> fed378f8 send_parsed_content (11a8, 7c021, 2ba4d20, 2c01898, 2ba5f14, 2ba5f10) + 1080
>>>> 0003afb0 default_handler (0, 2c01898, 2b91e10, 2b7c748, 2b7e598, 2ba5328) + 4a8
>>>> 00041484 ap_run_handler (2c01898, 3ab08, 7b578, 6b5a74, 7, 8) + 40
>>>> 00041ab4 ap_invoke_handler (2c01898, 0, 2c01898, 2b9eb80, ffb1b6a0, 4e4960) + ec
>>>> 00051a58 ap_internal_redirect (0, 2c01898, fe0fbd10, fe0fbcac, 1, 2c01898) + 44
>>>> febab53c handler_redirect (2b9eb80, ffffffff, febbd238, 2c01560, fffefd64, 10000) + 90
>>>> 00041484 ap_run_handler (2b9eb80, febab4ac, 7b578, 6b5a4c, 5, 8) + 40
>>>> 00041ab4 ap_invoke_handler (2b9eb80, 0, 2b9eb80, 0, 6b58bc, 79c00) + ec
>>>> 0005132c ap_process_request (2b9eb80, 79400, 4, 1, 0, 2b9eb80) + 54
>>>> 0004d9a4 ap_process_http_connection (2b7c748, 7c000, 0, 1, 79548, 5) + 78
>>>> 00049654 ap_process_connection (2b7c748, 2b7c498, 6b5d90, 0, 7bd98, 6b5d78) + d4
>>>> 00057558 worker_thread (14d5a8, a00, fe0fbf98, 7c24c, 28, 0) + 280
>>>> feec5238 _lwp_start (0, 0, 0, 0, 0, 0)
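In the second stack, `sha_lock()` waits inside `_WA_lock()` on `pthread_cond_wait()`, which is consistent with an in-process lock built from a mutex and a condition variable. If the thread holding such a lock is itself blocked, for example in `fcntl()` as in the first stack, every other worker thread can queue behind it with no timeout. One diagnostic idea, offered only as an assumption and not based on the adaptor source, is a timed variant that returns `ETIMEDOUT`, so a stall shows up in the log instead of as a silent hang. `wa_lock_t` and its functions are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical mutex + condition-variable lock, similar in shape to what
 * the _WA_lock -> pthread_cond_wait frames suggest. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             held;   /* 1 while some thread owns the lock */
} wa_lock_t;

static void wa_lock_init(wa_lock_t *l)
{
    pthread_mutex_init(&l->mutex, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->held = 0;
}

/* Wait at most timeout_ms for the lock. Returns 0 on success or
 * ETIMEDOUT (or another pthread error code) on failure. */
static int wa_lock_timed(wa_lock_t *l, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    assert(timeout_ms >= 0);
    /* Condition variables measure against CLOCK_REALTIME by default. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&l->mutex);
    while (l->held && rc == 0)          /* loop also absorbs spurious wakeups */
        rc = pthread_cond_timedwait(&l->cond, &l->mutex, &deadline);
    if (!l->held) {                     /* freed, possibly just as we timed out */
        l->held = 1;
        rc = 0;
    }
    pthread_mutex_unlock(&l->mutex);
    return rc;
}

static void wa_unlock(wa_lock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    l->held = 0;
    pthread_cond_signal(&l->cond);      /* wake one waiting thread */
    pthread_mutex_unlock(&l->mutex);
}
```

Because the lock is not recursive, a second `wa_lock_timed()` call from the thread that already holds it simply times out.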
>>>>
>>>
>>> _______________________________________________
>>> Do not post admin requests to the list. They will be ignored.
>>> Webobjects-dev mailing list (email@hidden)
>>> Help/Unsubscribe/Update your Subscription:
>>>
>>> This email sent to email@hidden
>>
>> --
>> Chuck Hill Senior Consultant / VP Development
>>
>> Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems.
>> http://www.global-village.net/gvc/practical_webobjects
>>
>