Re: background tasks locked, workers run all right
- Subject: Re: background tasks locked, workers run all right
- From: Aaron Rosenzweig via Webobjects-dev <email@hidden>
- Date: Thu, 21 May 2020 12:43:02 -0400
Hi OC,
Given that you like the idea of making a new coordinator / cursor for each of
these tasks, my recommendation is to do it but then don’t lock the EC. Nothing
else is using it, right? You’ve guaranteed that by starting a new thread and a
new coordinator, so don’t lock it. You can get into a pickle with locking if it
is not done right, and there is no need for it in this use case.
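In code, the idea might look roughly like this (a minimal Java sketch;
ImportTask, its Runnable shape, and the dispose() cleanup are my illustration,
not your actual code):

===
import com.webobjects.eocontrol.EOEditingContext;
import com.webobjects.eocontrol.EOObjectStoreCoordinator;
import er.extensions.eof.ERXEC;

// Sketch: each background task builds a brand-new EOF stack that no other
// thread ever sees, so no manual ec.lock()/unlock() is needed anywhere.
public class ImportTask implements Runnable {
    public void run() {
        EOObjectStoreCoordinator osc = new EOObjectStoreCoordinator();
        EOEditingContext ec = ERXEC.newEditingContext(osc);
        try {
            // ... fetch, import, and save against ec ...
        } finally {
            ec.dispose();  // tear the private stack down when the task ends
            osc.dispose();
        }
    }
}
===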
You can do that and be happy :-) But to answer your other questions.
Creating a cursor to the DB is not cheap. It has a cost. If you want a simple
high-performance system you should have a pool of them that are maintained and
ready to go. You might say that for your particular case the cost isn’t so bad
and you don’t care, which is fine, but what about other use cases? You have an
import that runs sometimes… ok, but what about an online game that has to
deposit gold coins? It’s not a big task, but it takes time and you want it to
go as fast as possible; it’s a different use case.
I’ve seen systems that let DB cursors go out of control. You start running
into Unix file-descriptor limits, because every DB cursor can become a file
pipe on the file system. You have to start raising ulimits, and if you have
Oracle you have to buy a bigger concurrent license. Sometimes you don’t need
100% concurrency: you’d rather have 20 concurrent threads that consume what
they can from a queue of 10k requests. Does that email merge and send have to
be done “now”, or can it be done within 2 hours? Probably it can be.
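For that “20 threads draining a queue” shape, plain java.util.concurrent is
enough; a sketch (the class and its names are hypothetical):

===
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: at most 20 import tasks (and thus at most 20 coordinators / DB
// cursors) run at once; the rest of the 10k requests wait in the queue.
public class ImportQueue {
    private static final ExecutorService QUEUE = Executors.newFixedThreadPool(20);

    public static void enqueue(Runnable importTask) {
        QUEUE.submit(importTask);
    }
}
===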
I like Kieran’s system because it gives you a big bang for your buck: it
solves a lot of problems without you thinking too hard. The main gotcha is
that it is round robin, which means it can at times hand the same coordinator
/ cursor to two threads at once. There you need to use locking to prevent EOF
confusion, and then you are in a bit of a pickle, but… it was a quick and easy
thing to implement and it gives you concurrency 95% of the time.
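To make that gotcha concrete, here is a toy round-robin pool (deliberately
simplified, not Kieran’s actual code): with N coordinators and more than N
callers, two threads will sooner or later receive the same coordinator, which
is exactly why ECs on it must be locked.

===
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import com.webobjects.eocontrol.EOObjectStoreCoordinator;

// Toy pool: hands out coordinators in a fixed rotation, so under load
// two threads can end up sharing one and must lock around their ECs.
public class RoundRobinPool {
    private final List<EOObjectStoreCoordinator> coordinators =
        new ArrayList<EOObjectStoreCoordinator>();
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinPool(int size) {
        for (int i = 0; i < size; i++) {
            coordinators.add(new EOObjectStoreCoordinator());
        }
    }

    public EOObjectStoreCoordinator nextCoordinator() {
        // floorMod keeps the index valid even after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), coordinators.size());
        return coordinators.get(i);
    }
}
===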
AARON ROSENZWEIG / Chat 'n Bike <http://www.chatnbike.com/>
e: email@hidden <mailto:email@hidden> t: (301) 956-2319
> On May 21, 2020, at 12:29 PM, OCsite <email@hidden> wrote:
>
> Aaron,
>
> thanks a lot!
>
>> On 21 May 2020, at 17:30, Aaron Rosenzweig <email@hidden
>> <mailto:email@hidden>> wrote:
>> It looks like you are mixing a couple things.
>
> Well, probably I am missing something of importance here.
>
>> If you are going to use Kieran’s code (I believe it was him initially) you
>> should use it fully and have a healthy number of coordinators
>
> If “Kieran’s code” means exploiting
> ERXObjectStoreCoordinatorPool.maxCoordinators > 1, well, we actually do not
> want to do that. We do that in other applications, but not in this one.
>
> Part of the normal processing we do is locking the (one and only) shared
> OSC to make sure some of the actions are well-serialised; that is more
> important than the speed. For normal processing (of page actions through R/R
> loops and their worker threads) this causes no problem, quite the contrary:
> it's precisely what we want.
>
> Contrariwise, those import tasks are special in that they (a) run with very
> low priority, and (b) should not be affected by the main OSC lock (nor
> affect it and its users) at all. Thus...
>
>> I say that because his code does round robin and there is a chance that two
>> threads get the same coordinator under heavy load so you better stock up on
>> them. Think of each coordinator as a single cursor to the DB.
>
> ... creating a new coordinator for each import thread as “its own cursor to
> the DB” (or, more precisely, its own EOF stack pretty much independent of
> all the others) is precisely what I want.
>
> Actually, those import tasks could ideally be run from another application,
> or at least another instance; only that would be a bit on the inconvenient
> side :)
>
> Is that wrong? Why?
>
>> For some reason you are telling Kieran’s code to only have one coordinator.
>> Then, on top of that, you make your own coordinator each time. It’s like
>> crossing the beams in Ghostbusters. Go one way, or the other, don’t mix.
>
> Probably I misunderstood Kieran's code mightily.
>
> As far as I can say, I do not need a coordinator pool; and apart from the
> one default OSC (which is shared by all the normal processing threads and
> ECs), I do not want coordinators reused, ever.
>
> Normally, there's at most one import task at a given time (and they happen
> to be rather infrequent); nevertheless, if, under some extremely rare
> conditions, more imports run at once, I'd still prefer each of them to have
> its own EOF stack/DB access, independent of all the others.
>
>> There is something to be said about rolling your own and ensuring every
>> thread has its own coordinator, and then you don’t even need to lock.
>
> Yup, as far as I understand, in this case, when the thread has its own EC
> in its own OSC, neither of which is ever shared with other threads, locking
> is superfluous (but at the same time, harmless).
>
>> You would also want a queuing mechanism so that you don’t create too many
>> coordinators.
>
> Aside from the memory consumption (which we keep track of, and which is not
> too bad), why? In theory I can see the DB might reject too many connections,
> but that would hardly happen. Even if it did, I presume the fetch would
> throw, not block?!?
>
>> If you are not going to do that and want the easy button, embrace
>> Kieran’s code fully by making a large pool and letting his code give you an
>> EC to work with.
>
> Correct me please if I am wrong, but as far as I understand, this could
> lead to the import EC sharing the OSC with some of the normal
> sessions/worker threads, and that's definitely something we do not want.
>
> I can add extra code to make sure it does not happen and to make sure
> sessions get one OSC and threads get others, but what would then be the
> advantage of that compared with creating one OSC for each import thread?
>
>> Think about how many people you expect to be using a long running task
>> concurrently and at least double that number for your pool size.
>
> In which case, what's precisely the advantage of having a pool instead of
> creating a new OSC for each import thread (and releasing it when the import
> is done)?
>
>> Don’t forget to send him some Red Breast Whiskey.
>
> Aha! At least one thing for which I do not need more detailed elucidation :)
>
> Thanks again,
> OC
>
>>> On May 21, 2020, at 7:13 AM, OCsite via Webobjects-dev
>>> <email@hidden <mailto:email@hidden>>
>>> wrote:
>>> I bumped into another weird problem lately. We import some data into the
>>> DB in background tasks. Up to yesterday it worked normally; today six
>>> import tasks were launched, and each of them seemingly hung in its first
>>> DB operation. A restart did help; alas, the site admins did not ask the
>>> JVM to detect deadlocks before they restarted the application.
>>>
>>> The background task looks like this:
>>>
>>> ===
>>> class ImportCSVTask extends ERXLongResponseTask.DefaultImplementation {
>>>     def performAction() {
>>>         _thread.priority = Thread.MIN_PRIORITY
>>>         try {
>>>             try {
>>>                 // a fresh EOF stack: a new EC atop its own, newly created coordinator
>>>                 editingContext = ERXEC.newEditingContext(objectStore = new EOObjectStoreCoordinator())
>>>                 editingContext.lock()
>>>                 lognow 1, "=== preparing CSV import in EC $editingContext ==="
>>>                 // the fetch below is the first DB round trip; it never returned
>>>                 formPrototype = ERXEOGlobalIDUtilities.fetchObjectWithGlobalID(editingContext, formPrototypeGID)
>>>                 lognow 1, "=== local prototype $formPrototype ==="
>>>                 ... ...
>>> ===
>>>
>>> The “preparing” log was always the last thing those threads printed; none
>>> of them ever reported “local prototype”. There is no other related log in
>>> there.
>>>
>>> Meanwhile the application ran normally and the worker threads communicated
>>> with the database all right (with an occasional report from
>>> ERXAdaptorChannelDelegate that some select took 70-odd ms; we have the
>>> threshold at 50 ms). We run with
>>> ERXObjectStoreCoordinatorPool.maxCoordinators=1.
>>>
>>> Any idea what could have gone wrong, and how to find the culprit and
>>> prevent the problem in the future? I thought a new EC in a new OSC could
>>> not be blocked for long but, self-evidently, I was wrong; they seemed to
>>> lock indefinitely (the application was restarted ten-odd hours after the
>>> first import hung after its “preparing” report, still with no “local
>>> prototype”).
>>>
>>> Thanks and all the best,
>>> OC
>>>
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden