Re: Lots of EOs slow down the performance
- Subject: Re: Lots of EOs slow down the performance
- From: Guido Neitzer <email@hidden>
- Date: Thu, 6 Nov 2008 15:05:24 -0700
On 06.11.2008, at 14:33, Yung-Luen Lan wrote:
Do you have any trick or pattern to recycle EC? Call System.gc()?
Work in batches, call ec.dispose(), set ec = null, create a new one.
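A minimal sketch of that recycle pattern, written in plain Java so it runs without WebObjects on the classpath. EditingContextLike and CountingContext are hypothetical stand-ins for EOEditingContext; only the method names insertObject, saveChanges and dispose mirror the real class. The batch size is an assumption, not a recommendation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for EOEditingContext, so the sketch runs without WebObjects.
interface EditingContextLike {
    void insertObject(Object eo);
    void saveChanges();
    void dispose();
}

// Hypothetical in-memory implementation that only counts what it is holding.
class CountingContext implements EditingContextLike {
    static int contextsCreated = 0;
    static int rowsSaved = 0;
    static int maxHeld = 0;
    private List<Object> pending = new ArrayList<>();

    CountingContext() { contextsCreated++; }

    public void insertObject(Object eo) {
        pending.add(eo);
        maxHeld = Math.max(maxHeld, pending.size());
    }

    public void saveChanges() {
        rowsSaved += pending.size();
        pending.clear();
    }

    public void dispose() { pending = null; }
}

class BatchInsertSketch {
    // Inserts `total` objects, saving and replacing the context after every `batchSize` objects.
    static void insertInBatches(int total, int batchSize, Supplier<EditingContextLike> newContext) {
        EditingContextLike ec = newContext.get();
        for (int i = 1; i <= total; i++) {
            ec.insertObject("row " + i);
            if (i % batchSize == 0) {
                ec.saveChanges();      // flush this batch to the database
                ec.dispose();          // release what the context was holding
                ec = null;             // drop the reference
                ec = newContext.get(); // start over with a fresh context
            }
        }
        ec.saveChanges();              // flush the final, possibly partial batch
        ec.dispose();
    }
}
```

With 2500 objects and a batch size of 1000, no context in this sketch ever holds more than 1000 pending objects, which is the point of recycling.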
What do you expect? It has to create 150k insert statements. One for
each object, then copy these to the JDBC driver, that one executes them
one by one on the database ... and so on. It's just a plain inefficient
way of creating rows in the table.
I'm not sure about this. Maybe grouping those SQL statements into one
transaction could help?
(Or is that already done by EOF?)
That is already done.
Again: you are running out of memory. Read the error message.
First of all: EOF is not built for bulk operations. If you want to do
something like that, you need to find other ways. What I prefer to do
for something like that:
- use ERXFetchSpecificationBatchIterator and iterate over the objects in
  small batches of 100 or 200 rows
- create the CSV file on disk with an output stream that doesn't keep the
  whole thing in memory
- deliver the file when the operation is done
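The steps in that list can be sketched roughly as follows, in plain Java. The batches() method is a hypothetical stand-in for a batch iterator such as ERXFetchSpecificationBatchIterator (its real API is not shown here), and the two-column row layout is invented for illustration. Only one batch is alive at a time and rows go straight to the writer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class CsvExportSketch {
    // Hypothetical batch source: builds each batch of rows on demand.
    static Iterator<List<String[]>> batches(int totalRows, int batchSize) {
        return new Iterator<List<String[]>>() {
            private int position = 0;

            public boolean hasNext() { return position < totalRows; }

            public List<String[]> next() {
                if (!hasNext()) throw new NoSuchElementException();
                List<String[]> batch = new ArrayList<>();
                int end = Math.min(position + batchSize, totalRows);
                for (; position < end; position++) {
                    batch.add(new String[] { String.valueOf(position), "item " + position });
                }
                return batch;
            }
        };
    }

    // Quotes a field only if it contains a comma, a double quote, or a line break.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Writes every row to `out` as it arrives; returns the number of rows written.
    static long writeRows(Iterator<List<String[]>> batches, Writer out) {
        long rows = 0;
        try {
            while (batches.hasNext()) {
                for (String[] row : batches.next()) {
                    for (int i = 0; i < row.length; i++) {
                        if (i > 0) out.write(',');
                        out.write(quote(row[i]));
                    }
                    out.write("\r\n");
                    rows++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    // Streams the rows into a file on disk through a buffered writer.
    static long exportToFile(Iterator<List<String[]>> batches, Path target) {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            return writeRows(batches, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as CsvExportSketch.exportToFile(CsvExportSketch.batches(150000, 200), somePath) would then produce the file to deliver once the loop is done; somePath is whatever location the application chooses.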
Yeah, that is actually what I do, following your suggestions, except the
first one. I'm still learning what a fetch specification is. Thanks
again.
Oh, I see. If you iterate over hundreds of thousands of objects, you
need to clear out the ec (see above).
Think about what is going on in that case. You have 150000 objects in
Java land, you might have relationships, you create string
representations for each and every insert statement, maybe more than one
string per statement - remember String is immutable and the GC always
comes too late -, you pass that to the JDBC adaptor as one transaction so
that one keeps its own copy of the statements (not sure, but likely if
you expect the worst case), and so on. If each of your objects is around
1k in size, you have 150000k or around 146MB, and that is just to keep
the objects around. How much memory did you say you give your Java apps?
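The arithmetic above, as a small runnable check. The 1k per object is the estimate from the text, not a measured value, and the class and method names are made up for this sketch. Runtime.getRuntime().maxMemory() reports the maximum heap the JVM will attempt to use, which is what the -Xmx setting controls.

```java
class MemoryEstimateSketch {
    // Rough size of `objectCount` objects at `bytesPerObject` each, in bytes.
    static long estimateBytes(long objectCount, long bytesPerObject) {
        return objectCount * bytesPerObject;
    }

    // Converts bytes to megabytes, counting 1 MB as 1024 * 1024 bytes.
    static double toMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    // True if the estimate alone is below the maximum heap size of this JVM.
    static boolean fitsInMaxHeap(long bytes) {
        return bytes < Runtime.getRuntime().maxMemory();
    }
}
```

For 150000 objects of 1024 bytes each this gives 153600000 bytes, about 146.5MB, before counting the strings, relationships, and driver-side copies mentioned above.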
I totally agree with your point. Let me break this into two parts: space
and time.
Memory: it appears that the EC isn't suited for inserting a lot of EOs
into the database because the memory footprint of the EC works like
this: hold objects in memory, then discard them or save them to the db
all at once. Other ORM tools like activerecord or python.db don't seem
to have the concept of an EC;
Yes. And it's not the editing context's fault that the memory runs out,
but a matter of Java, your memory settings for the app and so on.
But really, bulk operations need some special handling in any decent
persistence framework.
Performance: I did some benchmarks on my database. 150,000 insertions
into the same table:
Raw SQL, transaction: 23s
Raw SQL, no transaction: 154s
EC, saveChanges every 1000 EO inserted: 273s
Compared to raw SQL without a transaction, the EC method is not bad at
all (only 1.7x slower). I don't care about cutting the wait time from
five minutes to half of that. Totally acceptable. :-)
Yeah, I thought so too. It's decent. Depending on the database
structure that is. If you need real performance, you need to go to
database specific things.
What we do in some cases is:
1. Create file with copy statements that use the PostgreSQL copy
command into a temp table
2. Use "insert into the_table select * from the_temp_table ..."
This performs way faster than anything else on PostgreSQL. It just
depends what you need there.
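A rough sketch of those two steps in plain Java, with no JDBC involved. It assumes COPY's default text format (tab-separated columns, \N for NULL, backslash escapes). The CREATE TEMP TABLE ... (LIKE ...) statement and the file path are assumptions for illustration, and whatever followed the "..." in the insert statement above is left out. Note that COPY ... FROM a file is read by the database server, so the path must be readable there.

```java
import java.util.Arrays;
import java.util.List;

class PgCopySketch {
    // Escapes one column value for COPY's default text format; null becomes \N.
    static String escape(String value) {
        if (value == null) return "\\N";
        return value.replace("\\", "\\\\") // backslash first, so later escapes are not doubled
                    .replace("\t", "\\t")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }

    // Builds one line of the data file: tab-separated columns ending in a newline.
    static String copyLine(String... columns) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) line.append('\t');
            line.append(escape(columns[i]));
        }
        return line.append('\n').toString();
    }

    // The statements to run once the data file is complete (table names as in the text).
    static List<String> loadStatements(String dataFilePath) {
        String quotedPath = "'" + dataFilePath.replace("'", "''") + "'";
        return Arrays.asList(
            "CREATE TEMP TABLE the_temp_table (LIKE the_table)", // assumed setup step
            "COPY the_temp_table FROM " + quotedPath,
            "INSERT INTO the_table SELECT * FROM the_temp_table");
    }
}
```

The lines from copyLine can be streamed to disk the same way as any other export, so the object graph never has to be built for the rows being loaded.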
Ah, if my previous post offended people, I apologize. What I really
meant is absolutely not "why WebObjects is bad and old", but "This
should be easy to do in the 21st century. I must be doing something
wrong. What's the correct way?"
Don't do it that way. ;-)
The thing is, that EOF keeps an object graph in memory. If you just
want to dump rows into a table, get rid of the object graph if you
don't need it. That's where the memory and time overhead goes.
There is one thing to learn from here: do not expect EOF to be fast
automatically if you are dealing with hundreds of thousands or
millions of objects. Whenever I expect something to exceed a couple
hundred objects, I create batches, do batch fetching / faulting,
recycle editing contexts if I have to, or go to the low level. It's
just something to be aware of, in many places unfortunately, and you'll
probably find more over time.
There are many, many tricks you can and should use when doing bulk
operations with EOF. It's something you need to dig into, maybe
ask here, watch the generated SQL in the logfile and so on. It's a big
topic.
Personally I never found EOF to be particularly slow, a complaint
lots of people have expressed over time, but you definitely have to work
WITH the tool not AGAINST it. And some natural or naive approaches are
just plain fighting the tool. And if you fight WO, you have to be
either incredibly good, or it will win. I'm not good enough to win
against WO, therefore I try to not fight it ... ;-)
cug