Re: Expanding Import
- Subject: Re: Expanding Import
- From: Chuck Hill <email@hidden>
- Date: Thu, 9 Mar 2006 10:03:12 -0800
Hi Scott,
On Mar 8, 2006, at 5:13 PM, Scott Winn wrote:
Also, a little voice keeps whispering in my ear, "This is not the
problem." I keep looking at this and thinking that you just don't
have enough objects in these relationships to cause this problem
and that your processing should not be triggering off massive
fetches all the time. How many objects, on average, do you expect
to be in each of:
Company ->> Certificates
Shipper ->> Tickets
Location ->> Item
I ran the importer up to the file prior to the one it always chokes
on. This file has two Certificates in it and I am printing out
updatedObjects() to the Run Log before saveChanges() on each. The
first of the two is the largest and starts out printing to the log
quickly, but eventually slows down to a crawl. As soon as it hits
the next smaller Certificate the updatedObjects() printing appears
to go much faster. I'm not sure how much to read into that, but I
did some more detailed memory analysis (see below).
By that point I have . . .
4 Locations with 24,676 : 2,350 : 1,171 : 570 Items respectively.
31 Shippers with 548, 36, 41, 34, 11, 6, etc. Tickets each (548 is
by far and away the most)
1 Company with 81 Certificates
In the larger of the two updatedObjects() calls for that file I
have . . .
331 Certificate properties
582 Tickets properties
23,935 Item properties
None of that seems too out of line. Fetching that number of objects
is not going to slow the app to a crawl if done efficiently. Have
you tried turning on SQL logging to see if lots of SQL is getting
generated when it slows down? Depending on the cause, configuring
some batch faulting may give you a good performance boost. The
default is none (single row selects). You could try changing it from
0 to 10 on each of these to-many relationships and see what
difference that makes.
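To put rough numbers on the batch-faulting suggestion, here is a toy Java model (not EOF code) of the best case. With a batch size of n, firing one to-many fault also fires up to n - 1 other pending faults for the same relationship, so the number of selects drops from one per source object to about ceil(sources / n). The class and method names are hypothetical, and real savings depend on how many faults are actually pending when one fires. The setting itself lives on the relationship in the model (exposed on EORelationship as numberOfToManyFaultsToBatchFetch()).

```java
// Toy model only, not EOF code: best-case count of SQL selects needed to
// fire one to-many fault on each of `sourceObjects` objects (for example,
// Shipper ->> Tickets on every Shipper).
final class BatchFaultEstimate {

    private BatchFaultEstimate() {}

    /** Batch size 0 (the default): each fault fired is its own select. */
    static int selectsWithoutBatching(int sourceObjects) {
        return sourceObjects;
    }

    /**
     * Best case with batching: every fired fault also fetches up to
     * batchSize - 1 other pending faults for the same relationship.
     */
    static int selectsWithBatching(int sourceObjects, int batchSize) {
        if (batchSize <= 1) {
            return sourceObjects;          // 0 or 1 means no batching
        }
        // Integer ceiling of sourceObjects / batchSize.
        return (sourceObjects + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println(selectsWithoutBatching(31));   // 31 Shippers, unbatched: 31
        System.out.println(selectsWithBatching(31, 10));  // batch size 10: 4
        System.out.println(selectsWithBatching(4, 10));   // 4 Locations: 1
    }
}
```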
How much heap space is the application running in?
JVM_OPTIONS was set to -Xms128m and -Xmx512m. When I check total
memory it never seems to get much above 128MB even though there
should be room for it. I have tried -Xmx1024m and it doesn't seem
to make any difference to the total memory or the app's
performance. I also tried setting the minimum with -Xms256m and didn't
notice any significant speed gains. I did see a difference in free
memory, obviously, but all the slow downs still occur in all the
same places.
Yeah, that should be plenty of heap space. I think we can rule out
garbage collection as the culprit.
Could you just be running low on memory and going into repetitive
garbage collection cycles?
I'm garbage collecting explicitly after each file. Usually I'm in
the 70-80% range of free memory before and around 90% after.
When things get busy it looks like this. . .
Before gc()
free memory: 4444456
total memory: 133103616
(3 % free) (31 % free with 256MB)
After gc()
free memory: 127420928
total memory: 133103616
(95 % free)
. . . probably not too telling, since I have already disposed of my
workhorse EC by the time this gets called.
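The "Before gc()" / "After gc()" figures above can be produced with plain java.lang.Runtime calls, sketched below (HeapReport is a hypothetical name). totalMemory() is the heap the JVM has currently committed and maxMemory() is the ceiling normally set by -Xmx; the committed heap typically starts near -Xms and grows only as needed, which would be consistent with total memory staying near 128MB. percentFree() truncates with integer division, which matches the 3% and 95% shown.

```java
// Hypothetical helper mirroring the "Before gc()" / "After gc()" report.
final class HeapReport {

    private HeapReport() {}

    /** Percentage of the committed heap that is free, truncated toward zero. */
    static long percentFree(long freeBytes, long totalBytes) {
        return freeBytes * 100 / totalBytes;
    }

    static String describe(Runtime rt) {
        long free = rt.freeMemory();
        long total = rt.totalMemory();  // heap currently committed
        long max = rt.maxMemory();      // upper bound, normally set by -Xmx
        return "free memory: " + free
                + "\ntotal memory: " + total
                + "\nmax memory: " + max
                + "\n(" + percentFree(free, total) + " % free)";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Before gc()\n" + describe(rt));
        System.gc();  // a request only; the JVM decides what to collect
        System.out.println("After gc()\n" + describe(rt));
    }
}
```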
When I increased the minimum memory to 256MB the app still takes
several minutes to print out the largest of the updatedObjects()
calls. The only other thing I can think to do on the memory front
is get some output before and after the ec.saveChanges() rather
than at the end of the file. . . so I did. Everything looks pretty
normal. Free memory is about 70% before each ec.saveChanges and
80% after each ec.dispose(). The only odd thing is that every once
in a while the memory drops significantly after the dispose(). On
a few occasions it plummets from 80% before saving to 5% after the
dispose, but that doesn't seem to be bogging down the app. It
recovers and goes on its way.
By far the worst one is Item -> Location : Location ->> Item.
As I am reading through the file, I hit a locationCode that may
or may not have a matching DB entry. I need to look it up and
create a new one if it doesn't exist. Then I create the
Location/Item relationship to the Item I am currently reading in.
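The lookup-or-create step for a locationCode can be sketched in plain Java. The Map here is only a stand-in for the database lookup; in EOF it would be a fetch (for example via EOUtilities.objectsMatchingKeyAndValue) followed by EOUtilities.createAndInsertInstance when nothing matches. The LocationLookup and Location classes are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the Location entity and its lookup; not EOF code.
final class LocationLookup {

    static final class Location {
        final String locationCode;

        Location(String locationCode) {
            this.locationCode = locationCode;
        }
    }

    // Stands in for the Location table, keyed by locationCode.
    private final Map<String, Location> byCode = new HashMap<>();
    private int created = 0;

    /** Returns the Location for the code, creating and registering one if none exists. */
    Location findOrCreate(String code) {
        Location existing = byCode.get(code);
        if (existing != null) {
            return existing;
        }
        Location fresh = new Location(code);
        byCode.put(code, fresh);
        created++;
        return fresh;
    }

    /** How many Locations findOrCreate() has had to create. */
    int createdCount() {
        return created;
    }
}
```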
Using addObjectToBothSidesOfRelationshipWithKey?
Yup. That's what all the newbie examples say to do.
And, usually, that is the best practice. But in the case of this
import you may be better off trading object graph consistency for
speed and using item.setLocation(aLocation) instead of
item.addObjectToBothSidesOfRelationshipWithKey(aLocation, "Location")
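A toy model of that trade-off, assuming (as is typical in EOF) that adding to the inverse to-many first fires its fault, which for a Location means fetching every Item already stored there. The classes and addToBothSides() are hypothetical stand-ins for the EOs and for addObjectToBothSidesOfRelationshipWithKey; setLocation() only sets the to-one and leaves the Location's items out of date in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not EOF: hypothetical stand-ins for the Location and Item EOs.
final class RelationshipSides {

    private RelationshipSides() {}

    static final class Location {
        private List<Item> items;   // null means the to-many fault has not fired
        private int faultsFired = 0;

        /** Fires the to-many fault on first access. */
        List<Item> items() {
            if (items == null) {
                // In EOF this is where every saved Item for the Location
                // would be fetched; the toy model just starts empty.
                items = new ArrayList<>();
                faultsFired++;
            }
            return items;
        }

        int faultsFired() {
            return faultsFired;
        }
    }

    static final class Item {
        Location location;

        /** One side only: sets the to-one and never touches location.items(). */
        void setLocation(Location aLocation) {
            this.location = aLocation;
        }

        /** Both sides: also adds this Item to the inverse to-many, firing its fault. */
        void addToBothSides(Location aLocation) {
            setLocation(aLocation);
            aLocation.items().add(this);
        }
    }
}
```

Setting only the to-one on thousands of Items never fires the fault; the price is that the in-memory items list of that Location no longer reflects them.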
Looks like cutting out relationships is the next thing to try,
unless I ought to jack up the minimum memory to a few GBs just to
be sure.
It really does not look like that is the problem to me.
I might not have been too clear in my comments, but I do
understand that the Items in the list are properties, not
individual objects to be updated. But that kind of takes me back
to my original question . . . Is attempting to store this object
(and a few others like it) with all of its to-many relationship
properties what is bogging down the app?
Maybe.
Is it an obvious, no-brainer yes, or should WO be able to handle
a few million properties like this? Seems a bit ridiculous to be
asking, but hey, what do I know?
A few million? That would probably cause some performance issues,
yes. Do you ever really need all few million of the properties?
Will you need to know all the items stored in a location? Or just
the (for example) undelivered ones? Or the ones for a specific
Company?
No, I don't think I will ever want anyone pulling that amount of
data out, and I can't imagine why I would need to. There should
always be something else involved in a db lookup -- a company, date
range, some other identifying code, etc.
That is a good clue that you may not want to model that relationship
then. If you will always be fetching the items, there is no need to
carry around the burden of keeping an unused relationship up to date.
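Without a modeled Location ->> Item relationship, Items are looked up with a qualifier instead of by traversing location.items(). The sketch below filters an in-memory list as a stand-in for such a qualified fetch; the Item fields and the Item-to-Company link are hypothetical. In EOF the equivalent would be an EOFetchSpecification for Item with a qualifier built by, for example, EOQualifier.qualifierWithQualifierFormat("location = %@ AND company = %@", args), so the database does the filtering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: each Item carries the codes of its Location and Company.
final class QualifiedItemLookup {

    private QualifiedItemLookup() {}

    static final class Item {
        final String itemCode;
        final String locationCode;
        final String companyCode;

        Item(String itemCode, String locationCode, String companyCode) {
            this.itemCode = itemCode;
            this.locationCode = locationCode;
            this.companyCode = companyCode;
        }
    }

    /** Items at one Location for one Company; stands in for a qualified fetch. */
    static List<Item> itemsFor(List<Item> all, String locationCode, String companyCode) {
        List<Item> matches = new ArrayList<>();
        for (Item item : all) {
            if (item.locationCode.equals(locationCode)
                    && item.companyCode.equals(companyCode)) {
                matches.add(item);
            }
        }
        return matches;
    }
}
```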
It appears I need to streamline my relationships first, then when
I'm building the bulk of the app try to figure out what
relationships I can't live without. It was easy to create the
relationships in the first place; it should be just as easy to
put them back. Then I'll have to try and figure out how to pull
off what Anjo was talking about.
Maybe Anjo will post or point you to some code. Or I can try and
whip it out if I find time later.
Hopefully not necessary, but I'd be grateful nonetheless, if only
for some better insight into how one gets tricky with the database
context.
Well, maybe a little warm up exercise this morning...
OK, for this I am assuming that import is all this program does and
that we don't have to worry about concurrent users in other editing
contexts etc.
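import com.webobjects.eoaccess.EODatabaseContext;
import com.webobjects.eoaccess.EOUtilities;
import com.webobjects.eocontrol.EOEditingContext;
import com.webobjects.eocontrol.EOFetchSpecification; // used by the delegate method
import com.webobjects.foundation.NSArray;             // used by the delegate method

EOEditingContext ec; // I am assuming this exists and is locked

// Set the database context delegate to the object processing the import.
// Assumes that no other objects will be using this database context.
EODatabaseContext dbContext =
    EOUtilities.databaseContextForModelNamed(ec, "YourModelNameHere");
dbContext.lock();
try {
    dbContext.setDelegate(this);
}
finally {
    dbContext.unlock();
}

... process import here ...

// Clear the database context delegate once the import is finished.
dbContext.lock();
try {
    dbContext.setDelegate(null);
}
finally {
    dbContext.unlock();
}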
Then add this to this class:
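protected static final NSArray ignoredEntityNames =
    new NSArray(new Object[] {"Item" /* , other entity names to skip */});

// EODatabaseContext delegate method. Returning an array makes the database
// context use it as the fetch result without going to the database;
// returning null lets the fetch proceed normally.
public NSArray databaseContextShouldFetchObjects(EODatabaseContext dbCtxt,
                                                 EOFetchSpecification fetchSpec,
                                                 EOEditingContext ec) {
    // You might need to do some more exact matching here...
    if (ignoredEntityNames.containsObject(fetchSpec.entityName())) {
        return NSArray.EmptyArray;
    }
    return null;
}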
Chuck
--
Coming in 2006 - an introduction to web applications using WebObjects
and Xcode http://www.global-village.net/wointro
Practical WebObjects - for developers who want to increase their
overall knowledge of WebObjects or who are trying to solve specific
problems. http://www.global-village.net/products/practical_webobjects
_______________________________________________
Webobjects-dev mailing list (email@hidden)