Also, a little voice keeps whispering in my ear, "This is not the problem." I keep looking at this and thinking that you just don't have enough objects in these relationships to cause this problem, and that your processing should not be triggering massive fetches all the time. How many objects, on average, do you expect to be in each of these?
Company ->> Certificates
Shipper ->> Tickets
Location ->> Item
I ran the importer up to the file before the one it always chokes on. That file has two Certificates in it, and I am printing updatedObjects() to the Run Log before calling saveChanges() on each. The first of the two is the larger; it starts printing to the log quickly but eventually slows to a crawl. As soon as it reaches the second, smaller Certificate, the updatedObjects() printing appears to go much faster. I'm not sure how much to read into that, but I did some more detailed memory analysis (see below).
By that point I have . . .
4 Locations with 24,676, 2,350, 1,171, and 570 Items respectively
31 Shippers with 548, 36, 41, 34, 11, 6, etc. Tickets each (548 is by far the most)
1 Company with 81 Certificates
In the larger of the two updatedObjects() calls for that file I have . . .
331 Certificate properties
582 Ticket properties
23,935 Item properties
How much heap space is the application running in?
JVM_OPTIONS was set to -Xms128m -Xmx512m. When I check total memory, it never seems to get much above 128MB, even though there should be room for it to grow. I have tried -Xmx1024m, and it doesn't seem to make any difference to the total memory or the app's performance. I also tried raising the minimum to -Xms256m and didn't notice any significant speed gains. I did see a difference in free memory, obviously, but all the slowdowns still occur in the same places.
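One way to check whether the -Xmx setting actually reached the JVM is to print maxMemory() next to totalMemory(). As I understand the JVM, totalMemory() is the heap currently reserved. It starts near -Xms and only grows when the collector can't free enough room. maxMemory() reports the ceiling set by -Xmx. Below is a minimal sketch in plain Java; HeapReport is a made-up name, not a WebObjects class.

```java
// Sketch only: compare the heap ceiling (maxMemory) with the heap currently reserved (totalMemory).
class HeapReport {
    static final long MB = 1024L * 1024L;

    // Returns one log line; all values are rounded down to whole megabytes.
    static String describe(Runtime rt) {
        long max = rt.maxMemory();     // upper bound the heap may grow to (roughly -Xmx)
        long total = rt.totalMemory(); // heap reserved right now (starts near -Xms)
        long free = rt.freeMemory();   // unused part of the reserved heap
        long used = total - free;
        return String.format("max=%dMB total=%dMB used=%dMB free=%dMB",
                max / MB, total / MB, used / MB, free / MB);
    }

    public static void main(String[] args) {
        System.out.println(describe(Runtime.getRuntime()));
    }
}
```

Suppose max shows roughly 512MB (or 1024MB) while total stays near 128MB. That would suggest the option took effect and the heap simply never needed to grow.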
Could you just be running low on memory and going into repetitive garbage collection cycles?
I'm garbage collecting explicitly after each file. Usually I'm at 70-80% free memory before and around 90% after. When things get busy, it looks like this . . .
Before gc() free memory: 4444456 total memory: 133103616 (3 % free) (31 % free with 256MB)
After gc() free memory: 127420928 total memory: 133103616 (95 % free)
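The percentages above look like free / total, truncated to a whole number. Here is a sketch of how such a line could be produced. It assumes the log uses Runtime.freeMemory() and Runtime.totalMemory(), since the actual logging code isn't shown in the thread. MemoryLog is a hypothetical name.

```java
// Sketch of a free-memory log line shaped like the output above.
class MemoryLog {
    // Integer division truncates: 127420928 / 133103616 is about 95.7%, printed as 95.
    static long percentFree(long free, long total) {
        return free * 100 / total;
    }

    static String line(String label, Runtime rt) {
        long free = rt.freeMemory();
        long total = rt.totalMemory();
        return label + " free memory: " + free + " total memory: " + total
                + " (" + percentFree(free, total) + " % free)";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println(line("Before gc()", rt));
        System.gc(); // only a request; the JVM is free to ignore it
        System.out.println(line("After gc()", rt));
    }
}
```

The 3% is relative to the roughly 127MB heap reserved at that moment, not to the -Xmx ceiling. On its own, it doesn't mean the JVM is close to its limit.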
. . . probably not too telling, since I have already disposed of my workhorse EC by the time this gets called.
When I increased the minimum memory to 256MB, the app still took several minutes to print the largest of the updatedObjects() calls. The only other thing I can think to do on the memory front is to log memory before and after each ec.saveChanges() rather than at the end of the file . . . so I did. Everything looks pretty normal. Free memory is about 70% before each ec.saveChanges() and 80% after each ec.dispose(). The only odd thing is that every once in a while free memory drops significantly after the dispose(). On a few occasions it plummets from 80% before saving to 5% after the dispose, but that doesn't seem to bog down the app. It recovers and goes on its way.
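Timing each step alongside the memory numbers would show whether the slow saves line up with memory pressure at all. The sketch below is a hypothetical helper in plain Java 8 (StepProbe is not part of WebObjects). It runs any step, for example the existing ec.saveChanges() call, and records elapsed time plus free heap before and after.

```java
// Hypothetical probe: runs one step and records elapsed time plus free heap before and after.
class StepProbe {
    static final class Result {
        final long elapsedMillis;
        final long freeBefore;
        final long freeAfter;

        Result(long elapsedMillis, long freeBefore, long freeAfter) {
            this.elapsedMillis = elapsedMillis;
            this.freeBefore = freeBefore;
            this.freeAfter = freeAfter;
        }

        @Override
        public String toString() {
            return "elapsed=" + elapsedMillis + "ms freeBefore=" + freeBefore
                    + " freeAfter=" + freeAfter;
        }
    }

    static Result probe(Runnable step) {
        Runtime rt = Runtime.getRuntime();
        long freeBefore = rt.freeMemory();
        long start = System.nanoTime();
        step.run(); // the step being measured, e.g. a save
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        long freeAfter = rt.freeMemory();
        return new Result(elapsedMillis, freeBefore, freeAfter);
    }

    public static void main(String[] args) {
        // Stand-in workload; in the importer this would wrap the existing save call.
        Result r = probe(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) sb.append(i);
        });
        System.out.println(r);
    }
}
```

If elapsed time keeps climbing while free memory hovers around 70%, that would point away from heap size and toward the work done inside the save itself.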
By far the worst one is Item -> Location : Location ->> Item. As I read through the file, I hit a locationCode that may or may not have a matching DB entry. I need to look it up and create a new Location if it doesn't exist. Then I create the Location/Item relationship to the Item I am currently reading in.
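The look-up-or-create step can be sketched in plain Java. The sketch uses a made-up ImportLocation class and a Function that stands in for the database fetch; it is not EOF code. The map also remembers Locations created earlier in the same import. That part assumes a database fetch won't return objects that were inserted but not yet saved.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the Location entity; not a real EO class.
class ImportLocation {
    final String code;
    ImportLocation(String code) { this.code = code; }
}

class LocationResolver {
    private final Map<String, ImportLocation> seen = new HashMap<>();
    private final Function<String, ImportLocation> fetchFromDb; // returns null when no row matches
    int created = 0;

    LocationResolver(Function<String, ImportLocation> fetchFromDb) {
        this.fetchFromDb = fetchFromDb;
    }

    ImportLocation findOrCreate(String locationCode) {
        ImportLocation loc = seen.get(locationCode);   // already found or created this run
        if (loc != null) return loc;
        loc = fetchFromDb.apply(locationCode);         // one lookup per distinct code
        if (loc == null) {
            loc = new ImportLocation(locationCode);    // in EOF: create and insert into the EC
            created++;
        }
        seen.put(locationCode, loc);
        return loc;
    }
}
```

With this shape, each distinct locationCode costs at most one lookup, no matter how many Items in the file refer to it.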
Using addObjectToBothSidesOfRelationshipWithKey?
Yup. That's what all the newbie examples say to do.
And, usually, that is the best practice. But in the case of this import, you may be better off trading object graph consistency for speed and using item.setLocation(aLocation) instead of item.addObjectToBothSidesOfRelationshipWithKey(aLocation, "Location").
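Here is a toy model of why the two calls can cost so differently. It is not EOF. It assumes, as I understand EOF's behavior, that the Location's to-many Item array is a fault. That fault loads every existing row the first time it is touched. It also assumes addObjectToBothSidesOfRelationshipWithKey touches the array while a plain setLocation() does not. ToyLocation and ToyItem are hypothetical classes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (not EOF): a to-one reference on the item, a lazily loaded to-many on the location.
class ToyLocation {
    private final int rowsInDb;   // Items already pointing at this Location in the database
    private List<ToyItem> items;  // null until the "fault" fires
    int faultsFired = 0;

    ToyLocation(int rowsInDb) { this.rowsInDb = rowsInDb; }

    List<ToyItem> items() {
        if (items == null) {      // first access loads every existing row
            faultsFired++;
            items = new ArrayList<>(rowsInDb);
            for (int i = 0; i < rowsInDb; i++) items.add(new ToyItem());
        }
        return items;
    }

    boolean stillAFault() { return items == null; }
}

class ToyItem {
    ToyLocation location;

    // Like item.setLocation(aLocation): only this side changes.
    void setLocation(ToyLocation loc) { this.location = loc; }

    // Like addObjectToBothSidesOfRelationshipWithKey: also updates the inverse to-many,
    // which in this model forces the whole array to load.
    void setLocationBothSides(ToyLocation loc) {
        setLocation(loc);
        loc.items().add(this);
    }
}
```

The cost of the shortcut is consistency. An Item set with setLocation() alone does not appear in the Location's in-memory array until that relationship is refetched.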
Looks like cutting out relationships is the next thing to try, unless I ought to jack up the minimum memory to a few GBs just to be sure.
I might not have been too clear in my comments, but I do understand that the Items in the list are properties, not individual objects to be updated. But that takes me back to my original question . . . is attempting to store this object (and a few others like it) with all of its to-many relationship properties what is bogging down the app?
Maybe.
Is it an obvious, no-brainer yes, or should WO be able to handle a few million properties like this? Seems a bit ridiculous to be asking, but hey, what do I know?
A few million? That would probably cause some performance issues, yes. Do you ever really need all few million of those properties? Will you need to know every Item stored in a Location? Or just (for example) the undelivered ones? Or the ones for a specific Company?
No, I don't think I will ever want anyone pulling that amount of data out, and I can't imagine why I would need to. There should always be something else involved in a DB lookup: a company, a date range, some other identifying code, etc.
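The idea of always asking a narrower question can be sketched in plain Java. The list stands in for the Item table, and the filters stand in for the qualifier of a fetch. In EOF that would be an EOFetchSpecification with an EOQualifier on Item, so only the matching rows come back rather than a Location's entire to-many array. ItemRow and its locationCode, companyCode, and delivered fields are hypothetical, since the thread doesn't show the actual Item attributes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical row type; field names are illustrative, not the real model.
class ItemRow {
    final String locationCode;
    final String companyCode;
    final boolean delivered;

    ItemRow(String locationCode, String companyCode, boolean delivered) {
        this.locationCode = locationCode;
        this.companyCode = companyCode;
        this.delivered = delivered;
    }
}

class ItemQuery {
    // Stand-in for a qualified fetch: each criterion narrows the result,
    // instead of walking every Item hanging off a Location.
    static List<ItemRow> undeliveredFor(List<ItemRow> table, String locationCode, String companyCode) {
        return table.stream()
                .filter(r -> r.locationCode.equals(locationCode))
                .filter(r -> r.companyCode.equals(companyCode))
                .filter(r -> !r.delivered)
                .collect(Collectors.toList());
    }
}
```

In a real fetch the filtering would happen in SQL. Only the few matching Items would ever be loaded into the editing context.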
It appears I need to streamline my relationships first, then, when I'm building the bulk of the app, figure out which relationships I can't live without. It was easy to create the relationships in the first place; it should be just as easy to put them back. Then I'll have to try to figure out how to pull off what Anjo was talking about.
Maybe Anjo will post some code or point you to some. Or I can try to whip something up if I find time later.
Hopefully that won't be necessary, but I'd be grateful nonetheless, if only for some better insight into how one gets tricky with the database context.