Re: Validating unique objects in CoreData
- Subject: Re: Validating unique objects in CoreData
- From: Roland King <email@hidden>
- Date: Sun, 14 Feb 2010 12:31:10 +0800
OK, I downloaded your project. I agree with Jerry that there's a memory leak; actually it's worse than that, because you aren't keeping a reference to the article you create, so you never set its parent. So
[ DDArticle newArticleWithID: messageid context:ctx ];
should be
article = [ DDArticle newArticleWithID: messageid context:ctx ];
[ article release ];
I got the test to run in 30 seconds, which isn't too bad given that just looping over the articles takes about 7 seconds by itself. Here's your problem: you never save the work, so all the articles you add accumulate in memory. Yes, the SQL store has an index on it, and yes, Core Data is issuing the correct SELECT, but there's nothing in the store. So as well as looking in the store, it also has to scan every one of the objects still waiting to be persisted. Evidently, even though it uses the index at the SQL level, it doesn't use the index hint to build an in-memory map for matching pending objects against a predicate. So your adds go slower and slower and slower: each time, Core Data does one SQL lookup in an always-empty database, which finds 0 objects in 0.0005 of a second, then scans an ever-growing set of pending objects one by one. Since your IDs are unique, nothing ever matches, so it scans the whole set every time. If you log it, you'll see each iteration getting slower and slower.
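For reference, here's roughly what that uniqueness check looks like as a fetch. This is only a sketch; I'm assuming your entity is called DDArticle with a messageID string attribute, and using countForFetchRequest: since it's cheaper than executeFetchRequest: when all you need is existence:

NSFetchRequest *request = [[NSFetchRequest alloc] init];
[request setEntity:[NSEntityDescription entityForName:@"DDArticle"
                               inManagedObjectContext:ctx]];
[request setPredicate:[NSPredicate predicateWithFormat:@"messageID == %@",
                                                       messageid]];
NSError *error = nil;
NSUInteger count = [ctx countForFetchRequest:request error:&error];
[request release];
BOOL exists = (count > 0);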
So I tried adding in [ archive save ] to make it commit, and was surprised to find nothing changed, until I realized that [ archive save ] saves the wrong context. In fact, your example code never saves anything to the DB at all!
Adding this inside your add loop
if( [ [ ctx updatedObjects ] count ] > 100 )
[ ctx save:nil ];
means the working set never grows beyond 100 objects, which limits the amount of in-memory lookup. Once the objects are cached in the DB, the SQL lookup piece is blisteringly quick, so your check for existing objects runs in nearly constant time. 100 is a parameter you can tweak: you could save every single time, but that probably has overhead; if you make it much larger than 100, you pay the save overhead less often but have to scan more in-memory objects. It's a compromise.
1000 checks-and-inserts a second seems .. about OK to me, and if you make sure to save the context regularly, you should be able to keep that rate up even as the database size grows.
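Putting the pieces together, a sketch of what the fixed add loop could look like. The names (DDArticle, newArticleWithID:context:, messageID) are assumed from your project, the batch size and error handling are illustrative, and I'm checking insertedObjects here since these are newly created objects:

for (NSString *messageid in messageIDs) {
    // Check whether an article with this ID already exists.
    NSFetchRequest *request = [[NSFetchRequest alloc] init];
    [request setEntity:[NSEntityDescription entityForName:@"DDArticle"
                                   inManagedObjectContext:ctx]];
    [request setPredicate:[NSPredicate predicateWithFormat:@"messageID == %@",
                                                           messageid]];
    NSUInteger count = [ctx countForFetchRequest:request error:NULL];
    [request release];

    if (count == 0) {
        // Keep the reference so you can set the parent, and release it
        // afterwards (newArticleWithID: returns a retained object).
        DDArticle *article = [DDArticle newArticleWithID:messageid context:ctx];
        // ... set the parent relationship here ...
        [article release];
    }

    // Flush to the store regularly so lookups hit the SQLite index
    // instead of scanning a growing pile of unsaved objects.
    if ([[ctx insertedObjects] count] > 100) {
        NSError *error = nil;
        if (![ctx save:&error]) {
            NSLog(@"save failed: %@", error);
        }
    }
}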
On 14-Feb-2010, at 5:51 AM, daniele malcom wrote:
> Hi Roland, in fact the indices do exist (for the DDArticle entity):
> Enter SQL statements terminated with a ";"
> sqlite> .tables
> ZDDARTICLE Z_METADATA Z_PRIMARYKEY
> sqlite> .indices ZDDARTICLE
> ZDDARTICLE_ZMESSAGEID_INDEX
> ZDDARTICLE_ZPARENT_INDEX
>
> With my MacBook Pro, insertion of 30k articles took about 2 to 3 minutes.
> I've uploaded a test project:
> http://dl.dropbox.com/u/103260/CoreDataTreeTest.zip
> I really don't know why it should take this long, but in Instruments
> the big cost is obviously the fetch that searches for id and parent.
>
> On Sat, Feb 13, 2010 at 2:53 PM, Roland King <email@hidden> wrote:
>>
>> .. oh and one other thing: there's a Core Data instrument in Instruments, shipped with Xcode. Well, there is for OS X; there isn't for iPhone OS, which I develop for, which may be why I never saw it before. You could try that.
>>
>> On 13-Feb-2010, at 9:36 PM, Roland King wrote:
>>
>>> OK, I don't see anything wrong with the predicate code, but I'm no Core Data expert.
>>>
>>> I'll make one totally challengeable statement. Assume that Core Data uses SQLite in a rational way to store objects (e.g. not storing everything as blobs of opaque data), for instance one table per entity where each column of the table is an attribute; that evaluating the predicate does what you would expect, i.e. uses SQL to do as much of the heavy lifting on a fetch request as possible; that the column is indexed in the table; and that SQLite is using the index. Then taking multiple minutes to find one row out of 20,000 just doesn't make any sense; it should take seconds at most.
>>>
>>> I believe Core Data does use table-per-entity. I think that partly because the documentation hints at it, partly because it makes sense, and partly because I looked at the implementation of one data model that I have.
>>>
>>> I can't see the point of making indexes if the predicate code generates SQL which doesn't use them, but it's possible. It's also possible that Core Data loads all the entity rows, inspects their attributes by hand, and filters them in code, but this is Apple, not Microsoft.
>>>
>>> So that leaves "column isn't indexed" as the most likely explanation. But you've checked the 'indexed' box. Here's another wild-assed guess: does Core Data only create a store when you have no current store? It certainly checks whether the store is compatible with the model, but since the indexed property is just a hint anyway, a store without the index is still compatible, just non-optimal. It's possible that if you created the store with the property defined as not-indexed and only checked that box later, without regenerating the whole store, the index was never added. Did you do that, just check it later? Have you regenerated a complete new store since, or are you using a store you've been populating for a while?
>>>
>>> Here's a particularly ugly idea; purists, please stop reading now. We can look at the store and see if it has an index on that property. First open a Terminal window and go to the path where your store is. I'm assuming you have sqlite3 installed like I do; it came with the OS as far as I know.
>>>
>>> Your store should be called something.sqlite, let's say it's Foo. Type
>>>
>>> sqlite3 Foo.sqlite
>>>
>>> and that should open the store and give you a prompt. First you want to find the tables in the store, so type
>>>
>>> .tables
>>>
>>> as far as I can see they are called Z<YOUR ENTITY NAME>, so for you I'd expect to see one of the tables called ZMCARTICLE. If there is one, you can find out what indices are on it
>>>
>>> .indices ZMCARTICLE
>>>
>>> I believe again the indices are called Z<YOUR ENTITY NAME>_Z<YOUR ATTRIBUTE NAME>_INDEX, so you'd expect to find ZMCARTICLE_ZMESSAGEID_INDEX in that list. If you don't have it, the store wasn't created with that index. If none of those tables exist at all, my rudimentary reverse engineering of the whole coredata thing is flawed (or I'm using some entirely different version from you).
>>>
>>> If the tables and indices exist, including the one on ZMESSAGEID, I'm out of ideas, unless someone knows of a way to put Core Data into a form of debug mode and see the generated SQL, to figure out whether it's doing anything smart.
>>>
>>> If either none of the above works, or it works but you don't have the index, you have a couple of options. The right one is to delete your whole message store, run your app, and make a brand new store to see if that adds the indexed property with an index. Depending on how you've populated the store, that might be a real pain; perhaps you can force a migration or something. The other, really stupid, idea would be to just add the index by hand and hope that doesn't break everything entirely, which is entirely possible, at which point you delete the store and start over. You would do that by running
>>>
>>> CREATE INDEX ZMCARTICLE_ZMESSAGEID_INDEX ON ZMCARTICLE (ZMESSAGEID);
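>>> While you're in sqlite3, you can also ask SQLite whether it would actually use that index for the lookup; EXPLAIN QUERY PLAN prints the access strategy (the literal value here is just an example):
>>>
>>> EXPLAIN QUERY PLAN SELECT * FROM ZMCARTICLE WHERE ZMESSAGEID = 'some-id';
>>>
>>> If the index is there and usable, the plan should mention something like ZMCARTICLE_ZMESSAGEID_INDEX; a plain table scan with no index named means it isn't being used.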
>>>
>>> Here's another useful thing I just came across, I would certainly run this to see if the SQL being executed makes sense.
>>>
>>>
>>> With Mac OS X version 10.4.3 and later, you can use the user default com.apple.CoreData.SQLDebug to log to stderr the actual SQL sent to SQLite. (Note that user default names are case sensitive.) For example, you can pass the following as an argument to the application:
>>>
>>> -com.apple.CoreData.SQLDebug 1
>>> Higher levels of debug numbers produce more information, although using higher numbers is likely to be of diminishing utility.
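>>> For a Mac app, that means launching the binary from Terminal with the flag appended; the app name and path here are illustrative:
>>>
>>> /Applications/YourApp.app/Contents/MacOS/YourApp -com.apple.CoreData.SQLDebug 1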
>>>
>>>
>>>
>>> I'd love to hear about any other ways people have to debug Core Data. I sort of trust that Apple has done a good job with it, and for it to break down, performance-wise, on looking for one row in 20,000 with a certain attribute doesn't make sense to me. If you really can't get it to work, I'd write one short project which inserts 20,000 simple objects into a store and another which opens the store and goes looking for one by attribute the way you do. If it still takes multiple minutes, I'd send it to Apple as a bug.
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden