Re: Validating unique objects in CoreData
- Subject: Re: Validating unique objects in CoreData
- From: Roland King <email@hidden>
- Date: Sun, 14 Feb 2010 22:04:01 +0800
Please report back, I'd be very interested to hear what you find out. Even though I said 1000 objects a second seemed OK, you're right, it's not. Have you tried a few things, like skipping the uniqueness check entirely, to see how fast you can just blast objects into the database and figure out how much the predicate is costing you?

That cocoawithlove blog post (which is pretty interesting) does say that the NSFetchRequest with a predicate on an indexed property was taking him 2.174 ms. That's for 1,000,000 objects, but how linear are SQLite indices? I'm not sure, but I suspect they give similar performance over a huge range of database sizes (i.e. just because you have 30,000 objects doesn't mean the search will take 30,000 / 1,000,000 * 2.174 ms; the indices probably use B-trees or similar, so lookups may be closer to constant time).

It seems you're getting much what I was: 30,000 inserts in 30 seconds, or 1000 a second, or 1 ms per insert (including the lookup). Matt Gallagher was getting over 2 ms per predicate lookup on an indexed property. It may be that the overhead of setting up the fetch request and having it figure out what SQL to issue, plus the actual SELECT (which I think is probably quite fast, as you'll see if you turn that debugging variable on), accounts for most of your 1 ms per insert, and that's just as fast as you can go; the predicate really takes a lot of time to set up and there seems to be no good way to reuse one.
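Here's a quick way to convince yourself that the indexed lookup is roughly constant-time: ask SQLite for its query plan at two very different table sizes. This is plain Python/sqlite3 rather than Core Data, just an illustration; the table and index names mirror the ones that show up in the store later in the thread:

```python
# Check that SQLite resolves an equality lookup on an indexed column
# through the B-tree index regardless of table size.
import sqlite3

def plan_for(n):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ZDDARTICLE (ZMESSAGEID TEXT)")
    con.execute("CREATE INDEX ZDDARTICLE_ZMESSAGEID_INDEX "
                "ON ZDDARTICLE (ZMESSAGEID)")
    con.executemany("INSERT INTO ZDDARTICLE VALUES (?)",
                    ((f"id-{i}",) for i in range(n)))
    rows = con.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM ZDDARTICLE WHERE ZMESSAGEID = ?",
        ("id-1",)).fetchall()
    return " ".join(str(r) for r in rows)

# Both plans report a SEARCH ... USING INDEX rather than a full-table
# SCAN, so the cost grows with B-tree depth, not with the row count.
print(plan_for(100))
print(plan_for(100_000))
```

If the plan said SCAN instead of SEARCH, you'd know the index wasn't being used for the lookup.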
The Core Data Instruments template is supposed to show the total time spent in fetch requests .. that might help prove that's where the time goes.
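The save-every-N pattern discussed further down can be sketched at the SQL level too. This is Python/sqlite3 rather than Core Data, but the mechanics underneath are the same: an indexed existence check before each insert, committing every 100 rows instead of once per row:

```python
# Check-then-insert with an indexed key, committing in batches
# (the analogue of calling -save: on the context every 100 objects).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ZDDARTICLE (ZMESSAGEID TEXT)")
con.execute("CREATE INDEX ZDDARTICLE_ZMESSAGEID_INDEX "
            "ON ZDDARTICLE (ZMESSAGEID)")

BATCH = 100
pending = 0
for i in range(1000):
    mid = f"msg-{i}"
    # Indexed existence check: the analogue of the NSFetchRequest
    # with a predicate on the message-id attribute.
    exists = con.execute(
        "SELECT 1 FROM ZDDARTICLE WHERE ZMESSAGEID = ? LIMIT 1",
        (mid,)).fetchone()
    if exists is None:
        con.execute("INSERT INTO ZDDARTICLE VALUES (?)", (mid,))
        pending += 1
    if pending >= BATCH:
        con.commit()   # flush the working set, like [ctx save:nil]
        pending = 0
con.commit()

# All 1000 ids were unique, so every row was inserted.
print(con.execute("SELECT COUNT(*) FROM ZDDARTICLE").fetchone()[0])
```

BATCH plays the same role as the 100-object threshold in the code below: in plain SQLite the trade-off is just commit overhead, while in Core Data it also bounds the set of unsaved in-memory objects each fetch has to scan.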
On 14-Feb-2010, at 9:19 PM, daniele malcom wrote:
> Thanks Roland, saving each x records drops searching time. That's
> great. These are my benchmarks with 30k objects:
>
> CoreData without saving each X insertions: about 5-6 minutes
> CoreData with saving each 500 insertions: about 30 seconds
> CoreData with auxiliary indexes dictionary: about 2 seconds
>
> However, that seems strange.
> According to:
> http://cocoawithlove.com/2008/03/testing-core-data-with-very-big.html
>
> it should be faster (very, very fast) than that, and 30k objects is not
> many objects for Core Data. I'm going to try filing a bug at bugreporter
> and hear what the Apple engineers say.
> (a project that implements your idea is available here:
> http://dl.dropbox.com/u/103260/CoreDataTreeTest3.zip)
>
> On Sun, Feb 14, 2010 at 5:31 AM, Roland King <email@hidden> wrote:
>> OK, I downloaded your project. I agree with Jerry that there's a memory
>> leak; actually it's worse than that, since you aren't remembering the article
>> in order to set its parent when you create it, so
>> [ DDArticle newArticleWithID: messageid context:ctx ];
>> should be
>> article = [ DDArticle newArticleWithID: messageid context:ctx ];
>> [ article release ];
>> I got the test to run in 30 seconds, which isn't too bad given that just
>> looping over the articles takes about 7 seconds by itself. Here's your
>> problem: you're never saving the work, so all the articles you're adding
>> build up in memory. Yes, the SQL store has an index on it, and yes, Core
>> Data is issuing the correct SELECT command, but .. there's nothing in the
>> store. So as well as looking in the store, it also has to scan every one
>> of the objects still waiting to be persisted. Clearly, even though it uses
>> an index in SQL, it doesn't use the index hint to build an in-memory map
>> for finding the in-memory objects which match a predicate. So your adds go
>> slower and slower as Core Data each time does one SQL lookup in an
>> always-empty database, which finds 0 objects in 0.0005 of a second, then
>> scans an ever-growing set of pending objects one by one. Since you never
>> match (your IDs are unique), it scans the whole set every time. If you log
>> it, you'll see each iteration adding more slowly than the last.
>> So I tried adding [ archive save ] to make it commit and was surprised to
>> find nothing changed, until I realized that [ archive save ] saves the wrong
>> context; in fact your example code never saves anything to the DB at all!
>> Adding this inside your add loop
>> if( [ [ ctx updatedObjects ] count ] > 100 )
>>     [ ctx save:nil ];
>> means the working set is never larger than 100, which limits the amount of
>> in-memory lookup. Once the objects are persisted in the DB, the SQL lookup
>> is blisteringly quick, so your check for existing objects runs in
>> nearly constant time. 100 is a parameter you can tweak: you could save
>> every single time, but that probably has overhead; if you make it much
>> larger than 100 you pay the save overhead less often but have to scan more
>> in-memory objects. It's a compromise.
>> 1000 checks and inserts a second seems .. about OK to me, and if you make
>> sure to save the context regularly, you should be able to keep that rate up
>> even as the database size grows.
>> On 14-Feb-2010, at 5:51 AM, daniele malcom wrote:
>>
>> Hi Roland, in fact the indices do exist (for the DDArticle entity):
>> Enter SQL statements terminated with a ";"
>> sqlite> .tables
>> ZDDARTICLE Z_METADATA Z_PRIMARYKEY
>> sqlite> .indices ZDDARTICLE
>> ZDDARTICLE_ZMESSAGEID_INDEX
>> ZDDARTICLE_ZPARENT_INDEX
>>
>> On my MacBook Pro, insertion of 30k articles took about 2-3 minutes.
>> I've uploaded a test project:
>> http://dl.dropbox.com/u/103260/CoreDataTreeTest.zip
>> I really don't know why it should take this long, but in Instruments the
>> big cost is obviously the fetches used to search for the id and the
>> parent.
>>
>> On Sat, Feb 13, 2010 at 2:53 PM, Roland King <email@hidden> wrote:
>>
>> .. oh, and one other thing: there's a Core Data Instruments tool in Xcode.
>> Well, there is for OS X, but not for iPhone OS, which I develop for, which
>> may be why I never saw it before. You could try that.
>>
>> On 13-Feb-2010, at 9:36 PM, Roland King wrote:
>>
>> ok, I don't see anything wrong with the predicate code, but I'm no core data
>> expert.
>>
>> I'll make one totally challengeable statement. Assuming that Core Data
>> uses SQLite in a rational way to store objects (e.g. not storing everything
>> as blobs of opaque data), for instance one table per entity where each
>> column of the table is an attribute; that evaluating the predicate does
>> what you would expect, i.e. uses SQL to do as much of the heavy lifting on
>> a fetch request as possible; and that the column is indexed in the table
>> and SQLite is using the index: then taking multiple minutes to find one row
>> out of 20,000 just doesn't make any sense. It should take seconds at most.
>>
>> I believe core data does use table-per-entity. I think that partly because
>> the documentation hints at it, partly because it makes sense and partly
>> because I looked at the implementation of one data model that I have.
>>
>> I can't see the point of making indexes if the predicate code doesn't
>> generate SQL that uses them, but it's possible. It's also possible that
>> Core Data loads all the entity rows, inspects their attributes by hand and
>> filters them in code, but this is Apple, not Microsoft.
>>
>> So that leaves "column isn't indexed" as the most likely explanation. But
>> you've checked the 'indexed' box. Here's another wild-assed guess: does
>> Core Data only create a store when you have no current store? It certainly
>> checks that the store is compatible with the model, but as the indexed
>> property is just a hint, that store is still compatible, just non-optimal
>> .. it's possible that if you created the store with the property defined
>> as not-indexed and only checked that box later, without regenerating the
>> whole store, the index was never added. Did you do that, just check it
>> later? Have you regenerated a complete new store since, or are you using a
>> store you've been populating for a while?
>>
>> Here's a particularly ugly idea; purists please stop reading now. We can
>> look at the store and see if it has an index on that property ... first
>> open a terminal window and go to the path where your store is. I'm assuming
>> you have sqlite3 installed like I do .. it came with the OS as far as I
>> know.
>>
>> Your store should be called something.sqlite, let's say it's Foo. Type
>>
>> sqlite3 Foo.sqlite
>>
>> and that should open the store and give you a prompt. First you want to find
>> the tables in the store, so type
>>
>> .tables
>>
>> as far as I can see they are called Z<YOUR ENTITY NAME>, so for you I'd
>> expect to see one of the tables called ZMCARTICLE. If there is one, you can
>> find out what indices are on it
>>
>> .indices ZMCARTICLE
>>
>> I believe again the indices are called Z<YOUR ENTITY NAME>_Z<YOUR ATTRIBUTE
>> NAME>_INDEX, so you'd expect to find ZMCARTICLE_ZMESSAGEID_INDEX in that
>> list. If you don't have it, the store wasn't created with that index. If
>> none of those tables exist at all, my rudimentary reverse engineering of the
>> whole coredata thing is flawed (or I'm using some entirely different version
>> from you).
>>
>> If the tables and indices exist, including the one on ZMESSAGEID, I'm out of
>> ideas unless someone knows of a way to put coredata into a form of debug
>> mode and see the SQL generated to figure out if it's doing anything smart.
>>
>> If none of the above works, or it does work but you don't have the index,
>> you have a couple of options. The right one is to delete your whole
>> message store, run your app to make a brand new one, and see if that then
>> adds the indexed property with an index. Depending on how you've populated
>> the store, that might be a real pain; perhaps you can force a migration or
>> something. The other, really stupid, idea would be to just add the index
>> and hope that doesn't break everything, which is entirely possible, at
>> which point you delete the store and start over. You would do that by
>> running
>>
>> CREATE INDEX ZMCARTICLE_ZMESSAGEID_INDEX ON ZMCARTICLE (ZMESSAGEID);
>>
>> Here's another useful thing I just came across; I would certainly run this
>> to see if the SQL being executed makes sense.
>>
>>
>> With Mac OS X version 10.4.3 and later, you can use the user default
>> com.apple.CoreData.SQLDebug to log to stderr the actual SQL sent to SQLite.
>> (Note that user default names are case sensitive.) For example, you can pass
>> the following as an argument to the application:
>>
>> -com.apple.CoreData.SQLDebug 1
>>
>> Higher levels of debug numbers produce more information, although using
>> higher numbers is likely to be of diminishing utility.
>>
>>
>>
>> I'd love to hear about any other ways people have to debug Core Data. I
>> sort of trust that Apple has done a good job with it, and for it to break
>> down performance-wise when looking for a row in 20,000 with a certain
>> attribute doesn't make sense to me. If you really can't get it to work,
>> I'd write one short project which inserts 20,000 simple objects into a
>> store and another which opens the store and goes looking for one by
>> attribute the way you do. If it takes multiple minutes, I'd send it to
>> Apple as a bug.
>>
>>
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden