Re: CoreData & importing a large amount of data

Subject: Re: CoreData & importing a large amount of data
From: Dominik Paulmichl <email@hidden>
Date: Fri, 21 Oct 2005 21:19:33 +0200

Thanks a lot for your suggestions!!

I'll use Matthew's solution.

Regards

Dominik


On 20.10.2005 at 21:27, Matthew Firlik wrote:

On Oct 19, 2005, at 1:31 PM, Chris Hanson wrote:
On Oct 19, 2005, at 11:21 AM, Dominik Paulmichl wrote:
For testing and development purposes I use an XML data store, so I know that Core Data performs its searches in memory. Even when I save after each new entry, the Mac runs out of memory very quickly. :-( How can I avoid this?
Finally, probably the most significant thing you're doing is following a "find-or-create" pattern, where you set up some data to create, check to see if it's already been created, and then create it if it hasn't been created already. This is generally *not* a pattern you want to follow when importing data, because it turns an O(n) problem into an O(n^2) problem.

It's much better -- when possible -- to just create everything "flat" in one pass, and then fix up the relationships in a second pass. For example, if you're importing data and you know you won't have any duplicates (say because your initial data set is empty) you can just create a bunch of managed objects to represent your data and not do any searches at all. Or if you're importing "flat" data with no relationships, you can just create managed objects for the entire set you're importing and then weed out (delete) any duplicates before saving, using a single large IN predicate.

If you do need to follow a find-or-create pattern -- say because you're importing heterogeneous data where relationship information is mixed in with attribute information -- you'll be much better off if you introduce a cache. You can just use an NSMutableDictionary or CFMutableDictionaryRef for this purpose, using the criteria you're finding on as the key. Check to see if the object you're looking for is in the dictionary; if it isn't, then do a fetch. Whether the object is found by the fetch or newly created, save it in the cache for the next time it's looked up. And of course you can get rid of your cache when you're done with the import.
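The cache described above might look roughly like the following. This is only a sketch, not code from the thread: FindOrCreate and the LookupBlock type are hypothetical names, and the two blocks stand in for "fetch the object from the store" and "insert a new managed object", which in a real import would call into Core Data.

```objc
#import <Foundation/Foundation.h>

// Hypothetical block type: takes the lookup key, returns an object or nil.
typedef id (^LookupBlock)(NSString *key);

// Returns the object for `key`, consulting the cache first.
// On a cache miss it fetches once; if nothing is found it creates the object.
// Either way the result is stored in the cache for the next lookup.
static id FindOrCreate(NSMutableDictionary *cache, NSString *key,
                       LookupBlock fetchExisting, LookupBlock createNew)
{
    id object = [cache objectForKey:key];
    if (object != nil) {
        return object;              // cache hit: no fetch at all
    }

    object = fetchExisting(key);    // cache miss: fetch from the store
    if (object == nil) {
        object = createNew(key);    // not persisted yet: create it
    }
    if (object != nil) {
        [cache setObject:object forKey:key];
    }
    return object;
}
```

Create the dictionary (for example with [NSMutableDictionary dictionary]) before the import loop and drop it afterwards; repeated keys then cost one dictionary lookup instead of one fetch each.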
Chris' observation is spot on. There are many situations where developers may need to find existing (persisted) objects for a set of discrete input values. The natural tendency would be to create a loop, grab each value, fetch to see if there is a matching persisted object, etc. Plainly, this pattern does not scale. If you used Shark to profile your application with that pattern, you'd find the fetch to be one of the more expensive operations in the loop (as compared to just iterating a collection of items.)

This can be optimized by reducing your fetches to the minimum you need. How to accomplish this depends on the amount of reference data you have to work with. If you are importing 100 potential new things, and only have 2000 in your database, fetching all of the existing and caching them may not be a significant penalty (especially if you have to perform the operation more than once.) However, if you have 100,000 items in your database, the memory pressure of keeping those cached may be prohibitive.

One trick is to use a combination of an "IN" predicate and sorting to reduce your Core Data usage to a single fetch request. Say you want to take a list of names (as strings) and create Person records for all those not already in the database. Consider this code, where Person is an entity with a name attribute, and listOfNamesAsString is the list of names you want to find or add objects for:
= = = = =
// get the names to parse in sorted order
NSArray *names = [[listOfNamesAsString componentsSeparatedByString:@"\n"]
    sortedArrayUsingSelector:@selector(compare:)];

// create the fetch request to get all Persons matching the names
NSFetchRequest *fetchRequest = [[[NSFetchRequest alloc] init] autorelease];
[fetchRequest setEntity:[NSEntityDescription entityForName:@"Person"
    inManagedObjectContext:yourMOC]];
[fetchRequest setPredicate:[NSPredicate predicateWithFormat:@"(name IN %@)", names]];

// make sure the results are sorted as well
[fetchRequest setSortDescriptors:[NSArray arrayWithObject:
    [[[NSSortDescriptor alloc] initWithKey:@"name" ascending:YES] autorelease]]];

// get all of the matches
NSError *error;
NSArray *personsMatchingNames = [yourMOC executeFetchRequest:fetchRequest error:&error];
= = = = =
First, we separate and sort the names (strings) we are interested in. Next, we create a predicate using "IN" with the array of name strings, and a sort descriptor which ensures the results are returned with the same sorting as the array of name strings. (The "IN" is equivalent to an SQL IN operation, where the left-hand side must appear in the collection specified by the right-hand side.)

As a result, we end up with two sorted arrays -- one with the name strings passed in, and one with the managed objects that matched them. Processing them simply requires you to walk the two sorted lists in parallel and "do the right thing" at each step: if the current name string matches the current Person's name, that Person already exists, so move on to the next name string and the next Person; if it doesn't match, create a new Person for that name string and move on to the next name string only. Regardless of how many names you pass in, you'll only perform a single fetch, and the rest is just walking the result set.
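The walk over the two sorted arrays could be sketched as follows. This is an illustration under assumptions, not code from the original message: NamesMissingFromPersons is a hypothetical helper, it only uses Foundation, and it reads each person's name through valueForKey:@"name" so that plain dictionaries can stand in for the fetched managed objects. It also assumes both arrays are ordered by the same comparison (compare:).

```objc
#import <Foundation/Foundation.h>

// Walks two arrays sorted with the same ordering (compare:).
// `names` holds the wanted name strings; `persons` holds the fetched objects,
// each answering valueForKey:@"name". Returns the names that have no
// matching person, i.e. the ones a new Person still has to be created for.
static NSArray *NamesMissingFromPersons(NSArray *names, NSArray *persons)
{
    NSMutableArray *missing = [NSMutableArray array];
    NSUInteger personIndex = 0;
    NSUInteger personCount = [persons count];
    NSString *previousName = nil;

    for (NSString *name in names) {
        // Skip repeated input names so each one is handled once.
        if (previousName != nil && [name isEqualToString:previousName]) {
            continue;
        }
        previousName = name;

        // Advance past persons whose name sorts before the current name.
        while (personIndex < personCount &&
               [[[persons objectAtIndex:personIndex] valueForKey:@"name"]
                   compare:name] == NSOrderedAscending) {
            personIndex++;
        }

        BOOL exists = personIndex < personCount &&
            [[[persons objectAtIndex:personIndex] valueForKey:@"name"]
                isEqualToString:name];
        if (!exists) {
            [missing addObject:name];   // no match: this name needs a new Person
        }
    }
    return missing;
}
```

In the import itself, names and personsMatchingNames from the fetch above would be passed in, and a Person would be inserted for each returned name. Each array is traversed once, so the cost after the single fetch is linear in the two array lengths.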
- matthew
_______________________________________________ Do not post admin requests to the list. They will be ignored. Cocoa-dev mailing list (email@hidden) Help/Unsubscribe/Update your Subscription: email@hidden
This email sent to email@hidden

