Re: CoreData & importing a large amount of data
- Subject: Re: CoreData & importing a large amount of data
- From: Matthew Firlik <email@hidden>
- Date: Thu, 20 Oct 2005 12:27:35 -0700
On Oct 19, 2005, at 1:31 PM, Chris Hanson wrote:
On Oct 19, 2005, at 11:21 AM, Dominik Paulmichl wrote:
For testing and development purposes I use an XML data store, so I
know that Core Data performs its searches in memory.
Even when I save after each new entry, the Mac runs out of memory
very quickly. :-(
How can I avoid this?
Finally, probably the most significant thing you're doing is
following a "find-or-create" pattern, where you set up some data to
create, check to see if it's already been created, and then create
it if it hasn't been created already. This is generally *not* a
pattern you want to follow when importing data, because it turns an
O(n) problem into an O(n^2) problem.
It's much better -- when possible -- to just create everything
"flat" in one pass, and then fix up the relationships in a second
pass. For example, if you're importing data and you know you won't
have any duplicates (say because your initial data set is empty)
you can just create a bunch of managed objects to represent your
data and not do any searches at all. Or if you're importing "flat"
data with no relationships, you can just create managed objects for
the entire set you're importing and then weed out (delete) any
duplicates before save using a single large IN predicate.
If you do need to follow a find-or-create pattern -- say because
you're importing heterogeneous data where relationship information
is mixed in with attribute information -- you'll be much better off
if you introduce a cache. You can just use an NSMutableDictionary
or CFMutableDictionaryRef for this purpose, using the criteria
you're finding on as the key. Check to see if the object you're
looking for is in the dictionary; if it isn't, then do a fetch.
Whether the object is found or newly created, save it in the cache
for the next time it's looked up. And of course you can get
rid of your cache when you're done with the import.
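
A minimal sketch of the cached find-or-create described above, assuming
a Person entity with a string "name" attribute. FindOrCreatePerson is a
hypothetical helper name, not part of Core Data, and the sketch uses the
newer fetchRequestWithEntityName: convenience constructor so it compiles
with or without ARC:

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: look up a Person by name, consulting `cache`
// before going to the managed object context.
static NSManagedObject *FindOrCreatePerson(NSString *name,
                                           NSMutableDictionary *cache,
                                           NSManagedObjectContext *moc)
{
    // 1. Cache hit: no fetch at all.
    NSManagedObject *person = [cache objectForKey:name];
    if (person != nil) {
        return person;
    }

    // 2. Cache miss: fetch once for this name.
    NSFetchRequest *request =
        [NSFetchRequest fetchRequestWithEntityName:@"Person"];
    [request setPredicate:
        [NSPredicate predicateWithFormat:@"name == %@", name]];
    [request setFetchLimit:1];

    NSError *error = nil;
    NSArray *matches = [moc executeFetchRequest:request error:&error];
    if (matches == nil) {
        NSLog(@"Fetch failed: %@", error);
        return nil;
    }
    person = [matches lastObject];

    // 3. Nothing matched: create the object.
    if (person == nil) {
        person = [NSEntityDescription
            insertNewObjectForEntityForName:@"Person"
                     inManagedObjectContext:moc];
        [person setValue:name forKey:@"name"];
    }

    // 4. Found or created, remember it for the next lookup.
    [cache setObject:person forKey:name];
    return person;
}
```

Create one NSMutableDictionary before the import starts, pass it to
every call, and discard it when the import is finished.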
Chris' observation is spot on. There are many situations where
developers may need to find existing (persisted) objects for a set of
discrete input values. The natural tendency would be to create a
loop, grab each value, fetch to see if there is a matching persisted
object, etc. Plainly, this pattern does not scale. If you used
Shark to profile your application with that pattern, you'd find the
fetch to be one of the more expensive operations in the loop (as
compared to just iterating over a collection of items).
This can be optimized by reducing your fetches to the minimum you
need. How to accomplish this depends on the amount of reference data
you have to work with. If you are importing 100 potential new
things, and only have 2000 in your database, fetching all of the
existing objects and caching them may not be a significant penalty
(especially if you have to perform the operation more than once).
However, if you have 100,000 items in your database, the memory
pressure of keeping those cached may be prohibitive.
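
For the small-database case, one way to cache everything is to fetch
all existing Persons once (a fetch request with no predicate) and index
the result by name. A rough sketch, where IndexPersonsByName is a
hypothetical helper and the only assumption about each object is that
it answers valueForKey:@"name":

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: build a name -> object lookup table from the
// result of a single "fetch everything" request.
static NSMutableDictionary *IndexPersonsByName(NSArray *persons)
{
    NSMutableDictionary *byName =
        [NSMutableDictionary dictionaryWithCapacity:[persons count]];
    for (id person in persons) {
        NSString *name = [person valueForKey:@"name"];
        if (name != nil) {
            // If two objects share a name, the later one wins.
            [byName setObject:person forKey:name];
        }
    }
    return byName;
}
```

Each incoming name then costs one dictionary lookup instead of one
fetch. The table holds every fetched object in memory, which is exactly
the cost that becomes prohibitive for very large data sets.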
One trick is to use a combination of an "IN" predicate and sorting to
reduce your Core Data usage to a single fetch request. Say you want
to take a list of names (as strings) and create Person records for
all those not already in the database. Consider this code, where
Person is an entity with a name attribute, and listOfNamesAsString is
the list of names you want to find or add objects for:
= = = = =
// get the names to parse in sorted order
NSArray *names = [[listOfNamesAsString componentsSeparatedByString:@"\n"]
    sortedArrayUsingSelector:@selector(compare:)];

// create the fetch request to get all Persons matching the names
NSFetchRequest *fetchRequest = [[[NSFetchRequest alloc] init] autorelease];
[fetchRequest setEntity:[NSEntityDescription entityForName:@"Person"
    inManagedObjectContext:yourMOC]];
[fetchRequest setPredicate:[NSPredicate predicateWithFormat:
    @"(name IN %@)", names]];

// make sure the results are sorted as well
[fetchRequest setSortDescriptors:[NSArray arrayWithObject:
    [[[NSSortDescriptor alloc] initWithKey:@"name"
        ascending:YES] autorelease]]];

// get all of the matches; a nil result means the fetch failed
NSError *error = nil;
NSArray *personsMatchingNames = [yourMOC executeFetchRequest:fetchRequest
    error:&error];
if (personsMatchingNames == nil) {
    NSLog(@"Fetch failed: %@", error);
}
= = = = =
First, we separate and sort the names (strings) we are interested in.
Next, we create a predicate using "IN" with the array of name
strings, and a sort descriptor which ensures the results are returned
with the same sorting as the array of name strings. (The "IN" is
equivalent to an SQL IN operation, where the left-hand side must
appear in the collection specified by the right-hand side.)
As a result, we end up with two sorted arrays -- one with the name
strings passed in, and one with the managed objects that matched
them. Processing them simply requires you to walk the two sorted
lists in step, comparing at each position. Take the current name
string and the current Person: if the names match, that Person
already exists, so move to the next name string and the next Person.
If they don't match, create a new Person for that name string and
move to the next name string only. Regardless of how many names you
pass in, you'll only perform a single fetch, and the rest is just
walking the result set.
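
The walk itself might look something like the sketch below.
MissingNames is a hypothetical helper, not part of Core Data. It
assumes both arrays are in ascending order under the same comparison
(compare:) and that each fetched object answers valueForKey:@"name".
It also skips empty and repeated input names, which the description
above does not cover:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: return the names in sortedNames that have no
// matching object in sortedPersons. Both arrays must be sorted
// ascending by the same comparison.
static NSArray *MissingNames(NSArray *sortedNames, NSArray *sortedPersons)
{
    NSMutableArray *missing = [NSMutableArray array];
    NSUInteger personIndex = 0;
    NSUInteger personCount = [sortedPersons count];
    NSString *previousName = nil;

    for (NSString *name in sortedNames) {
        // Skip blank lines and duplicates in the input.
        if ([name length] == 0 || [name isEqualToString:previousName]) {
            continue;
        }
        previousName = name;

        // Move past any persons that sort before this name.
        while (personIndex < personCount) {
            NSString *personName = [[sortedPersons
                objectAtIndex:personIndex] valueForKey:@"name"];
            if ([personName compare:name] != NSOrderedAscending) {
                break;
            }
            personIndex++;
        }

        // Either the current person matches, or this name is new.
        BOOL found = NO;
        if (personIndex < personCount) {
            NSString *personName = [[sortedPersons
                objectAtIndex:personIndex] valueForKey:@"name"];
            found = [personName isEqualToString:name];
        }
        if (!found) {
            [missing addObject:name];
        }
    }
    return missing;
}
```

For each name returned, create the object with
[NSEntityDescription insertNewObjectForEntityForName:@"Person"
inManagedObjectContext:yourMOC] and set its name. If the store sorts
strings differently from compare:, re-sort the fetched array in memory
before walking it.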
- matthew