Re: CoreData & importing a large amount of data
- Subject: Re: CoreData & importing a large amount of data
- From: Dominik Paulmichl <email@hidden>
- Date: Fri, 21 Oct 2005 21:19:33 +0200
Thanks a lot for your suggestions!!
I'll use Matthew's solution.
Regards
Dominik
On 20.10.2005, at 21:27, Matthew Firlik wrote:
On Oct 19, 2005, at 1:31 PM, Chris Hanson wrote:
On Oct 19, 2005, at 11:21 AM, Dominik Paulmichl wrote:
For testing and development purposes I use an XML data store, so I
know that Core Data performs its searches in memory.
Even when I save after each new entry, the Mac ran out of memory very
quickly. :-(
How can I avoid this?
Finally, probably the most significant thing you're doing is
following a "find-or-create" pattern, where you set up some data to
create, check to see if it's already been created, and then create it
if it hasn't been created already. This is generally *not* a pattern
you want to follow when importing data, because each of the n lookups
can cost a search over the objects already present, which turns an
O(n) problem into an O(n^2) problem.
It's much better -- when possible -- to just create everything "flat"
in one pass, and then fix up the relationships in a second pass. For
example, if you're importing data and you know you won't have any
duplicates (say because your initial data set is empty) you can just
create a bunch of managed objects to represent your data and not do
any searches at all. Or if you're importing "flat" data with no
relationships, you can just create managed objects for the entire set
you're importing and then weed out (delete) any duplicates before
saving, using a single fetch with one large IN predicate to find them.
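A minimal sketch of the two-pass idea, written against Foundation only so it runs without a data model. Plain NSMutableDictionary objects stand in for managed objects, and the input format (rows of a name plus a manager's name), the "manager" relationship, and the function ImportTwoPass are all hypothetical. Pass one creates every object with no lookups at all; pass two wires up relationships from an in-memory table instead of fetching.

```objc
#import <Foundation/Foundation.h>

// Hypothetical input: each row is a two-element array of
// (name, managerName), where managerName is @"" when there is none.
// NSMutableDictionary objects stand in for managed objects.
NSDictionary *ImportTwoPass(NSArray *rows)
{
    NSMutableDictionary *byName = [NSMutableDictionary dictionary];

    // Pass 1: create every object "flat" (attributes only, no searches).
    for (NSArray *row in rows) {
        NSString *name = [row objectAtIndex:0];
        [byName setObject:[NSMutableDictionary dictionaryWithObject:name forKey:@"name"]
                   forKey:name];
    }

    // Pass 2: fix up relationships now that every object exists, even
    // when a row refers to an object that appeared later in the input.
    for (NSArray *row in rows) {
        NSString *managerName = [row objectAtIndex:1];
        if ([managerName length] == 0) continue;
        NSMutableDictionary *manager = [byName objectForKey:managerName];
        if (manager != nil) {
            [[byName objectForKey:[row objectAtIndex:0]] setObject:manager
                                                            forKey:@"manager"];
        }
    }
    return byName;
}
```

With real managed objects, the first pass would create each object with +[NSEntityDescription insertNewObjectForEntityForName:inManagedObjectContext:] and the second would set the relationship with -setValue:forKey:.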
If you do need to follow a find-or-create pattern -- say because
you're importing heterogeneous data where relationship information is
mixed in with attribute information -- you'll be much better off if
you introduce a cache. You can just use an NSMutableDictionary or
CFMutableDictionaryRef for this purpose, using the criteria you're
searching on as the key. Check whether the object you're looking for
is in the dictionary; if it isn't, then do a fetch. Whether you find
the object or have to create it, save it in the cache for the next
time it's looked up. And of course you can get rid of your cache when
you're done with the import.
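A minimal sketch of such a cache, again with NSMutableDictionary stand-ins instead of managed objects. FetchRecord is a hypothetical stand-in for a real fetch (it filters an in-memory array with an NSPredicate and counts how often it runs), and FindOrCreate and the "name" key are illustrative. The point it shows is that each distinct key costs at most one fetch, no matter how often it repeats in the input.

```objc
#import <Foundation/Foundation.h>

// Counts calls to the hypothetical fetch below, so the savings are visible.
static NSUInteger gFetchCount = 0;

// Hypothetical stand-in for a Core Data fetch on "name == key".
static NSMutableDictionary *FetchRecord(NSArray *store, NSString *key)
{
    gFetchCount++;
    NSPredicate *p = [NSPredicate predicateWithFormat:@"name == %@", key];
    NSArray *hits = [store filteredArrayUsingPredicate:p];
    return [hits count] > 0 ? [hits objectAtIndex:0] : nil;
}

// Find-or-create, consulting a cache keyed on the lookup criterion first.
NSMutableDictionary *FindOrCreate(NSMutableArray *store,
                                  NSMutableDictionary *cache, NSString *name)
{
    NSMutableDictionary *record = [cache objectForKey:name];
    if (record != nil) return record;           // cache hit: no fetch

    record = FetchRecord(store, name);          // cache miss: fetch once
    if (record == nil) {                        // not persisted yet: create it
        record = [NSMutableDictionary dictionaryWithObject:name forKey:@"name"];
        [store addObject:record];
    }
    [cache setObject:record forKey:name];       // remember it, found or created
    return record;
}
```

Feeding it "a", "b", "a", "c", "b" creates three records and performs three fetches rather than five; the cache can simply be released once the import finishes.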
Chris' observation is spot on. There are many situations where
developers may need to find existing (persisted) objects for a set of
discrete input values. The natural tendency would be to create a
loop, grab each value, fetch to see if there is a matching persisted
object, etc. Plainly, this pattern does not scale. If you used Shark
to profile your application with that pattern, you'd find the fetch to
be one of the more expensive operations in the loop (compared with
simply iterating over a collection of items).
This can be optimized by reducing your fetches to the minimum you
need. How to accomplish this depends on the amount of reference data
you have to work with. If you are importing 100 potential new things
and have only 2000 in your database, fetching all of the existing
objects and caching them may not be a significant penalty (especially
if you have to perform the operation more than once). However, if you
have 100,000 items in your database, the memory pressure of keeping
those cached may be prohibitive.
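For the small-reference-set case, a minimal sketch of priming the cache from one fetch of every existing object. The allPersons argument stands in for that fetch result (any objects answering -valueForKey:@"name"; NSDictionary in the test), and PersonsByName is a hypothetical name. The resulting table keeps every object in memory for as long as it lives, which is exactly the trade-off described above.

```objc
#import <Foundation/Foundation.h>

// Builds a name -> object lookup table from the result of a single fetch
// of every Person. Each element must answer -valueForKey:@"name".
NSDictionary *PersonsByName(NSArray *allPersons)
{
    // -valueForKey: sent to an NSArray returns an array of each
    // element's value for that key, in the same order.
    NSArray *names = [allPersons valueForKey:@"name"];
    return [NSDictionary dictionaryWithObjects:allPersons forKeys:names];
}
```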
One trick is to use a combination of an "IN" predicate and sorting to
reduce your Core Data usage to a single fetch request. Say you want
to take a list of names (as strings) and create Person records for all
those not already in the database. Consider this code, where Person
is an entity with a name attribute, and listOfNamesAsString is a
newline-separated string of the names you want to find or add objects for:
= = = = =
// get the names to parse in sorted order
NSArray *names = [[listOfNamesAsString componentsSeparatedByString:@"\n"]
                     sortedArrayUsingSelector:@selector(compare:)];

// create the fetch request to get all Persons matching the names
NSFetchRequest *fetchRequest = [[[NSFetchRequest alloc] init] autorelease];
[fetchRequest setEntity:[NSEntityDescription entityForName:@"Person"
                                    inManagedObjectContext:yourMOC]];
[fetchRequest setPredicate:[NSPredicate predicateWithFormat:
                               @"(name IN %@)", names]];

// make sure the results are sorted the same way as the names
[fetchRequest setSortDescriptors:[NSArray arrayWithObject:
    [[[NSSortDescriptor alloc] initWithKey:@"name" ascending:YES] autorelease]]];

// get all of the matches; a nil result means the fetch failed
NSError *error = nil;
NSArray *personsMatchingNames = [yourMOC executeFetchRequest:fetchRequest
                                                        error:&error];
if (personsMatchingNames == nil) {
    // handle or report the error, e.g. [NSApp presentError:error];
}
= = = = =
First, we split the string into names and sort them. Next, we create a
fetch request whose predicate uses "IN" with the array of name strings,
and whose sort descriptor ensures the results are returned in the same
order as the array of name strings. (The "IN" is equivalent to
an SQL IN operation, where the left-hand side must appear in the
collection specified by the right-hand side.)
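To see what the predicate and sort descriptor select, here is a minimal sketch that evaluates the same "(name IN %@)" predicate in memory with Foundation and sorts the matches by name. NSDictionary objects stand in for Person objects and MatchingPersonsSorted is a hypothetical name; a persistent store may evaluate IN and perform the sort itself, so its string-comparison rules may not match this in-memory version exactly.

```objc
#import <Foundation/Foundation.h>

// Applies the fetch request's predicate and sort descriptor in memory.
// `people` holds stand-ins for Person objects (anything answering
// -valueForKey:@"name"); `names` is the array of name strings.
NSArray *MatchingPersonsSorted(NSArray *people, NSArray *names)
{
    // Keep only objects whose name appears in `names`.
    NSPredicate *inNames = [NSPredicate predicateWithFormat:@"(name IN %@)", names];
    NSArray *matches = [people filteredArrayUsingPredicate:inNames];

    // Sort ascending by name, using the default compare: selector.
    NSSortDescriptor *byName = [NSSortDescriptor sortDescriptorWithKey:@"name"
                                                             ascending:YES];
    return [matches sortedArrayUsingDescriptors:[NSArray arrayWithObject:byName]];
}
```

Given existing people Cy, Ann, and Dee and the names Ann, Bob, and Cy, only Ann and Cy come back, in that order; Bob is the name that still needs a new Person.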
As a result, we end up with two sorted arrays -- one with the name
strings passed in, and one with the managed objects that matched them.
Processing them simply requires you to walk the two sorted lists in
step, comparing and "doing the right thing" at each position. (Take the
current name string and the current Person: if the names match, that
Person already exists, so advance to the next name string and the next
Person; if they don't match, create a new Person for that name string
and advance only to the next name string. And so on.) Regardless of how
many names you pass in, you'll only perform a single fetch, and the rest
is just walking the result set.
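A minimal sketch of that walk, assuming both arrays were sorted with compare: as in the code above. NSDictionary objects stand in for the fetched Person objects, and NamesNeedingNewPersons is a hypothetical name. It also skips repeated input names, which the description above doesn't cover but which would otherwise create duplicate Persons.

```objc
#import <Foundation/Foundation.h>

// Walks the sorted name strings and the sorted fetch results in step and
// returns the names that have no existing Person (each name once).
NSArray *NamesNeedingNewPersons(NSArray *sortedNames, NSArray *sortedPersons)
{
    NSMutableArray *missing = [NSMutableArray array];
    NSUInteger j = 0;
    NSUInteger personCount = [sortedPersons count];
    NSString *previous = nil;

    for (NSString *name in sortedNames) {
        // Handle each distinct input name once, even if the list repeats it.
        if (previous != nil && [name isEqualToString:previous]) continue;
        previous = name;

        // Skip existing records that sort before this name (only possible
        // if the store already holds duplicate Persons).
        NSString *existing = nil;
        while (j < personCount) {
            existing = [[sortedPersons objectAtIndex:j] valueForKey:@"name"];
            if ([existing compare:name] != NSOrderedAscending) break;
            j++;
            existing = nil;
        }

        if (existing != nil && [existing isEqualToString:name]) {
            j++;                        // match: this Person already exists
        } else {
            [missing addObject:name];   // no match: a new Person is needed
        }
    }
    return missing;
}
```

Collecting the missing names, rather than inserting objects during the walk, keeps the sketch testable without a data model; in a real import each returned name would be handed to +[NSEntityDescription insertNewObjectForEntityForName:inManagedObjectContext:] and set on the new object with -setValue:forKey:.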
- matthew
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Cocoa-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
email@hidden
This email sent to email@hidden