basic Core Data scaling question
- Subject: basic Core Data scaling question
- From: Michael B Johnson <email@hidden>
- Date: Mon, 1 Sep 2008 18:52:17 -0700
quick question:
Let's say I have 100,000 ManagedObjects of type A. Each has a to-one
relationship to a single ManagedObject of type B, which has a reciprocal
to-many relationship back to all the As.
Assuming I have all 100,000 As around (I've just created them in the
ManagedStore) - what's the most efficient way to set up the
relationships between all those As and B?
'cause doing the things that seem obvious to me (either looping over
the As setting their B or setting B to point to the collection of As)
is taking way, way too freakin' long...
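For reference, the two wiring directions mentioned above can be sketched in plain Python. This is only a stand-in under stated assumptions: the classes A and B, the attribute names b and items, and both helper functions are hypothetical, nothing here uses Core Data, and the sketch says nothing about where Core Data actually spends its time. Core Data maintains the inverse side itself; here it is done by hand.

```python
class B:
    """Stand-in for the single type-B object (the to-many side)."""
    def __init__(self):
        self.items = set()   # hypothetical name for the to-many relationship


class A:
    """Stand-in for one type-A object (the to-one side)."""
    def __init__(self, ident):
        self.ident = ident
        self.b = None        # hypothetical name for the to-one relationship


def wire_by_looping(a_objects, b):
    """Approach 1: loop over the As and set each one's B."""
    for a in a_objects:
        a.b = b
        b.items.add(a)       # keep the inverse in sync by hand


def wire_in_bulk(a_objects, b):
    """Approach 2: hand B the whole collection of As at once."""
    b.items = set(a_objects)
    for a in b.items:
        a.b = b              # keep the inverse in sync by hand


first_as, first_b = [A(i) for i in range(1000)], B()
wire_by_looping(first_as, first_b)

second_as, second_b = [A(i) for i in range(1000)], B()
wire_in_bulk(second_as, second_b)
```

Both calls end with the same object graph; the only difference is which side of the relationship drives the update.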
--------------------------
longer background:
So I've dabbled with Core Data the past few years, but it never really
mapped well on to the kinds of apps I've been writing. Recently,
though, I finally have an application that I started writing that I
think maps really well on to it, but I'm having some initial scaling
problems that I'm trying to understand.
Loosely, here's the scenario:
You have a Project.
Each Project has some set of Albums and Artists.
Each Artist makes some set of Images, many of which get collected in
one or more Albums.
Each Image can have some set of ImageVersions, but most only have 1 or
2.
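The entities and relationships listed above can be written down as a small plain-Python sketch. It is not a Core Data model: every attribute name (name, label, artists, albums, images, versions, and so on) is an assumption made for illustration, and the project field on Image is included only because the post later mentions setting a project relationship on the Image.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps identity-based equality, so the cyclic references
# between objects are never compared recursively.
@dataclass(eq=False)
class Project:
    name: str
    artists: List["Artist"] = field(default_factory=list)
    albums: List["Album"] = field(default_factory=list)


@dataclass(eq=False)
class Artist:
    name: str
    project: Optional[Project] = None
    images: List["Image"] = field(default_factory=list)


@dataclass(eq=False)
class Album:
    name: str
    project: Optional[Project] = None
    images: List["Image"] = field(default_factory=list)


@dataclass(eq=False)
class Image:
    name: str
    artist: Optional[Artist] = None
    project: Optional[Project] = None
    albums: List[Album] = field(default_factory=list)      # one or more Albums
    versions: List["ImageVersion"] = field(default_factory=list)


@dataclass(eq=False)
class ImageVersion:
    label: str
    image: Optional[Image] = None


# Tiny example graph: one of everything, wired in both directions by hand.
project = Project("Demo")
artist = Artist("Ann", project=project)
project.artists.append(artist)
album = Album("Sketches", project=project)
project.albums.append(album)
image = Image("img001", artist=artist, project=project)
artist.images.append(image)
album.images.append(image)
image.albums.append(album)
version = ImageVersion("v1", image=image)
image.versions.append(version)
```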
For a given Project, you'll probably have 50 or so Artists, 100 or so
Albums, 80,000 or so Images, and a total of 150,000 or so ImageVersions.
Eventually, I expect to have dozens of Projects, maybe more, so that
the eventual database of ImageVersions would be in the low millions.
Other than the actual image data (and its corresponding proxy and
thumbnails), all the data you need to keep around is pretty simple,
and maps nicely on to Core Data (strings, dates, integers of various
sizes, etc.)
But to start reasonable (but non-trivial), let's take a Project that
has 91K ImageVersions of 80K Images. There are 162 Albums and 29 Artists.
I have a simple .csv file with all the info in it, and I iterate over
it to build up an array of dictionaries of all the info.
Then, taking that array of dictionaries (building it from the 91K-line
csv takes a few tens of seconds), I start iterating over them, making
the appropriate ManagedObjects. I first pull out the
Project(s) from the file (there's only 1 in this example, but there
could be multiple), and then I make the Artist and Album objects. For
each of the Artist and Album objects I find the Project instance and
wire them up.
All of that runs at a reasonable speed.
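The import steps described above might look roughly like this in plain Python. This is a sketch under assumptions, not the actual importer: the column names (project, artist, album, image, version) and the sample rows are invented because the real file layout is not shown, plain dictionaries stand in for managed objects, and a single pass replaces the separate Project and Artist/Album passes described in the post.

```python
import csv
import io

# Hypothetical columns and rows; the real .csv layout is not shown in the post.
SAMPLE_CSV = """project,artist,album,image,version
Demo,Ann,Sketches,img001,1
Demo,Ann,Sketches,img001,2
Demo,Bob,Finals,img002,1
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per data line."""
    return list(csv.DictReader(io.StringIO(text)))


def build_containers(rows):
    """Create each Project, Artist and Album once and wire it to its Project."""
    projects, artists, albums = {}, {}, {}
    for row in rows:
        pname = row["project"]
        if pname not in projects:
            projects[pname] = {"name": pname, "artists": [], "albums": []}
        project = projects[pname]

        artist_key = (pname, row["artist"])
        if artist_key not in artists:
            artists[artist_key] = {"name": row["artist"], "project": project}
            project["artists"].append(artists[artist_key])

        album_key = (pname, row["album"])
        if album_key not in albums:
            albums[album_key] = {"name": row["album"], "project": project}
            project["albums"].append(albums[album_key])
    return projects, artists, albums


rows = load_rows(SAMPLE_CSV)
projects, artists, albums = build_containers(rows)
```

Keying the lookup tables by name means "find the Project instance" is a dictionary lookup rather than a search, which keeps this stage cheap however many lines the file has.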
The problem comes when I start adding the Images to the managed
store. I time how long it takes to add 100 at a time. The first 100
go in 0.022 seconds, but by the time I've inserted 4,200 of them, it's
taking 1 second/100, at 20,000 it's taking 6sec/100, and by the time
I'm up to 90,000, it's taking over 20sec/100. It literally takes
hours to chew through.
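The per-100 timing described above can be reproduced with a small harness like the following. It is a minimal sketch: timed_batches and its parameters are hypothetical names, and the insert callback here just appends to a Python list as a placeholder for whatever actually creates and wires an object.

```python
import time


def timed_batches(items, insert, batch_size=100):
    """Call insert(item) for every item; return the seconds spent on each batch."""
    timings = []
    for start in range(0, len(items), batch_size):
        t0 = time.perf_counter()
        for item in items[start:start + batch_size]:
            insert(item)
        timings.append(time.perf_counter() - t0)
    return timings


store = []                                   # placeholder for the managed store
timings = timed_batches(list(range(1000)), store.append)
for index, seconds in enumerate(timings):
    print(f"batch {index}: {seconds:.6f} s per 100")
```

A flat list of timings means each insert costs about the same regardless of store size. Timings that climb from batch to batch, as in the figures above, suggest that the cost of one insert depends on how many objects already exist.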
After much spelunking, I've found that setting the project
relationship on the Image is what takes up all the time. If I bring
in all the Projects, Artists, Albums and Images without wiring up the
Images to the Project (although I do do it for the Albums), the whole
thing runs in about 30 seconds.
This must be a common idiom - what idiotic thing am I doing wrong?
Thanks for any help.
--> Michael B. Johnson, PhD
--> http://homepage.mac.com/drwave (personal)
--> http://xenia.media.mit.edu/~wave (alum)
--> MPG Lead
--> Pixar Animation Studios
_______________________________________________
Cocoa-dev mailing list (email@hidden)