re: Is Core Data appropriate to my task?
- Subject: re: Is Core Data appropriate to my task?
- From: Ben Trumbull <email@hidden>
- Date: Thu, 10 Sep 2009 14:04:00 -0700
Gwynne,
I have an application that manages two kinds of data: A singular file
that contains a large amount of rarely changed (but not invariant)
data, and documents that contain one root object's worth of
information that connects to the singular data set in a very large
number of places; the documents are in fact little more than a chosen
list of references into the singular data set, plus some small bits of
independent data. The documents are modified quite often.
Originally I thought Core Data must surely be ideal for this sort of
application; the object graph management alone should have been very
helpful, to say nothing of the management of stores and the ability to
integrate with bindings and NSDocument. I got as far as reading all
the basic (and some of the not so basic) Core Data documentation and
began wondering whether or not my data would fit into its model after
all. For example, in the large singular data set, there are a large
number of what SQL would call "lookup tables", data that's used only
to avoid duplicating a set of constant values that are used elsewhere.
To use the Employee/Department example from the Core Data docs,
sometime in the future an Employee might have a "planetOfOrigin"
attribute. Assuming one limits one's self to the restriction of the
speed of light (so, not TOO far in the future), the resulting Planet
entity would only ever have a small number of possible values. Such an
attribute might be better modeled in the Employee entity by something
like SQL's ENUM or SET types. If the set of possible values is "Earth"
and "Not Earth", a Boolean might make more sense. If the set of
possible values is "Earth", "Mars", "Venus", etc., an ENUM would be a
reasonably obvious choice; after all, how often does the solar system
gain or lose a planet (cough Pluto cough)? With such a small data set,
a lookup table would only be the obvious choice if the set of possible
values was expected to change with any frequency. But Core Data has no
support for such a thing; I would either have to write a custom
property type or model it by creating the Planet entity and giving it
a relationship from Employee.
Correct. You can write a custom NSValueTransformer with the
Transformable property type to implement an ENUM, or normalize the
data into a separate table as a formally modeled entity. Which is
better depends on how big the data values are, how many of them there
are, and how frequently they change.
Is that really so bad? The alternative is to do ALL the work yourself.
Let's pretend the lookup table *was* the obvious choice for some
reason; the speed of light barrier has been broken and now there's a
whole mess of planets. So in Core Data parlance, the Employee entity
has a one-to-one relationship to the Planet entity.
A lonely planet. That's either going to be one-to-many or a
no-inverse to-one.
The inverse
relationship from Planet to Employee, "all employees from this planet"
is technically feasible, even easy, to model, but it's almost
certainly a waste of time and effort. But the Core Data documentation
offers a long list of very dire warnings about one-way relationships
between entities.
Yes, and for most situations those warnings are there for very good
reasons. But if there were no reasons for such relationships, then it
wouldn't be a warning, it simply wouldn't exist.
Worse, the list of possible Planets certainly doesn't belong in the
same document file that holds a single Employee's data; you'd end up
duplicating that data across every single Employee. So the list of
Planets would instead be in a global store.
There are lots of ways to model that, but, yes, this would be the most
natural.
But oops, Core Data can't
model cross-store relationships, so you use a fetched property, which
is one-way.
You could use a fetched property, or handle this in code by storing a
URI for the destination object in a different store, and fetching the
matching objectID either lazily or in -awakeFromFetch. We've
generally recommended using a custom accessor method for this instead
of fetched properties.
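The shape of that custom accessor can be sketched outside Core Data. What follows is only an analogy in Python with the standard sqlite3 module, not Core Data API: the Employee class, the planet table, and the integer planet_id (standing in for the stored URI) are all hypothetical names chosen for illustration. The accessor resolves the stored reference against the other store on first use and caches the result.

```python
import sqlite3

# Hypothetical "global store" holding the shared planet list.
global_store = sqlite3.connect(":memory:")
global_store.execute(
    "CREATE TABLE planet (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
global_store.executemany(
    "INSERT INTO planet VALUES (?, ?)", [(1, "Earth"), (2, "Mars")]
)


class Employee:
    """Document-side object that persists only a key into the other store."""

    def __init__(self, name, planet_id, store):
        self.name = name
        self.planet_id = planet_id  # persisted reference (stands in for the URI)
        self._store = store
        self._planet_name = None    # cache, filled on first access
        self.lookups = 0            # query counter, for illustration only

    @property
    def planet_of_origin(self):
        # Custom accessor: resolve the reference lazily, then reuse the result.
        if self._planet_name is None:
            row = self._store.execute(
                "SELECT name FROM planet WHERE id = ?", (self.planet_id,)
            ).fetchone()
            self.lookups += 1
            self._planet_name = row[0] if row else None
        return self._planet_name


employee = Employee("Ann", 2, global_store)
first = employee.planet_of_origin   # hits the other store
second = employee.planet_of_origin  # served from the cache
```

The eager alternative would do the same lookup once when the object is loaded, which is the role -awakeFromFetch plays in the Core Data version.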
Inverse relationship problem solved, unless you actually
had a use for that relationship. But fetched properties need a fetch
request, and what do you put in the predicate? Now you need some kind
of identifier in the Employee for the fetch to operate on,
Yes, but this isn't any different than the problem would be without
Core Data for managing values in two different databases.
and now you have two fields (the "planetOfOriginName" string for
the predicate and
"planetOfOrigin" as the fetched property) to model a single
relationship. How to maintain referential integrity?
Again, no different than the problem would be without Core Data. This
is why the modeling tool recommends using inverse relationships.
Maintaining the integrity by oneself is tedious and error prone.
And what if you DID want the inverse relationship - do you model
another fetched
property in the other direction? What's the predicate there,
"planetOfOriginName LIKE [c] $FETCH_SOURCE.name"? Now your Planet
entity has intimate knowledge of the structure of your Employee
entity; that can't be good.
If this were a join you coded yourself in SQL, the Planet table would
effectively know which column matched in the Employee table.
Also, if you build this hand-made cross-store relationship, whether in
code or with fetched properties, you should prefer numeric keys to text
strings for your joins. Creating a de facto join through a LIKE query is
pretty crazy. That's a case insensitive, locale aware, Unicode regex
there. String operations are much more expensive than integer
comparisons. At the very least, use == for your string compares.
Of course, that's true for any database.
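To make the difference concrete in plain SQL, here is a small sketch using Python's standard sqlite3 module. The table and column names are hypothetical, and SQLite's LIKE is not the same operator as a Core Data LIKE[c] predicate, so treat this only as an illustration of the general point. In SQLite's default configuration LIKE ignores ASCII case, so a join through it can match rows that an exact comparison or an integer key would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE planet (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    planet_id INTEGER NOT NULL,   -- numeric join key
    planet_name TEXT NOT NULL     -- string join key, kept only for comparison
);
-- 'mars' is a deliberately sloppy duplicate of 'Mars'.
INSERT INTO planet VALUES (1, 'Mars'), (2, 'mars'), (3, 'Venus');
INSERT INTO employee VALUES (1, 'Ann', 1, 'Mars'), (2, 'Bob', 3, 'Venus');
""")


def join_on(condition):
    """Return (employee name, planet id) pairs for the given join condition."""
    sql = (
        "SELECT e.name, p.id FROM employee AS e "
        "JOIN planet AS p ON " + condition + " ORDER BY e.id, p.id"
    )
    return conn.execute(sql).fetchall()


by_id = join_on("p.id = e.planet_id")            # integer comparison
by_equal = join_on("p.name = e.planet_name")     # exact string comparison
by_like = join_on("p.name LIKE e.planet_name")   # pattern match, ignores ASCII case
```

Here by_id and by_equal each pair Ann with planet 1 only, while by_like also pairs her with the stray 'mars' row.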
It seems to me that Core Data really is intended to deal with lists of
root objects, i.e. the entire list of Employees in one store, rather
than one Employee per store.
One document per Employee is a bit unusual. But it's feasible if
that's your requirement.
The Core Data documentation mentions
attaching multiple stores to a persistent store coordinator, but I
can't make any sense of how interrelationships between the stores are
handled.
The old fashioned way.
Is Core Data really more appropriate to my dataset than an SQLite
database and a simple Employee object that fetches from that database?
If so, I'd appreciate some help in understanding how.
Are you comparing apples to apples here? Using multiple SQLite
database files directly will open you up to nearly all the same
problems you're describing for Core Data. If you're not going to use
multiple SQLite database files without Core Data, why would you with
Core Data?
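A minimal sketch of the same problem in raw SQLite, using Python's standard sqlite3 module with hypothetical table names. It assumes, as I understand SQLite's behavior, that foreign key constraints do not reach across attached database files, so the cross-file reference is just an integer column that nothing checks.

```python
import sqlite3

# "main" plays the per-employee document; "global_store" plays the shared file.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS global_store")
conn.executescript("""
CREATE TABLE global_store.planet (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE main.employee (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    planet_id INTEGER NOT NULL    -- unchecked reference into the other file
);
INSERT INTO global_store.planet VALUES (1, 'Earth'), (2, 'Mars');
INSERT INTO main.employee VALUES (1, 'Ann', 2);
-- Nothing stops the shared file from dropping a row a document still uses.
DELETE FROM global_store.planet WHERE id = 2;
""")

rows = conn.execute("""
    SELECT e.name, e.planet_id, p.name
    FROM main.employee AS e
    LEFT JOIN global_store.planet AS p ON p.id = e.planet_id
""").fetchall()
```

After the delete, rows holds Ann with planet_id 2 and no matching planet name. Keeping that reference valid is the application's job, with or without Core Data.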
(Let me take this opportunity to say that for all the warnings that
Core Data is not and never has been a database, almost every concept I
see in it makes me think "O/R mapper for SQLite".)
Core Data is an O/R mapping framework, among other things. But O/R
frameworks are not SQL databases. Modeling your data in any O/R
framework as if you were writing SQL directly is inefficient and
mistaken.
Saying that Core Data is a database is like saying your compiler is an
assembler. Well, the compiler suite uses an assembler, sure, and they
both output object code in the end, but that does not mean the best
way to use your compiler is to write in assembly.
- Ben