RE: Cleaning "garbage" in Core Data
- Subject: RE: Cleaning "garbage" in Core Data
- From: Ben Trumbull <email@hidden>
- Date: Mon, 17 Aug 2009 15:00:01 -0700
Squ,
In the general case, a scanning process across large amounts of
persistent data is going to be very expensive. It's hard to do this
well. You really ought to have performance data in hand to verify
that this plan is faster than purging more aggressively during saves.
> 1.
> When the app is idle, keep selecting random employees to clean up
> key-value pairs within them. Stop this process as soon as the app is
> not idle any more. Over time, this will tend to keep the app
> "clean", and the user won't notice anything. How could I do this?
> How can I figure out whether an app is idle? Will Apple's "Threading
> Programming Guide" help me with it?
This won't work well, since you may randomly select "clean" employees.
Indeed, the further you get in the collection process, the lower the
odds of selecting a "dirty" employee, which makes this very
inefficient.
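To put rough numbers on that, here is a minimal sketch. It assumes each
pick is uniformly random over all employees, with replacement;
expectedPicks is a hypothetical helper, not anything from Core Data.
Under that assumption the number of picks needed to hit one dirty
employee is geometrically distributed, with mean total / dirty.

```swift
/// Expected number of uniformly random picks (with replacement) needed to
/// land on one "dirty" employee. Hypothetical helper for illustration only.
func expectedPicks(total: Int, dirty: Int) -> Double {
    precondition(total > 0 && dirty > 0 && dirty <= total)
    // Each pick hits a dirty employee with probability dirty/total,
    // so the mean of the geometric distribution is total/dirty.
    return Double(total) / Double(dirty)
}

// How the cost of finding the next dirty employee grows as cleanup proceeds.
for dirty in [5_000, 100, 1] {
    print(dirty, expectedPicks(total: 10_000, dirty: dirty))
}
```

With 10,000 employees that is 2 picks on average while half are dirty,
100 picks once only 100 remain, and 10,000 picks for the last one.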
> 2.
> Judging from ONLY (yes, ONLY; do NOT break the NDA) the information
> already available on apple.com's GCD intro page and the nice PDF
> intro brochure they made for everyone, would I be able to create
> threads for this task, create MOCs for each of them, do the cleaning
> on those threads, and hand those threads to GCD? It sounds fancy and
> cool, but is this even a realistic solution? The problem I can think
> of is that when actually saving the "main MOC", the main MOC will
> still contain the garbage and wouldn't be able to figure out whether
> to persist the garbage or not.
This is orthogonal to actually solving your problem. You'll need a
reasonable design for the collection process before deciding whether
or not to do it synchronously on the main thread, in a background
thread, or even in a background process. Given the amorphous problem
description, and the early stages of resolution, I would counsel
against "throwing threads" at it.
> Finally, note that I cannot use willTurnIntoFault, and clean up the
> garbage there, because it is very inefficient in my case to do the
> clean up for the managed objects one by one. And again, the user
> would have to wait while saving, which is a bad solution. I really
> have to do them in batches, in such a way that the user will not
> notice it.
-willTurnIntoFault is for *memory* management of the instantiated
object's life cycle. It is NOT an appropriate place to access the
database.
-willSave, however, is a good place for this kind of work.
> I am not *explicitly* marking anything as garbage. Whenever a user
> decides to remove a ValidKeys managed object, corresponding
> key-value pairs in all the userInfo dictionaries are *conceptually*
> marked as garbage. This is because I do not want to show the beach
> ball to the user while things are getting cleaned up. I want to
> postpone the cleaning up until later when it will not bother the user.
You're basically saying you don't want to model your data formally,
but you do want to write your own relationship maintenance and delete
propagation system. That is a lot of work. At this point, your
problem doesn't have anything to do with Core Data. You have
NSDictionaries with informal relationships to other NSDictionaries
that you want to clean up.
You'd be better off modeling the data formally, and making it Core
Data's problem. Alternatively, you could sever the NSDictionaries
from your NSManagedObjects and focus on a data structure other than an
NSDictionary to hold these keys. The NSManagedObjects can be linked
to another data structure via a transient attribute or transient
relationship with a persistent attribute that holds a UUID or token to
look up the peer custom data structure.
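As a rough sketch of that second option, the per-employee data could
live in a side store addressed by a token. Everything named here
(UserInfoStore, EmployeeRecord, userInfoToken) is hypothetical, and
EmployeeRecord is only a plain stand-in for the managed object, which
would persist nothing but the token.

```swift
import Foundation

/// Hypothetical side store for per-employee key-value data, kept outside
/// the managed objects and addressed by a persistent token.
final class UserInfoStore {
    private var entries: [UUID: [String: String]] = [:]

    func value(forKey key: String, token: UUID) -> String? {
        return entries[token]?[key]
    }

    func setValue(_ value: String, forKey key: String, token: UUID) {
        entries[token, default: [:]][key] = value
    }

    /// Drops everything stored for one employee, e.g. when it is deleted.
    func removeAll(for token: UUID) {
        entries[token] = nil
    }
}

// Stand-in for the managed object: only the token would be persisted.
struct EmployeeRecord {
    let userInfoToken: UUID
}

let store = UserInfoStore()
let employee = EmployeeRecord(userInfoToken: UUID())
store.setValue("Building 4", forKey: "office", token: employee.userInfoToken)
```

The internal layout of the store is then free to change (for instance
to something cheaper to purge by key) without touching the model.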
Caleb's suggestions are all good ones:
> Squ, if this is all correct, some things to consider:
> - Rather than removing invalid key/value pairs from the userInfo
> dictionary, it may be faster to copy the valid keys to a new
> dictionary and dump the old one.
> - Have you profiled your code? There are a few things in your proposed
> operation which sound like they might take up a lot of time, but it's
> impossible to know for sure without profiling. Implement the simplest
> solution you can think of and then do some measuring.
> - Depending on the results of the profiling, you might want to rethink
> the data structure you're using, or how you're using that structure.
> For example, if you keep a dictionary for each of the attributes that
> are currently represented by your userInfo keys, and use Employee name
> or ID or whatever as the keys into each of those dictionaries, then
> removing all the data for a garbage key is just a matter of deleting
> the corresponding dictionary. (Of course, you still have to solve the
> same problem you have now, but you only have to do it when an Employee
> is removed from the system.) The point is: let your performance data
> and operational requirements be your guide.
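Two of those suggestions can be sketched in a few lines. This is
illustrative only: the names (rebuilt, AttributeTables) and the use of
String values and Int employee IDs are assumptions, and nothing here
says which variant is faster for your data. Only profiling can.

```swift
// (1) Rebuild instead of pruning: keep only entries whose key is still valid.
func rebuilt(_ userInfo: [String: String],
             keeping validKeys: Set<String>) -> [String: String] {
    return userInfo.filter { validKeys.contains($0.key) }
}

// (2) Inverted layout: one dictionary per attribute key, keyed by employee ID.
struct AttributeTables {
    private(set) var tables: [String: [Int: String]] = [:]

    mutating func set(_ value: String, key: String, employeeID: Int) {
        tables[key, default: [:]][employeeID] = value
    }

    func value(key: String, employeeID: Int) -> String? {
        return tables[key]?[employeeID]
    }

    // Removing a garbage key is a single dictionary removal.
    mutating func removeKey(_ key: String) {
        tables[key] = nil
    }

    // The per-table sweep is now only needed when an employee is removed.
    mutating func removeEmployee(_ employeeID: Int) {
        for key in Array(tables.keys) {
            tables[key]?[employeeID] = nil
        }
    }
}

var tables = AttributeTables()
tables.set("Building 4", key: "office", employeeID: 1)
tables.set("Blue", key: "badgeColor", employeeID: 1)
tables.removeKey("badgeColor")   // all "badgeColor" data is gone in one step
```

The trade-off is the one Caleb names: the inverted layout makes key
removal cheap and moves the sweep to employee removal.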
- Ben