Optimizing Core Data for large time series
- Subject: Optimizing Core Data for large time series
- From: Peter Passaro <email@hidden>
- Date: Tue, 8 May 2007 12:37:04 +0100
Hi All,
I'm experimenting with Core Data for working with very large data
sets of neurological recordings, and I'm looking for suggestions on
what others think is the best way to tackle database performance and
data storage/retrieval for this type of system.
The data model is very simple: Recording entity <-->>
DataStream entities <-->> DataPoint entities. A recording can have
up to hundreds of datastreams, and the datapoints have just two
attributes (voltage and timePoint) but can number in the billions
of points.
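For concreteness, here is roughly how I'm building that graph in code
(the relationship names "dataStreams" and "dataPoints" are just what
I'm calling them here, and moc is an existing NSManagedObjectContext
whose model defines the three entities):

    // Sketch only: populate Recording -> DataStream -> DataPoint
    NSManagedObject *recording =
        [NSEntityDescription insertNewObjectForEntityForName:@"Recording"
                                      inManagedObjectContext:moc];
    NSManagedObject *stream =
        [NSEntityDescription insertNewObjectForEntityForName:@"DataStream"
                                      inManagedObjectContext:moc];
    [[recording mutableSetValueForKey:@"dataStreams"] addObject:stream];

    NSManagedObject *point =
        [NSEntityDescription insertNewObjectForEntityForName:@"DataPoint"
                                      inManagedObjectContext:moc];
    [point setValue:[NSNumber numberWithDouble:-0.00042] forKey:@"voltage"];
    [point setValue:[NSNumber numberWithDouble:12.5]     forKey:@"timePoint"];
    [[stream mutableSetValueForKey:@"dataPoints"] addObject:point];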
I need to do three types of operations on this data:
- import it by parsing the original raw data file and inserting it
into my data model
- display it (converting datapoints to 2D points in bezier paths)
- analyze it (running algorithms that operate on each data point
within a certain time range, or comparing streams against other streams)
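For the time-range analysis case, the kind of fetch I have in mind is
something like the sketch below ("dataStream" as the inverse
relationship name is just illustrative):

    // Fetch all DataPoints for one stream inside a time window, ordered by time.
    - (NSArray *)pointsForStream:(NSManagedObject *)stream
                            from:(double)t0
                              to:(double)t1
                       inContext:(NSManagedObjectContext *)moc
    {
        NSFetchRequest *request = [[[NSFetchRequest alloc] init] autorelease];
        [request setEntity:[NSEntityDescription entityForName:@"DataPoint"
                                       inManagedObjectContext:moc]];
        [request setPredicate:[NSPredicate predicateWithFormat:
            @"dataStream == %@ AND timePoint >= %f AND timePoint < %f",
            stream, t0, t1]];
        [request setSortDescriptors:[NSArray arrayWithObject:
            [[[NSSortDescriptor alloc] initWithKey:@"timePoint"
                                         ascending:YES] autorelease]]];

        NSError *error = nil;
        NSArray *points = [moc executeFetchRequest:request error:&error];
        if (points == nil)
            NSLog(@"fetch failed: %@", error);
        return points;
    }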
In my initial attempt I did the simplest thing and kept everything
in a single MOC and SQLite store. This works for small recordings, but
it quickly became unwieldy for more realistic data sets (store reads/
writes become very slow). After looking at the Apple docs on Core Data
performance and reading some of the posts here, I started coding the
next version.
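For reference, the import loop I mean looks roughly like the sketch
below: it saves and resets the context every few thousand points so the
unsaved object graph (and autoreleased objects) don't pile up. All the
names are illustrative, and real code should check the save errors.

    // Batched insert of raw samples into an existing, already-saved DataStream.
    - (void)importSamples:(const double *)voltages
               timePoints:(const double *)times
                    count:(unsigned long)count
               intoStream:(NSManagedObject *)stream
                  context:(NSManagedObjectContext *)moc
    {
        // stream must have been saved once so its objectID survives -reset
        NSManagedObjectID *streamID = [stream objectID];
        NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
        unsigned long i;
        for (i = 0; i < count; i++) {
            NSManagedObject *point =
                [NSEntityDescription insertNewObjectForEntityForName:@"DataPoint"
                                              inManagedObjectContext:moc];
            [point setValue:[NSNumber numberWithDouble:voltages[i]] forKey:@"voltage"];
            [point setValue:[NSNumber numberWithDouble:times[i]]    forKey:@"timePoint"];
            [[stream mutableSetValueForKey:@"dataPoints"] addObject:point];

            if ((i + 1) % 5000 == 0) {
                NSError *error = nil;
                [moc save:&error];
                [moc reset];                        // drop saved objects from memory
                stream = [moc objectWithID:streamID];
                [pool drain];
                pool = [[NSAutoreleasePool alloc] init];
            }
        }
        NSError *error = nil;
        [moc save:&error];
        [pool drain];
    }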
These are the changes I am experimenting with today to try to speed
things up:
- Splitting each stream off to its own context and store
- Converting the DataPoints into BLOBs (removing them from the
data model) and keeping them in binary files referenced by a new
entity, DataChunk, which has attributes fileURL, timeBegin,
timeEnd, and numPoints. This creates other issues, because I might
take a performance hit accessing individual time points for
processing, especially for non-sequential groups of points, but
opening a file and moving a file pointer should still be faster than
fetching (am I right about this?). A rough sketch of what I mean is below.
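To make the DataChunk idea concrete, reading a time window back out of
one of those flat files would look something like this. It assumes the
file is just consecutive native-endian doubles written in time order at
a uniform rate, and that fileURL is stored as a file:// URL string;
those assumptions, and the names, are mine, and range checking is omitted.

    // Read the voltages between t0 and t1 out of a DataChunk's binary file.
    - (NSData *)voltagesInChunk:(NSManagedObject *)chunk
                           from:(double)t0
                             to:(double)t1
    {
        double timeBegin = [[chunk valueForKey:@"timeBegin"] doubleValue];
        double timeEnd   = [[chunk valueForKey:@"timeEnd"] doubleValue];
        double numPoints = [[chunk valueForKey:@"numPoints"] doubleValue];
        double rate      = numPoints / (timeEnd - timeBegin);   // points per second

        unsigned long long firstIndex = (unsigned long long)((t0 - timeBegin) * rate);
        unsigned long long pointCount = (unsigned long long)((t1 - t0) * rate);

        NSURL *url = [NSURL URLWithString:[chunk valueForKey:@"fileURL"]];
        NSFileHandle *fh = [NSFileHandle fileHandleForReadingAtPath:[url path]];
        [fh seekToFileOffset:firstIndex * sizeof(double)];
        NSData *raw = [fh readDataOfLength:(unsigned)(pointCount * sizeof(double))];
        [fh closeFile];
        return raw;    // [raw bytes] is a const double * holding pointCount values
    }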
So I'm curious whether anyone has experimented with this type of setup
for large data sets, whether you have any opinion on the tack I'm
taking, and whether there are any bugaboos to watch out for that
aren't covered in the Core Data performance docs.
Also: is there a strong enough incentive to give up Core Data so
that I can use a faster database than SQLite (possibly Valentina)?
I'm loath to do this because of the tight integration Core Data has
with Cocoa and IB, but I'm wondering whether the scale of the data
I'm using just requires a higher-performance database.
I'm also looking for advice on graphical display of this type of
time series data (e.g. fast loading into bezier paths, rapid scaling,
averaging of data points for best display, scrolling animation), if
anyone would like to share experience or point me to the right references.
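By "averaging of data points for best display" I mean a reduction along
these lines: one averaged point per horizontal pixel instead of one path
element per raw sample, so the path stays small no matter how long the
recording is (all names below are illustrative):

    // Build a downsampled polyline for drawing in an NSView's drawRect:.
    - (NSBezierPath *)pathForSamples:(const double *)samples
                               count:(unsigned)count
                          pixelWidth:(unsigned)pixelWidth
                              yScale:(double)yScale
                             yOffset:(double)yOffset
    {
        NSBezierPath *path = [NSBezierPath bezierPath];
        unsigned perColumn = count / pixelWidth;   // assumes count >= pixelWidth
        unsigned col, i;
        for (col = 0; col < pixelWidth; col++) {
            const double *base = samples + (unsigned long)col * perColumn;
            double sum = 0.0;
            for (i = 0; i < perColumn; i++)
                sum += base[i];
            NSPoint p = NSMakePoint(col, yScale * (sum / perColumn) + yOffset);
            if (col == 0)
                [path moveToPoint:p];
            else
                [path lineToPoint:p];
        }
        return path;
    }

For spiky neural data a min/max per pixel column would probably preserve
peaks better than a plain average, but the idea is the same.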
Thanks for looking,
Peter Passaro