Re: Optimizing Core Data for large time series
- Subject: Re: Optimizing Core Data for large time series
- From: Aurélien Hugelé <email@hidden>
- Date: Tue, 08 May 2007 15:59:55 +0200
On 8 May 07, at 13:37, Peter Passaro wrote:
Hi All,
I'm experimenting with Core Data for working with very large data
sets of neurological recordings and I'm looking for any suggestions
on what others think the best way to tackle the problems of
database performance/data storage and retrieval might be for this
type of system.
The data model is very simple: Recording entity <-->>
DataStream entities <-->> DataPoint entities. The recording can
have up to 100s of datastreams, and the datapoints have just two
parameters (voltage and timePoint) but can number into the
billions of points.
I need to do three types of operations on this data:
- import it by parsing the original raw data file and inserting it
into my data model
- display it (converting datapoints to 2D points in bezier paths)
- analyze it (running algorithms that operate on each data point
within a certain time range, or comparing streams against other
streams)
In my initial attempt I just did the simplest thing and kept them
all in a single MOC and SQL store. This is ok for small recordings,
but quickly became unwieldy for more realistic data sets (store
read/writes become very slow). After looking at the Apple docs on
CD performance and reading some of the posts here, I started coding
the next version.
As you can see from my various posts, I'm sceptical about CD
performance, and it has surprised me several times. But one thing
I'm now fairly sure of is that incremental changes are very fast. By
that I mean that inserting one element (and saving the context) into
a small store (one with few objects already written to disk) is as
fast as inserting one element into a very big store (millions of
objects). So I'm surprised when you say "store read/writes *become*
very slow". Do you mean that it gets slower and slower as you insert
data?
I won't say CD is fast, but at least it is smooth and fast on the
incremental aspect.
Remember that saving can be slow because of your hard disk, since Core
Data waits for the complete buffer flush before returning from a save
(see the various posts on this point, in particular Bill Bumgarner's).
I hope you're not in a loop that inserts a DataPoint, then saves,
because CD will then only go as fast as your disk :)
Inserting in a loop and then saving once is of course the thing to do.
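The cost of saving after every insert can be illustrated without Core Data at all. The following is only a plain-file analogy in C, not Core Data API: the function name `write_with_batched_sync` and its parameters are made up for this sketch, and it assumes a POSIX system for `fsync`. It writes values in a loop and forces them to disk once per batch, returning how many times it had to wait for the disk.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* fsync (POSIX) */

/* Hypothetical helper: write `count` doubles to `path`, forcing the data
   to disk after every `batch` values and once more for a final partial
   batch. Returns the number of sync calls made, or -1 on error. */
long write_with_batched_sync(const char *path, const double *values,
                             size_t count, size_t batch)
{
    assert(batch > 0);
    FILE *f = fopen(path, "wb");
    if (!f) return -1;

    long syncs = 0;
    for (size_t i = 0; i < count; i++) {
        if (fwrite(&values[i], sizeof(double), 1, f) != 1) {
            fclose(f);
            return -1;
        }
        int batch_full = ((i + 1) % batch == 0);
        int last_value = (i + 1 == count);
        if (batch_full || last_value) {
            /* This is the slow part: wait until the disk has the data. */
            if (fflush(f) != 0 || fsync(fileno(f)) != 0) {
                fclose(f);
                return -1;
            }
            syncs++;
        }
    }
    fclose(f);
    return syncs;
}
```

With `batch` set to 1 the loop waits on the disk once per value, which corresponds to the insert-then-save pattern warned about above. With a large `batch` it waits only a handful of times for the same amount of data.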
These are the changes I am experimenting with today to try and
speed things up:
- Splitting each stream off to its own context and store
- Converting the DataPoints into BLOBs (and removing them from the
data model) and keeping them in binary files which are referenced
by a new entity, DataChunk, which has parameters: fileURL,
timeBegin, timeEnd, numPoints. This creates other issues because I
might take a performance hit accessing individual time points for
processing, especially for non-sequential groups of points, but
opening a file and moving a file pointer should be faster than
fetching (am I right on this?)
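As a rough illustration of the "open a file and move a file pointer" idea, here is a minimal C sketch of a chunk file holding fixed-size records. Everything in it is an assumption made for the example: the `DataPoint` struct layout, the function names `chunk_write` and `chunk_read`, and the choice to store raw structs in host byte order. Because every record has the same size, point number `i` lives at byte offset `i * sizeof(DataPoint)`, so a range of points can be read with one seek and one read.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical on-disk record. The field names follow the thread's
   "timePoint" and "voltage" attributes; the layout is an assumption. */
typedef struct {
    double timePoint;
    double voltage;
} DataPoint;

/* Write `count` points to `path` as raw records. Returns 0 on success. */
int chunk_write(const char *path, const DataPoint *points, size_t count)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(points, sizeof(DataPoint), count, f);
    int close_err = fclose(f);
    return (written == count && close_err == 0) ? 0 : -1;
}

/* Read up to `count` points starting at record index `first`.
   Returns the number of points actually read (fewer near end of file). */
size_t chunk_read(const char *path, size_t first, size_t count,
                  DataPoint *out)
{
    assert(out != NULL);
    FILE *f = fopen(path, "rb");
    if (!f) return 0;

    size_t got = 0;
    /* fseek takes a long offset; for files beyond 2 GB on platforms with
       a 32-bit long, a 64-bit variant such as POSIX fseeko is needed. */
    if (fseek(f, (long)(first * sizeof(DataPoint)), SEEK_SET) == 0)
        got = fread(out, sizeof(DataPoint), count, f);
    fclose(f);
    return got;
}
```

In this arrangement a DataChunk's timeBegin, timeEnd and numPoints would be what decides which file to open and which record index to seek to. Non-sequential access still costs one seek per group of points, so whether it beats a fetch depends on how scattered the requested points are.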
So I'm curious if anyone has experimented with this type of setup
for large data sets, if you have any opinion on the tack I'm
taking, and if there are any bugaboos to watch out for not covered
in the CD performance docs.
Also - is there a strong enough incentive to give up Core Data so
that I can use a faster database than SQLite? (possibly Valentina)
I'm loath to do this because of the tight integration CD has with
Cocoa and IB, but I'm wondering if the scale of the data I'm using
just requires a higher performance database.
I'm also looking for advice on graphical display of these types of
time series data, i.e. fast loading into bezier paths, rapid
scaling, averaging of data points for best display, scrolling
animation, etc. if anyone would like to share or point me to the
right references.
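One common way to reduce a long stream for display is to split it into as many buckets as there are horizontal pixels and keep the minimum, maximum and mean of each bucket. The C sketch below shows that reduction only; the `Envelope` type and the function name `minmax_decimate` are invented for this example, and turning the result into bezier paths is left out. Keeping min and max alongside the mean is a deliberate addition to plain averaging, since averaging alone would flatten short spikes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-bucket summary of a run of samples. */
typedef struct {
    double min;
    double max;
    double mean;
} Envelope;

/* Reduce `n` samples to `buckets` summaries, one per display column.
   Requires n >= buckets so that every bucket holds at least one sample. */
void minmax_decimate(const double *samples, size_t n,
                     Envelope *out, size_t buckets)
{
    assert(buckets > 0 && n >= buckets);
    for (size_t b = 0; b < buckets; b++) {
        size_t start = b * n / buckets;
        size_t end = (b + 1) * n / buckets;   /* exclusive */
        double lo = samples[start];
        double hi = samples[start];
        double sum = samples[start];
        for (size_t i = start + 1; i < end; i++) {
            if (samples[i] < lo) lo = samples[i];
            if (samples[i] > hi) hi = samples[i];
            sum += samples[i];
        }
        out[b].min = lo;
        out[b].max = hi;
        out[b].mean = sum / (double)(end - start);
    }
}
```

Drawing one vertical segment from min to max per column keeps the number of path elements proportional to the view width rather than to the number of data points, whatever the zoom level.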
Thanks for looking,
Peter Passaro
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
40gumitech.com
This email sent to email@hidden