Re: Optimizing Core Data for large time series

Subject: Re: Optimizing Core Data for large time series
From: Aurélien Hugelé <email@hidden>
Date: Tue, 08 May 2007 15:59:55 +0200


On 8 May 07, at 13:37, Peter Passaro wrote:

Hi All,
I'm experimenting with Core Data for working with very large data sets of neurological recordings and I'm looking for any suggestions on what others think the best way to tackle the problems of database performance/data storage and retrieval might be for this type of system.

The data model is very simple: Recording entity <-->> DataStream entities <-->> DataPoint entities. A recording can have up to hundreds of DataStreams, and the DataPoints have just two parameters (voltage and timePoint) but can number into the billions.

I need to do three types of operations on this data:
- import it by parsing the original raw data file and inserting it into my data model
- display it (converting datapoints to 2D points in bezier paths)
- analyze it (running algorithms that operate on each data point within a certain time range, or comparing streams against other streams)

In my initial attempt I just did the simplest thing and kept them all in a single MOC and SQL store. This is ok for small recordings, but quickly became unwieldy for more realistic data sets (store read/writes become very slow). After looking at the Apple docs on CD performance and reading some of the posts here, I started coding the next version

As you can see from my various posts, I'm sceptical about Core Data's performance, and it has surprised me several times. But one thing I'm now pretty sure of is that incremental changes are very fast: inserting one element (and saving the context) into a small store (one with only a few objects already written to disk) is as fast as inserting one element into a very big store (millions of objects). So I'm surprised when you say "store read/writes *become* very slow". Do you mean that it gets slower and slower as you insert data? I won't say Core Data is fast, but at least it is smooth and fast for incremental changes.

Remember that saving can be slow because of your hard disk: Core Data waits for the buffers to be completely flushed to disk before returning from a save (see the various posts on this point, in particular Bill Bumgarner's).

I hope you're not in a loop that inserts a DataPoint and then saves on every iteration, because Core Data will then only go as fast as your disk :) Inserting in a loop and then saving once afterwards is, of course, the thing to do.
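
To make the difference between the two loop shapes concrete, here is a minimal sketch. It is not Core Data code: it talks to SQLite directly through Python's sqlite3 module, on the assumption that a commit is a rough stand-in for saving the context. The table name, column names and the insert_points helper are all made up for the illustration.

```python
import sqlite3


def insert_points(db_path, points, save_every_insert):
    """Insert (time_point, voltage) rows into a hypothetical data_point table.

    save_every_insert=True commits after every row (the slow pattern);
    save_every_insert=False commits once after the loop (the batched pattern).
    Returns the number of rows in the table afterwards.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_point (time_point REAL, voltage REAL)"
    )
    for time_point, voltage in points:
        conn.execute(
            "INSERT INTO data_point VALUES (?, ?)", (time_point, voltage)
        )
        if save_every_insert:
            conn.commit()  # one disk flush per row
    conn.commit()  # single flush for whatever is still pending
    count = conn.execute("SELECT COUNT(*) FROM data_point").fetchone()[0]
    conn.close()
    return count
```

Both variants store the same rows; only the number of commits, and so the number of times the loop waits on the disk, differs. Timing the two calls on your own machine should show how much of the cost is the flush rather than the insert.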

These are the changes I am experimenting with today to try to speed things up:
- Splitting each stream off to its own context and store
- Converting the DataPoints into BLOBs (and removing them from the data model) and keeping them in binary files which are referenced by a new entity, DataChunk, which has parameters: fileURL, timeBegin, timeEnd, numPoints. This creates other issues because I might take a performance hit accessing individual time points for processing, especially for non-sequential groups of points, but opening a file and moving a file pointer should be faster than fetching (am I right on this?)
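
The "open a file and move a file pointer" idea can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not an implementation of the actual file format: it assumes every point is stored as a fixed-width record of two little-endian 64-bit floats (timePoint, voltage), so the byte offset of point i is simply i times the record size. The DataChunk class mirrors the entity described above, with file_path standing in for fileURL; write_chunk and read_points are hypothetical helpers.

```python
import struct
from dataclasses import dataclass

# Assumed record layout: (timePoint, voltage) as two little-endian float64 values.
RECORD = struct.Struct("<dd")


@dataclass
class DataChunk:
    file_path: str     # stands in for the fileURL attribute
    time_begin: float
    time_end: float
    num_points: int


def write_chunk(file_path, points):
    """Write a non-empty list of (time_point, voltage) pairs as fixed-width records."""
    with open(file_path, "wb") as f:
        for time_point, voltage in points:
            f.write(RECORD.pack(time_point, voltage))
    return DataChunk(file_path, points[0][0], points[-1][0], len(points))


def read_points(chunk, first_index, count):
    """Read `count` points starting at `first_index` without scanning the file."""
    if first_index < 0 or count < 0 or first_index + count > chunk.num_points:
        raise IndexError("requested range lies outside the chunk")
    with open(chunk.file_path, "rb") as f:
        f.seek(first_index * RECORD.size)  # jump straight to the first record
        data = f.read(count * RECORD.size)
    return [RECORD.unpack_from(data, i * RECORD.size) for i in range(count)]
```

With fixed-width records a contiguous range costs one seek and one read. Non-sequential groups of points cost one seek per group, which is where the performance hit mentioned above would show up.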

So I'm curious if anyone has experimented with this type of setup for large data sets, if you have any opinion on the tack I'm taking, and if there are any bugaboos to watch out for not covered in the CD performance docs.

Also - is there a strong enough incentive to give up Core Data so that I can use a faster database than SQLite (possibly Valentina)? I'm loath to do this because of the tight integration CD has with Cocoa and IB, but I'm wondering if the scale of the data I'm using just requires a higher performance database.

I'm also looking for advice on graphical display of these types of time series data, i.e. fast loading into bezier paths, rapid scaling, averaging of data points for best display, scrolling animation, etc. if anyone would like to share or point me to the right references.
Thanks for looking,
Peter Passaro
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription: 40gumitech.com
This email sent to email@hidden








