
Optimizing Core Data for large time series


  • Subject: Optimizing Core Data for large time series
  • From: Peter Passaro <email@hidden>
  • Date: Tue, 8 May 2007 12:37:04 +0100

Hi All,

I'm experimenting with Core Data for working with very large data sets of neurological recordings, and I'm looking for suggestions on the best way to tackle the problems of database performance and data storage/retrieval for this type of system.

The data model is very simple: Recording entity <-->> DataStream entities <-->> DataPoint entities. A recording can have up to hundreds of data streams, and the data points have just two attributes (voltage and timePoint) but can number in the billions.

I need to do three types of operations on this data:
- import it by parsing the original raw data file and inserting it into my data model
- display it (converting datapoints to 2D points in bezier paths)
- analyze it (running algorithms that operate on each data point within a certain time range, or comparing streams against other streams)
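The third operation above — running algorithms over every point in a time range — can be sketched in plain C (not the poster's code; `DataPoint`, `first_at_or_after`, and `mean_voltage` are illustrative names). If points are kept sorted by time, a binary search finds the window bounds without scanning the whole stream:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory layout matching the DataPoint entity:
   one voltage reading at one time point. */
typedef struct {
    double timePoint;  /* seconds */
    double voltage;    /* volts   */
} DataPoint;

/* Index of the first sample with timePoint >= t
   (lower bound by binary search; assumes points sorted by time). */
static size_t first_at_or_after(const DataPoint *pts, size_t n, double t) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pts[mid].timePoint < t) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Example analysis: mean voltage over the half-open window [t0, t1). */
double mean_voltage(const DataPoint *pts, size_t n, double t0, double t1) {
    size_t i = first_at_or_after(pts, n, t0);
    size_t j = first_at_or_after(pts, n, t1);
    if (j <= i) return 0.0;  /* empty window */
    double sum = 0.0;
    for (size_t k = i; k < j; k++) sum += pts[k].voltage;
    return sum / (double)(j - i);
}
```

The same lower-bound search also works across chunk boundaries if each chunk records its own time range.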


In my initial attempt I just did the simplest thing and kept them all in a single MOC and SQL store. This is OK for small recordings, but quickly became unwieldy for more realistic data sets (store reads/writes become very slow). After looking at the Apple docs on CD performance and reading some of the posts here, I started coding the next version.

These are the changes I am experimenting with today to try to speed things up:

- Splitting each stream off to its own context and store

- Converting the DataPoints into BLOBs (and removing them from the data model), keeping them in binary files referenced by a new entity, DataChunk, with attributes fileURL, timeBegin, timeEnd, and numPoints. This creates other issues: I might take a performance hit accessing individual time points for processing, especially for non-sequential groups of points, but opening a file and moving a file pointer should be faster than fetching (am I right about this?)
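On the file-pointer question, a plain-C sketch of the DataChunk idea (illustrative names, not the poster's code): with fixed-width records the byte offset of any point is just its index times the record size, so reading a contiguous time range costs one seek plus one read, with no per-object fetch overhead. Non-sequential points each cost a seek, which is where the penalty the poster worries about would show up.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical fixed-width on-disk record for one data point. */
typedef struct {
    double timePoint;
    double voltage;
} Sample;

/* Read `count` samples starting at record index `first` from a chunk file.
   Byte offset = first * sizeof(Sample), so a sequential range is one
   fseek plus one fread. Returns the number of samples actually read. */
size_t read_samples(const char *path, long first, size_t count, Sample *out) {
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    if (fseek(f, first * (long)sizeof(Sample), SEEK_SET) != 0) {
        fclose(f);
        return 0;
    }
    size_t got = fread(out, sizeof(Sample), count, f);
    fclose(f);
    return got;
}
```

If the sampling rate is uniform, timePoint need not even be stored per record: the index for a time t is (t - timeBegin) * rate, halving the file size.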

So I'm curious if anyone has experimented with this type of setup for large data sets, if you have any opinion on the tack I'm taking, and if there are any bugaboos to watch out for not covered in the CD performance docs.

Also - is there a strong enough incentive to give up Core Data so that I can use a faster database than SQLite? (possibly Valentina) I'm loath to do this because of the tight integration CD has with Cocoa and IB, but I'm wondering if the scale of the data I'm using just requires a higher-performance database.

I'm also looking for advice on graphical display of this type of time-series data, e.g. fast loading into bezier paths, rapid scaling, averaging of data points for best display, scrolling animation, etc., if anyone would like to share or point me to the right references.
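The "averaging of data points for best display" step is often done as min/max decimation rather than a plain mean, so spikes survive downsampling. A hedged plain-C sketch (illustrative names; assumes `buckets <= n`), reducing n raw voltages to one (min, max) pair per screen column before building the bezier path:

```c
#include <assert.h>
#include <stddef.h>

/* Min/max decimation: collapse n raw voltages into `buckets` (min, max)
   pairs, one pair per screen column. Unlike averaging, this preserves
   peaks, which matters for spike-laden neurological traces.
   Assumes buckets <= n so every bucket is non-empty. */
void minmax_decimate(const double *v, size_t n, size_t buckets,
                     double *mins, double *maxs) {
    for (size_t b = 0; b < buckets; b++) {
        size_t start = b * n / buckets;        /* bucket's first index */
        size_t end = (b + 1) * n / buckets;    /* one past its last    */
        double lo = v[start], hi = v[start];
        for (size_t i = start + 1; i < end; i++) {
            if (v[i] < lo) lo = v[i];
            if (v[i] > hi) hi = v[i];
        }
        mins[b] = lo;
        maxs[b] = hi;
    }
}
```

Drawing a vertical segment from mins[b] to maxs[b] at each column gives the familiar "envelope" rendering and keeps the path's point count proportional to pixel width, not to the billions of stored samples.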

Thanks for looking,
Peter Passaro


_______________________________________________

Cocoa-dev mailing list (email@hidden)

Do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden


  • Follow-Ups:
    • Re: Optimizing Core Data for large time series
      • From: Kaelin Colclasure <email@hidden>
    • Re: Optimizing Core Data for large time series
      • From: Aurélien Hugelé <email@hidden>