Good evening folks!
The databases I'm responsible for contain a lot of data, and I frequently find myself resorting to boring stuff like raw row fetching to create large reports or otherwise process huge datasets. But sometimes even that isn't enough - an array of ten million items is hard for any application to handle, even when the ten million objects are just NSDictionaries/raw rows. Besides, working with raw rows is no fun - I'm spoiled by years of EOF-y goodness.
So, yesterday I wrote the attached class to handle massive amounts of data. It is by no means perfect - if you have a table of ten million rows, the primary keys for these rows are all fetched from the DB, creating quite an array (if anyone has a solution for that, I'd *love* to hear it).
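For anyone who wants the gist without reading the attachment, the technique boils down to something like this (heavily simplified - the batch size, the "id" primary key name and processBatch() are placeholders, not the actual names from the attached class):

import com.webobjects.eocontrol.EOEditingContext;
import com.webobjects.eocontrol.EOEnterpriseObject;
import com.webobjects.eocontrol.EOFetchSpecification;
import com.webobjects.eocontrol.EOKeyValueQualifier;
import com.webobjects.eocontrol.EOOrQualifier;
import com.webobjects.eocontrol.EOQualifier;
import com.webobjects.foundation.NSArray;
import com.webobjects.foundation.NSDictionary;
import com.webobjects.foundation.NSMutableArray;
import er.extensions.eof.ERXEC;

public class PKBatchSketch {

    private static final int BATCH_SIZE = 1000; // placeholder; tune to your heap

    public static void fetchInBatches( String entityName ) {
        EOEditingContext ec = ERXEC.newEditingContext();

        // Step 1: fetch only the primary keys, as raw rows.
        // (This is the part that still creates the big array.)
        EOFetchSpecification pkFetch = new EOFetchSpecification( entityName, null, null );
        pkFetch.setFetchesRawRows( true );
        pkFetch.setRawRowKeyPaths( new NSArray<String>( "id" ) ); // assumes a PK attribute named "id"
        NSArray<NSDictionary<String, Object>> pkRows = ec.objectsWithFetchSpecification( pkFetch );

        // Step 2: walk the key list in fixed-size batches.
        for( int start = 0; start < pkRows.count(); start += BATCH_SIZE ) {
            int end = Math.min( start + BATCH_SIZE, pkRows.count() );

            // Build an OR-of-equals qualifier for this batch of keys.
            NSMutableArray<EOQualifier> quals = new NSMutableArray<EOQualifier>();
            for( int i = start; i < end; i++ ) {
                Object pk = pkRows.objectAtIndex( i ).objectForKey( "id" );
                quals.addObject( new EOKeyValueQualifier( "id", EOQualifier.QualifierOperatorEqual, pk ) );
            }

            // Step 3: fetch the real EOs for this batch and process them.
            EOFetchSpecification batchFetch = new EOFetchSpecification( entityName, new EOOrQualifier( quals ), null );
            NSArray<EOEnterpriseObject> batch = ec.objectsWithFetchSpecification( batchFetch );
            processBatch( batch );

            // Step 4: throw the batch away so snapshots don't pile up.
            ec.invalidateAllObjects();
        }
    }

    private static void processBatch( NSArray<EOEnterpriseObject> batch ) {
        // placeholder for the per-batch work (writing CSV lines, etc.)
    }
}

Wonder's ERXFetchSpecificationBatchIterator does something very similar, by the way, if you'd rather not roll your own batching loop.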
It exports an entire table of roughly 2,000,000 rows from a 10-column DB table (creating a 500MB text file) in roughly four minutes on my MacBook Pro using a heap size of 400M. Here's an example of how you use it (the implementation of KMExportOperation is left as an exercise ;-):
public WOActionResults batchFetchAction() {
    EOEditingContext ec = ERXEC.newEditingContext();
    KMMassiveOperation.Operation operation = new KMExportOperation( "/tmp/exported.csv", "\t", "\n", "UTF-8" );
    KMMassiveOperation.start( ec, SomeEntity.class, null, null, operation );
    return new WOResponse();
}
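That said, if you want the rough shape of the exercise: an export operation boils down to something like the sketch below. This is heavily simplified - the Operation interface is trimmed to a single per-batch callback here, and error handling, headers and value escaping are all omitted.

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import com.webobjects.eocontrol.EOEnterpriseObject;
import com.webobjects.foundation.NSArray;

// Trimmed-down sketch; the real Operation interface has more hooks.
public class KMExportOperationSketch {

    private final BufferedWriter _writer;
    private final String _fieldSeparator;
    private final String _lineSeparator;

    public KMExportOperationSketch( String path, String fieldSeparator, String lineSeparator, String encoding ) throws IOException {
        _writer = new BufferedWriter( new OutputStreamWriter( new FileOutputStream( path ), encoding ) );
        _fieldSeparator = fieldSeparator;
        _lineSeparator = lineSeparator;
    }

    // Invoked once per batch by the massive-operation loop.
    public void handleBatch( NSArray<EOEnterpriseObject> batch ) throws IOException {
        for( EOEnterpriseObject eo : batch ) {
            NSArray<String> attributeKeys = eo.attributeKeys();
            StringBuilder line = new StringBuilder();
            for( int i = 0; i < attributeKeys.count(); i++ ) {
                if( i > 0 ) {
                    line.append( _fieldSeparator );
                }
                Object value = eo.valueForKey( attributeKeys.objectAtIndex( i ) );
                line.append( value == null ? "" : value.toString() );
            }
            line.append( _lineSeparator );
            _writer.write( line.toString() );
        }
    }

    // Call when the last batch is done.
    public void finish() throws IOException {
        _writer.close();
    }
}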
Anyway, I would love to hear how other folks are handling huge datasets. Feedback on the technique I'm using would be welcome, and ideas for improvement would be great. Just about the only idea I'm not open to is "just use JDBC" ;-). I've been there and I don't want to be there. That's why I'm using EOF :-).
Cheers, - Hugi
// Hugi Thordarson