Re: Batch inserts
Re: Batch inserts
- Subject: Re: Batch inserts
- From: Charles Koppelman <email@hidden>
- Date: Wed, 19 Dec 2007 15:15:36 -0500
- Thread-topic: Batch inserts
Title: Re: Batch inserts
Thanks! This is great. I’ll try it out!
On 12/19/07 3:01 PM, "Kieran Kelleher" <email@hidden> wrote:
And BTW, these massive insert operations are usually done in a background thread (either autonomous or with LongResponse meta-refreshing page) and IIRC (without looking at my code but I usually have one or a pool of OSC's just for EOF intensive background threads) I am probably using a different ObjectStoreCoordinator to the regular OSC(s) being used for user R-R handling. Using the default OSC for a few 100 thousand may affect application responsiveness.
On Dec 19, 2007, at 1:32 PM, Kieran Kelleher wrote:
Do them one at a time using adaptor operations in a loop ( I regularly run some processes that insert 500,000 to 1,000,000 rows using this method and it is pretty fast. I forget what the numbers were, but it is acceptable. The code pattern was copied from ERX's updateRows and ChannelOperation, so the credit goes to Anjo for the channel handling logic. (see below this snippet for primary key handling.....). Anyway, copy these two methods to a static class and try it. Test it with a millisecond timer before you discount it as too slow. For me it is plenty fast enough ...... I forgot how much faster than regular editing context inserting and saving .... maybe 200 times faster or something ... I forget.......... also at the bottom of the email I have pasted in a method from a piece of code that uses these two static utility methods just to give you an idea of usage .... this is code from one of my apps, but it will give you an idea of how to do what you want ........ I colored the snippet where the work happens in red (not sure if mailing list will carry that or not)
HTH, Kieran
/**
* Inserts a raw row. You are responsible to ensure it has a new valid primary key
* @see EOAdaptorChannel#insertRow(NSDictionary, EOEntity)
* @param ec
* @param rawRow
* @param entityName
* @return 1
*/
public static int insertRawRowForEntity(EOEditingContext ec, NSDictionary rawRow, String entityName){
EOEntity entity = entityNamed(entityName);
String modelName = entity.model().name();
boolean wasOpen = true;
EOAdaptorChannel channel = null;
int rows = 0;
ec.lock();
try {
EODatabaseContext dbc = EOUtilities.databaseContextForModelNamed(ec, modelName);
dbc.lock();
try {
channel = dbc.availableChannel().adaptorChannel();
wasOpen = channel.isOpen();
if(!wasOpen) {
channel.openChannel();
}
channel.adaptorContext().beginTransaction();
try {
channel.insertRow(rawRow, entity);
channel.adaptorContext().commitTransaction();
} catch(RuntimeException ex) {
channel.adaptorContext().rollbackTransaction();
throw ex;
}
} finally {
if(!wasOpen) {
channel.closeChannel();
}
dbc.unlock();
}
} finally {
ec.unlock();
}
// TODO : The JDBCChannel concrete subclass has a rowsProcessedCount, so perhaps use that
return 1;
}
Before starting the batch insert, I grab an array of primary keys based on how many I need and just pull them form the array as I loop thru inserting.
/**
* @param ec
* @param entityName
* @return
* @see com.webobjects.eoaccess.EOAdaptorChannel#primaryKeysForNewRowsWithEntity(int, EOEntity)
*/
public static NSArray primaryKeysForNewRowsWithEntity(EOEditingContext ec, int count, String entityName) {
EOEntity entity = entityNamed(entityName);
String modelName = entity.model().name();
NSArray primaryKeysArray = null;
boolean wasOpen = true;
EOAdaptorChannel channel = null;
int rows = 0;
ec.lock();
try {
EODatabaseContext dbc = EOUtilities.databaseContextForModelNamed(ec, modelName);
dbc.lock();
try {
channel = dbc.availableChannel().adaptorChannel();
wasOpen = channel.isOpen();
if(!wasOpen) {
channel.openChannel();
}
channel.adaptorContext().beginTransaction();
try {
primaryKeysArray = channel.primaryKeysForNewRowsWithEntity(count, entity);
channel.adaptorContext().commitTransaction();
} catch(RuntimeException ex) {
channel.adaptorContext().rollbackTransaction();
throw ex;
}
} finally {
if(!wasOpen) {
channel.closeChannel();
}
dbc.unlock();
}
} finally {
ec.unlock();
}
return primaryKeysArray;
}
****** EXAMPLE OF USAGE (lifted from an existing app) **************
/**
* OK, the EO direction was way too slow. Here is the faster more "bare metal" approach.
* First simply deleteRowsDescribedByQualifier.
* Then fetch all geoPoints.
* Then insertRows
* All done with creating EO's only because we need the speed performance
* due to having 10's of thousands of rows per each geospec
* TODO: Logic to update a radius by comparing to geoPoints and setting all isOverlapExclude to false
*/
public void resetGeoPointsInRadius(){
if (isNewObject()) {
throw new RuntimeException("We cannot perform raw row operations on related geospatial points for a CTGeographySpec that is not yet saved"
+ ". The owner object in question is " + ownerObject());
} //~ if (isNewObject())
StopWatch stopWatch = new StopWatch();
stopWatch.start();
// First delete all related geopoints
int rowsAffected = ERXEOAccessUtilities.deleteRowsDescribedByQualifier(editingContext(), CTGeoSpatialPoint.ENTITY_NAME, geoSpatialPointsQualifier());
if (log.isDebugEnabled())
log.debug("DELETE rowsAffected = " + rowsAffected);
stopWatch.split();
if (log.isDebugEnabled())
log.debug("split time after deleting old geospatial points = " + stopWatch);
stopWatch.unsplit();
// Sometimes we cannot get a geocoded centroid, so check this first
if (centroidGeoPoint() != null) {
// Recreate all related items in radius
NSArray geoPointRawRows = GeoPoint.Utilities.pointsInRadius(centroidGeoPoint(), radiusMiles(), geoPointTypeEntityName());
// For speed requirements, we insert geospatial points directly as raw rows
Object oidGeographySpec = rawPrimaryKey();
Number geoPointType = geoPointType();
Integer isOverLapExcludeDefault = new Integer(0);
// grab primary keys for the new rows
NSArray geoSpatialPointPrimaryKeys = WKEOUtils.primaryKeysForNewRowsWithEntity(editingContext(), geoPointRawRows.count(), CTGeoSpatialPoint.ENTITY_NAME);
for (int i = 0; i < geoPointRawRows.count(); i++) {
NSDictionary currentGeoPointRawRow = (NSDictionary) geoPointRawRows.objectAtIndex(i);
NSMutableDictionary geoSpatialRawRow = new NSMutableDictionary();
geoSpatialRawRow.takeValueForKey(oidGeographySpec, CTGeoSpatialPoint.KEY_OID_GEOGRAPHY_SPEC);
geoSpatialRawRow.takeValueForKey(currentGeoPointRawRow.valueForKey("oid"), CTGeoSpatialPoint.KEY_OID_GEO_POINT);
geoSpatialRawRow.takeValueForKey(geoPointType, CTGeoSpatialPoint.KEY_GEO_POINT_TYPE);
geoSpatialRawRow.takeValueForKey(currentGeoPointRawRow.valueForKey("distance"), CTGeoSpatialPoint.KEY_DISTANCE);
geoSpatialRawRow.takeValueForKey(isOverLapExcludeDefault, CTGeoSpatialPoint.KEY_IS_OVERLAP_EXCLUDE);
NSDictionary pkDict = (NSDictionary) geoSpatialPointPrimaryKeys.objectAtIndex(i);
// Merge the pk dict
geoSpatialRawRow.addEntriesFromDictionary(pkDict);
if (log.isDebugEnabled() && i < 10) {
log.debug("geoSpatialRawRow = " + geoSpatialRawRow);
}
WKEOUtils.insertRawRowForEntity(editingContext(), geoSpatialRawRow, CTGeoSpatialPoint.ENTITY_NAME);
}
stopWatch.stop();
if (log.isDebugEnabled())
log.debug("Time to reset program geospec for " + program() + " = " + stopWatch);
} else {
if (log.isDebugEnabled())
log.debug("Could not get centroidGeoPoint for " + this);
} //~ if (centroidGeoPoint() != null)
}
On Dec 19, 2007, at 12:45 PM, Charles Koppelman wrote:
So ERX provides us with a great way to do massive updates
(ERXEOAccessUtilities.updateRowsDescribedByQualifier). Is there a way that
anyone on this list has come up with to do massive INSERTs?
Looking at the code, it seems that inserts are done exclusively through
insertRow methods, meaning one insert statement per row. However, SQL
supports multiple inserts into the same table like:
INSERT INTO MY_TABLE
(ID, COLUMN_A, COLUMN_B)
VALUES (1, 'A', 'B'),
(2, 'C', 'D'),
(3, 'E', 'F');
Has anyone implemented this?
Charles Koppelman
Programmer
Youth For Understanding, USA
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden