I have used FrontBase's clustering, and pgcluster with PostgreSQL; the latter was a complete disaster (I was able to get it out of sync fairly readily).
Hmm, okay, why was that?
In case anyone is curious, here's the sequence I would follow to kill pgcluster (there's a rough shell sketch of the same steps at the end of this post). Note that pgcluster puts a load balancer in front of the cluster, and it can maintain a "majority" even with only two databases, because reachability between cluster members doesn't matter -- only reachability from the balancer to each member does.
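For reference, the balancer learns its members from pglb.conf; mine looked roughly like the following (host names and ports are placeholders, and the tag names are from memory of the stock pgcluster config, so double-check them against your install):

    <Cluster_Server_Info>
        <Host_Name>cluster1.example.com</Host_Name>
        <Port>5432</Port>
        <Max_Connect>32</Max_Connect>
    </Cluster_Server_Info>
    <Cluster_Server_Info>
        <Host_Name>cluster2.example.com</Host_Name>
        <Port>5432</Port>
        <Max_Connect>32</Max_Connect>
    </Cluster_Server_Info>
    <Host_Name>lb.example.com</Host_Name>
    <Receive_Port>5433</Receive_Port>

With that in place, the steps: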
1. Initial start of cluster1
2. Start the replicator
3. cluster2 recovery start
4. Start the load balancer
5. psql connect to the load balancer
6. Shut down cluster1
7. psql query fails:

       FATAL:  terminating connection due to administrator command
       server closed the connection unexpectedly
           This probably means the server terminated abnormally
           before or while processing the request.
       The connection to the server was lost. Attempting reset: Succeeded.

8. Re-execute the query: works
9. cluster1 recovery start
10. psql queries work
11. Shut down cluster2
12. psql query fails:

       FATAL:  terminating connection due to administrator command
       server closed the connection unexpectedly
           This probably means the server terminated abnormally
           before or while processing the request.
       The connection to the server was lost. Attempting reset: Succeeded.

13. Re-execute the query: works
14. cluster2 recovery start
15. psql queries work
16. Shut down cluster1
17. psql query fails catastrophically:

       WARNING:  terminating connection because of crash of another server process
       DETAIL:  The postmaster has commanded this server process to roll back
           the current transaction and exit, because another server process
           exited abnormally and possibly corrupted shared memory.
       HINT:  In a moment you should be able to reconnect to the database
           and repeat your command.
       server closed the connection unexpectedly
           This probably means the server terminated abnormally
           before or while processing the request.
       The connection to the server was lost. Attempting reset: Failed.
18. Reconnect fails with "psql: Sorry, backend connection is full"
19. cluster1 recovery start fixes the problem, but that doesn't make sense; it's as if cluster2 never rejoined the cluster in step 14.
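For anyone who wants to reproduce this, here's a rough shell sketch of the sequence. The data/config paths, the balancer port, and the -o "-R" recovery flag are from my memory of the pgcluster docs, so treat them as assumptions and check them against your install:

    #!/bin/sh
    # Sketch of the failover test above; run the psql steps interactively
    # in another terminal. Paths and ports are placeholders.
    PGDATA1=/usr/local/pgsql/data1   # cluster1 data dir (assumed layout)
    PGDATA2=/usr/local/pgsql/data2   # cluster2 data dir (assumed layout)
    PGETC=/usr/local/pgsql/etc       # pgreplicate/pglb config dir (assumed)

    pg_ctl -D "$PGDATA1" start               # 1. initial start of cluster1
    pgreplicate -D "$PGETC"                  # 2. start the replicator
    pg_ctl -D "$PGDATA2" -o "-R" start       # 3. cluster2 recovery start
                                             #    (-R is pgcluster's recovery
                                             #    mode, if I recall correctly)
    pglb -D "$PGETC"                         # 4. start the load balancer
    # 5. in another terminal: psql -h localhost -p 5433 template1
    pg_ctl -D "$PGDATA1" stop -m fast        # 6. shut down cluster1
    # 7-8. watch the query fail, then succeed on retry
    pg_ctl -D "$PGDATA1" -o "-R" start       # 9. cluster1 recovery start
    # 11-17. repeat the shutdown/recovery dance with cluster2, then cluster1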