In a multi-server Striim cluster with the metadata repository hosted on Oracle or PostgreSQL, a network partition can split the cluster into two subsets that cannot communicate with each other. Both subsets then go into failover mode (commonly called split brain), which results in an unpredictable variety of errors and eventually a crash.
To prevent this from happening, on each server:
Set ClusterQuorumSize to just over half the number of servers in the cluster. For example, for a three-server cluster, set ClusterQuorumSize=2; for a four-server cluster, set ClusterQuorumSize=3.
By default, when the number of servers in the cluster drops below the quorum, each server will wait 60 seconds for communication to resume before shutting down. To change that timeout, set ClusterHeartBeatTimeout to the desired number of seconds.
Then restart all servers (see Starting and stopping Striim).
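
The "just over half" rule for ClusterQuorumSize can be sketched as a small calculation. This is an illustration only, not part of Striim: the function name quorum_size is hypothetical, and it assumes "just over half" means the smallest whole number strictly greater than half the server count.

```python
def quorum_size(server_count: int) -> int:
    """Return the smallest whole number strictly greater than half of server_count.

    Hypothetical helper for illustration; it only does the arithmetic
    described in the text and does not talk to a Striim cluster.
    """
    if server_count < 1:
        raise ValueError("server_count must be at least 1")
    # Integer division rounds down, so adding 1 always lands just over half.
    return server_count // 2 + 1


if __name__ == "__main__":
    for servers in (3, 4, 5):
        # Print the setting in the same name=value form used in the text.
        print(f"{servers} servers: ClusterQuorumSize={quorum_size(servers)}")
```

Because twice the quorum is always larger than the total number of servers, two disjoint subsets of the cluster can never both meet the quorum. After a partition, at most one subset stays up.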