diff options
author | Nasser Grainawi <nasser@codeaurora.org> | 2021-05-10 14:51:20 +0000 |
---|---|---|
committer | Gerrit Code Review <noreply-gerritcodereview@google.com> | 2021-05-10 14:51:20 +0000 |
commit | 5774ae2eb58299465916ee1f50ca05545de8b2a0 (patch) | |
tree | 9d342ced5b95cba449bcf5b85aea412b86d16dfe | |
parent | 0022a34428cf8bfe4feb0935cdd20b0257bfc8a3 (diff) | |
parent | 4205d65966b2627ea6865fc4695f23aea5111970 (diff) |
Merge "Add a cluster replication configuration section"
-rw-r--r-- | src/main/resources/Documentation/about.md | 8 | ||||
-rw-r--r-- | src/main/resources/Documentation/config.md | 53 |
2 files changed, 56 insertions, 5 deletions
diff --git a/src/main/resources/Documentation/about.md b/src/main/resources/Documentation/about.md index ded9d84..d216366 100644 --- a/src/main/resources/Documentation/about.md +++ b/src/main/resources/Documentation/about.md @@ -16,11 +16,9 @@ configuration is not recommended. It is also possible to specify a local path as replication target. This makes e.g. sense if a network share is mounted to which the repositories should be replicated. -In multi-primary scenario, any replication work which is already -in-flight or completed by the other nodes is not performed to -avoid extra work. This is because, the storage for replication -events is shared between multiple primaries.(The storage location -is specified in the config using: `replication.eventsDirectory`). +It is possible to +[configure](config.html#configuring-cluster-replication) the plugin so +that multiple primaries share the replication work approximately evenly. Replication of account data (NoteDb) ------------------------------------ diff --git a/src/main/resources/Documentation/config.md b/src/main/resources/Documentation/config.md index f4ea9d6..8843671 100644 --- a/src/main/resources/Documentation/config.md +++ b/src/main/resources/Documentation/config.md @@ -46,6 +46,59 @@ Then reload the replication plugin to pick up the new configuration: To manually trigger replication at runtime, see SSH command [start](cmd-start.md). +<a name="configuring-cluster-replication"></a> +Configuring Cluster Replication +------------------------------- + +The replication plugin is designed to allow multiple primaries in a +cluster to efficiently cooperate together. This cooperation is based on +the replication event persistence subsystem and thus the directory +pointed to by the replication.eventsDirectory config key must reside on +a shared filesystem, such as NFS, to enable this cooperation. By +default simply pointing multiple primaries to the same eventsDirectory +will enable some cooperation by preventing the same replication push +from being duplicated by more than one primary. + +To further improve cooperation across the cluster, the +replication.distributionInterval config value can be set. With +distribution enabled, the replication queues for all the nodes sharing +the same eventsDirectory will reflect approximately the same outstanding +replication work (i.e. tasks waiting in the queue). Replication pushes +which are running will continue to only be visible in the queue of the +node on which the push is actually happening. This feature not only +helps administrators get a cluster wide view of outstanding replication +tasks, it allows replication tasks triggered by one primary to be +fullfilled by another node which is less busy. + +This enhanced replication work distribution allows the amount of +replication work a cluster can handle to scale more evenly and linearly +with the amount of primaries in the cluster. Adding more nodes to a +cluster without distribution enabled will generally not allow the thread +count per remote to be reduced without impacting service levels to those +remotes. This is because without distribution, all events triggered by a +node will only be fullfilled by the node which triggered the event, even +if all the other nodes in the cluster are idle. This behavior implies +that each node should be configured in a way that allows it alone to +provide the level of service which each remote requires. However with +distribution enabled, it becomes possible to reduce the amount of +replication threads configured per remote proportionally to the amount +of nodes in the cluster while maintaining the same approximate service +level as before adding new nodes. + +Thread per remote reduction without service impacts is possible with +distribution because when configuring a node it can be expected that +other nodes will pick up some of the work it triggers and it no longer +needs to be configured as if it were the only node in the cluster. For +example, if a remote requires 6 threads with one node to achieve +acceptable service, it should only take 2 threads on 3 equivalently +powered nodes to provide the same service level with distribution +enabled. Scaling down of the thread requirements per remote results in a +reduced memory footprint per remote on each node in the cluster and this +enables the nodes in the cluster to now scale to handle more remotes +with the approximate same service level than without distribution. The +amount of extra supported remotes then also scales approximately +linearly with the extra nodes in a cluster. + File `replication.config` ------------------------- |