New replication modes and their documentation

2022-03-28 16:20:15 +02:00 · 0091002ef2
commit 0091002ef2
parent 8f9cf3a5d1
2 changed files with 81 additions and 19 deletions
--- a/doc/book/reference-manual/configuration.md
+++ b/doc/book/reference-manual/configuration.md
@@ -48,7 +48,6 @@ root_domain = ".web.garage"
 [admin]
 api_bind_addr = "0.0.0.0:3903"
 trace_sink = "http://localhost:4317"
 ```
 The following gives details about each available configuration option.
@@ -89,20 +88,47 @@ might use more storage space than is optimally possible.
 Garage supports the following replication modes:
-- `none` or `1`: data stored on Garage is stored on a single node. There is no redundancy,
-  and data will be unavailable as soon as one node fails or its network is disconnected.
-  Do not use this for anything else than test deployments.
-- `2`: data stored on Garage will be stored on two different nodes, if possible in different
-  zones. Garage tolerates one node failure before losing data. Data should be available
-  read-only when one node is down, but write operations will fail.
-  Use this only if you really have to.
-- `3`: data stored on Garage will be stored on three different nodes, if possible each in
-  a different zones.
-  Garage tolerates two node failure before losing data. Data should be available
-  read-only when two nodes are down, and writes should be possible if only a single node
-  is down.
+- `none` or `1`: data stored on Garage is stored on a single node. There is no
+  redundancy, and data will be unavailable as soon as one node fails or its
+  network is disconnected.  Do not use this for anything other than test
+  deployments.
+- `2`: data stored on Garage will be stored on two different nodes, if possible
+  in different zones. Garage tolerates one node failure, or several nodes
+  failing but all in a single zone (in a deployment with at least two zones),
+  before losing data. Data remains available in read-only mode when one node is
+  down, but write operations will fail.
+  - `2-dangerous`: a variant of mode `2`, where written objects are written to
+    the second replica asynchronously. This means that Garage will return `200
+    OK` to a PutObject request before the second copy is fully written (or even
+    before it starts being written).  This means that data can more easily
+    be lost if the node crashes before a second copy can be completed.  This
+    also means that written objects might not be visible immediately in read
+    operations.  In other words, this mode severely breaks the consistency and
+    durability guarantees of standard Garage cluster operation.  Benefits of
+    this mode: you can still write to your cluster when one node is
+    unavailable.
+- `3`: data stored on Garage will be stored on three different nodes, if
+  possible each in a different zone.  Garage tolerates two node failures, or
+  several node failures but in no more than two zones (in a deployment with at
+  least three zones), before losing data. As long as only a single node fails,
+  or node failures are only in a single zone, reading and writing data to
+  Garage can continue normally.
+  - `3-degraded`: a variant of replication mode `3` that lowers the read
+    quorum to `1`, to allow you to read data from your cluster when several
+    nodes (or nodes in several zones) are unavailable.  In this mode, Garage
+    does not provide read-after-write consistency anymore.  The write quorum is
+    still 2, ensuring that data successfully written to Garage is stored on at
+    least two nodes.
+  - `3-dangerous`: a variant of replication mode `3` that lowers both the read
+    and write quorums to `1`, to allow you to both read and write to your
+    cluster when several nodes (or nodes in several zones) are unavailable.  It
+    is the least consistent mode of operation offered by Garage, and also one
+    that should probably never be used.
 Note that in modes `2` and `3`,
 if at least the same number of zones are available, an arbitrary number of failures in 
@@ -111,8 +137,35 @@ any given zone is tolerated as copies of data will be spread over several zones.
 **Make sure `replication_mode` is the same in the configuration files of all nodes.
 Never run a Garage cluster where that is not the case.**
-Changing the `replication_mode` of a cluster might work (make sure to shut down all nodes
-and changing it everywhere at the time), but is not officially supported.
+The quorums associated with each replication mode are described below:
+
+| `replication_mode` | Number of replicas | Write quorum | Read quorum | Read-after-write consistency? |
+| ------------------ | ------------------ | ------------ | ----------- | ----------------------------- |
+| `none` or `1`      | 1                  | 1            | 1           | yes                           |
+| `2`                | 2                  | 2            | 1           | yes                           |
+| `2-dangerous`      | 2                  | 1            | 1           | NO                            |
+| `3`                | 3                  | 2            | 2           | yes                           |
+| `3-degraded`       | 3                  | 2            | 1           | NO                            |
+| `3-dangerous`      | 3                  | 1            | 1           | NO                            |
+Changing the `replication_mode` between modes with the same number of replicas
+(e.g. from `3` to `3-degraded`, or from `2-dangerous` to `2`) can be done easily by
+just changing the `replication_mode` parameter in your config files and restarting all your
+Garage nodes.
+It is also technically possible to change the replication mode to a mode with a
+different number of replicas, although it's a dangerous operation that is not
+officially supported.  This requires you to delete the existing cluster layout
+and create a new layout from scratch, meaning that a full rebalancing of your
+cluster's data will be needed.  To do it, shut down your cluster entirely,
+delete the `cluster_layout` files in the meta directories of all your nodes,
+update all your configuration files with the new `replication_mode` parameter,
+restart your cluster, and then create a new layout with all the nodes you want
+to keep.  Rebalancing data will take some time, and data might temporarily
+appear unavailable to your users.  It is recommended to shut down public access
+to the cluster while rebalancing is in progress.  In theory, no data should be
+lost as rebalancing is a routine operation for Garage, although we cannot
+guarantee you that everything will go right in such an extreme scenario.
 ### `compression_level`
--- a/src/table/replication/mode.rs
+++ b/src/table/replication/mode.rs
@@ -1,7 +1,10 @@
 pub enum ReplicationMode {
 	None,
 	TwoWay,
+	TwoWayDangerous,
 	ThreeWay,
+	ThreeWayDegraded,
+	ThreeWayDangerous,
 }
 impl ReplicationMode {
@@ -9,7 +12,10 @@ impl ReplicationMode {
 		match v {
 			"none" | "1" => Some(Self::None),
 			"2" => Some(Self::TwoWay),
+			"2-dangerous" => Some(Self::TwoWayDangerous),
 			"3" => Some(Self::ThreeWay),
+			"3-degraded" => Some(Self::ThreeWayDegraded),
+			"3-dangerous" => Some(Self::ThreeWayDangerous),
 			_ => None,
 		}
 	}
@@ -24,16 +30,17 @@ impl ReplicationMode {
 	pub fn replication_factor(&self) -> usize {
 		match self {
 			Self::None => 1,
-			Self::TwoWay => 2,
-			Self::ThreeWay => 3,
+			Self::TwoWay | Self::TwoWayDangerous => 2,
+			Self::ThreeWay | Self::ThreeWayDegraded | Self::ThreeWayDangerous => 3,
 		}
 	}
 	pub fn read_quorum(&self) -> usize {
 		match self {
 			Self::None => 1,
-			Self::TwoWay => 1,
+			Self::TwoWay | Self::TwoWayDangerous => 1,
 			Self::ThreeWay => 2,
+			Self::ThreeWayDegraded | Self::ThreeWayDangerous => 1,
 		}
 	}
@@ -41,7 +48,9 @@ impl ReplicationMode {
 		match self {
 			Self::None => 1,
 			Self::TwoWay => 2,
-			Self::ThreeWay => 2,
+			Self::TwoWayDangerous => 1,
+			Self::ThreeWay | Self::ThreeWayDegraded => 2,
+			Self::ThreeWayDangerous => 1,
 		}
 	}
 }
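
The quorum table in the documentation change and the `match` arms in `mode.rs` can be cross-checked with a small standalone sketch. The following is not part of the commit: it copies the enum and its accessors into one self-contained block, and adds a hypothetical helper, `is_read_after_write_consistent`, based on the usual quorum-overlap rule (a read is guaranteed to see the latest write when read quorum + write quorum > number of replicas). The method names `parse` and `write_quorum` are assumptions, since the hunks above do not show the signatures of those functions.

```rust
/// Replication modes, mirroring the variants added in the commit.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ReplicationMode {
    None,
    TwoWay,
    TwoWayDangerous,
    ThreeWay,
    ThreeWayDegraded,
    ThreeWayDangerous,
}

impl ReplicationMode {
    /// Parse the `replication_mode` config value (function name assumed).
    pub fn parse(v: &str) -> Option<Self> {
        match v {
            "none" | "1" => Some(Self::None),
            "2" => Some(Self::TwoWay),
            "2-dangerous" => Some(Self::TwoWayDangerous),
            "3" => Some(Self::ThreeWay),
            "3-degraded" => Some(Self::ThreeWayDegraded),
            "3-dangerous" => Some(Self::ThreeWayDangerous),
            _ => None,
        }
    }

    /// Number of replicas kept for each piece of data.
    pub fn replication_factor(&self) -> usize {
        match self {
            Self::None => 1,
            Self::TwoWay | Self::TwoWayDangerous => 2,
            Self::ThreeWay | Self::ThreeWayDegraded | Self::ThreeWayDangerous => 3,
        }
    }

    /// Number of replicas that must answer a read.
    pub fn read_quorum(&self) -> usize {
        match self {
            Self::None => 1,
            Self::TwoWay | Self::TwoWayDangerous => 1,
            Self::ThreeWay => 2,
            Self::ThreeWayDegraded | Self::ThreeWayDangerous => 1,
        }
    }

    /// Number of replicas that must acknowledge a write (function name assumed).
    pub fn write_quorum(&self) -> usize {
        match self {
            Self::None => 1,
            Self::TwoWay => 2,
            Self::TwoWayDangerous => 1,
            Self::ThreeWay | Self::ThreeWayDegraded => 2,
            Self::ThreeWayDangerous => 1,
        }
    }

    /// Hypothetical helper, not in the commit: any read set overlaps any
    /// write set exactly when read quorum + write quorum > replica count.
    pub fn is_read_after_write_consistent(&self) -> bool {
        self.read_quorum() + self.write_quorum() > self.replication_factor()
    }
}
```

Under this rule, the helper returns `true` for `none`, `2` and `3` and `false` for the `-degraded` and `-dangerous` variants, which agrees with the "Read-after-write consistency?" column of the table.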