Add documentation on durability and repair procedures (fix #219)
This commit is contained in:
parent
3aadba724d
commit
9233661967
114
doc/book/cookbook/durability-repairs.md
Normal file
114
doc/book/cookbook/durability-repairs.md
Normal file
@ -0,0 +1,114 @@
|
||||
+++
|
||||
title = "Durability & Repairs"
|
||||
weight = 50
|
||||
+++
|
||||
|
||||
To ensure the best durability of your data and to fix any inconsistencies that may
|
||||
pop up in a distributed system, Garage provides a serires of repair operations.
|
||||
This guide will explain the meaning of each of them and when they should be applied.
|
||||
|
||||
|
||||
# General syntax of repair operations
|
||||
|
||||
Repair operations described below are of the form `garage repair <repair_name>`.
|
||||
These repairs will not launch without the `--yes` flag, which should
|
||||
be added as follows: `garage repair --yes <repair_name>`.
|
||||
By default these repair procedures will only run on the Garage node your CLI is
|
||||
connecting to. To run on all nodes, add the `-a` flag as follows:
|
||||
`garage repair -a --yes <repair_name>`.
|
||||
|
||||
# Data block operations
|
||||
|
||||
## Data store scrub
|
||||
|
||||
Scrubbing the data store means examining each individual data block to check that
|
||||
their content is correct, by verifying their hash. Any block found to be corrupted
|
||||
(e.g. by bitrot or by an accidental manipulation of the datastore) will be
|
||||
restored from another node that holds a valid copy.
|
||||
|
||||
A scrub is run automatically by Garage every 30 days. It can also be launched
|
||||
manually using `garage repair scrub start`.
|
||||
|
||||
To view the status of an ongoing scrub, first find the task ID of the scrub worker
|
||||
using `garage worker list`. Then, run `garage worker info <scrub_task_id>` to
|
||||
view detailed runtime statistics of the scrub. To gather cluster-wide information,
|
||||
this command has to be run on each individual node.
|
||||
|
||||
A scrub is a very disk-intensive operation that might slow down your cluster.
|
||||
You may pause an ongoing scrub using `garage repair scrub pause`, but note that
|
||||
the scrub will resume automatically 24 hours later as Garage will not let your
|
||||
cluster run without a regular scrub. If the scrub procedure is too intensive
|
||||
for your servers and is slowing down your workload, the recommended solution
|
||||
is to increase the "scrub tranquility" using `garage repair scrub set-tranquility`.
|
||||
A higher tranquility value will make Garage take longer pauses between two block
|
||||
verifications. Of course, scrubbing the entire data store will also take longer.
|
||||
|
||||
## Block check and resync
|
||||
|
||||
In some cases, nodes hold a reference to a block but do not actually have the block
|
||||
stored on disk. Conversely, they may also have on disk blocks that are not referenced
|
||||
any more. To fix both cases, a block repair may be run with `garage repair blocks`.
|
||||
This will scan the entire block reference counter table to check that the blocks
|
||||
exist on disk, and will scan the entire disk store to check that stored blocks
|
||||
are referenced.
|
||||
|
||||
It is recommended to run this procedure when changing your cluster layout,
|
||||
after the metadata tables have finished synchronizing between nodes
|
||||
(usually a few hours after `garage layout apply`).
|
||||
|
||||
## Inspecting lost blocks
|
||||
|
||||
In extremely rare situations, data blocks may be unavailable from the entire cluster.
|
||||
This means that even using `garage repair blocks`, some nodes may be unable
|
||||
to fetch data blocks for which they hold a reference.
|
||||
|
||||
These errors are stored on each node in a list of "block resync errors", i.e.
|
||||
blocks for which the last resync operation failed.
|
||||
This list can be inspected using `garage block list-errors`.
|
||||
These errors usually fall into one of the following categories:
|
||||
|
||||
1. a block is still referenced but the object was deleted, this is a case
|
||||
of metadata reference inconsistency (see below for the fix)
|
||||
2. a block is referenced by a non-deleted object, but could not be fetched due
|
||||
to a transient error such as a network failure
|
||||
3. a block is referenced by a non-deleted object, but could not be fetched due
|
||||
to a permanent error such as there not being any valid copy of the block on the
|
||||
entire cluster
|
||||
|
||||
To help make the difference between cases 1 and cases 2 and 3, you may use the
|
||||
`garage block info` command to see which objects hold a reference to each block.
|
||||
|
||||
In the second case (transient errors), Garage will try to fetch the block again
|
||||
after a certain time, so the error should disappear natuarlly. You can also
|
||||
request Garage to try to fetch the block immediately using `garage block retry-now`
|
||||
if you have fixed the transient issue.
|
||||
|
||||
If you are confident that you are in the third scenario and that your data block
|
||||
is definitely lost, then there is no other choice than to declare your S3 objects
|
||||
as unrecoverable, and to delete them properly from the data store. This can be done
|
||||
using the `garage block purge` command.
|
||||
|
||||
|
||||
# Metadata operations
|
||||
|
||||
## Metadata table resync
|
||||
|
||||
Garage automatically resyncs all entries stored in the metadata tables every hour,
|
||||
to ensure that all nodes have the most up-to-date version of all the information
|
||||
they should be holding.
|
||||
The resync procedure is based on a Merkle tree that allows to efficiently find
|
||||
differences between nodes.
|
||||
|
||||
In some special cases, e.g. before an upgrade, you might want to run a table
|
||||
resync manually. This can be done using `garage repair tables`.
|
||||
|
||||
## Metadata table reference fixes
|
||||
|
||||
In some very rare cases where nodes are unavailable, some references between objects
|
||||
are broken. For instance, if an object is deleted, the underlying versions or data
|
||||
blocks may still be held by Garage. If you suspect that such corruption has occurred
|
||||
in your cluster, you can run one of the following repair procedures:
|
||||
|
||||
- `garage repair versions`: checks that all versions belong to a non-deleted object, and purges any orphan version
|
||||
- `garage repair block_refs`: checks that all block references belong to a non-deleted object version, and purges any orphan block reference (this will then allow the blocks to be garbage-collected)
|
||||
|
@ -1,6 +1,6 @@
|
||||
+++
|
||||
title = "Recovering from failures"
|
||||
weight = 50
|
||||
weight = 60
|
||||
+++
|
||||
|
||||
Garage is meant to work on old, second-hand hardware.
|
||||
|
@ -1,6 +1,6 @@
|
||||
+++
|
||||
title = "Upgrading Garage"
|
||||
weight = 60
|
||||
weight = 70
|
||||
+++
|
||||
|
||||
Garage is a stateful clustered application, where all nodes are communicating together and share data structures.
|
||||
|
Loading…
Reference in New Issue
Block a user