Merge pull request 'Documentation updates' (#587) from doc-updates into main

Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/587
2023-06-14 10:57:32 +00:00 · 5e291c64b3
commit 5e291c64b3
parent 01346143ca 9092c71a01
20 changed files with 287 additions and 33 deletions
--- a/doc/book/build/_index.md
+++ b/doc/book/build/_index.md
@@ -1,6 +1,6 @@
 +++
 title = "Build your own app"
-weight = 4
+weight = 40
 sort_by = "weight"
 template = "documentation.html"
 +++
--- a/doc/book/connect/_index.md
+++ b/doc/book/connect/_index.md
@@ -1,6 +1,6 @@
 +++
 title = "Existing integrations"
-weight = 3
+weight = 30
 sort_by = "weight"
 template = "documentation.html"
 +++
--- a/doc/book/cookbook/_index.md
+++ b/doc/book/cookbook/_index.md
@@ -1,7 +1,7 @@
 +++
 title="Cookbook"
 template = "documentation.html"
-weight = 2
+weight = 20
 sort_by = "weight"
 +++
@@ -37,7 +37,3 @@ This chapter could also be referred as "Tutorials" or "Best practices".
 - **[Monitoring Garage](@/documentation/cookbook/monitoring.md)** This page
  explains the Prometheus metrics available for monitoring the Garage
  cluster/nodes.
 - **[Recovering from failures](@/documentation/cookbook/recovering.md):** Garage's first selling point is resilience
  to hardware failures. This section explains how to recover from such a failure in the
  best possible way.
--- a/doc/book/cookbook/encryption.md
+++ b/doc/book/cookbook/encryption.md
@@ -0,0 +1,108 @@
 +++
 title = "Encryption"
 weight = 50
 +++
 Encryption is a recurring subject when discussing Garage. 
 Garage does not handle data encryption by itself, but many things can
 already be done with Garage's current feature set and the existing ecosystem.
 This page takes a high-level approach to security in general and data encryption
 in particular.
 # Examining your need for encryption
 - Why do you want encryption in Garage?
 - What is your threat model? What are you fearing?
  - A stolen HDD?
  - A curious administrator?
  - A malicious administrator?
  - A remote attacker?
  - etc.
 - What services do you want to protect with encryption?
  - An existing application? Which one? (e.g. Nextcloud)
  - An application that you are writing?
 - Any expertise you may have on the subject
 This page explains what Garage provides, and how you can improve the situation by yourself
 by adding encryption at different levels.
 We would be very curious to know your needs and thoughts on topics such as
 encryption practices and key management, as we want Garage to be a
 serious base platform for the development of secure, encrypted applications.
 Do not hesitate to come talk to us if you have any thoughts or questions on the
 subject.
 # Capabilities provided by Garage
 ## Traffic is encrypted between Garage nodes
 RPCs between Garage nodes are encrypted. More specifically, unlike in many
 distributed systems, it is impossible in Garage to have clear-text RPC.  We
 use the [kuska handshake](https://github.com/Kuska-ssb/handshake) library, which
 implements a protocol that has been clearly reviewed, Secure ScuttleButt's
 Secret Handshake protocol.  This is why setting an `rpc_secret` is mandatory,
 and also why your nodes have such long identifiers.
 ## HTTP API endpoints provided by Garage are in clear text
 Adding TLS support built into Garage is not currently planned.
 ## Garage stores data in plain text on the filesystem
 Garage does not handle data encryption at rest by itself, and instead delegates
 to the user to add encryption, either at the storage layer (LUKS, etc) or on
 the client side (or both). There are no current plans to add data encryption
 directly in Garage.
 Implementing data encryption directly in Garage might make things simpler for
 end users, but also raises many more questions, especially around key
 management: for encryption of data, where could Garage get the encryption keys
 from? If we encrypt data but keep the keys in a plaintext file next to them,
 it's useless. We probably don't want to have to manage secrets in Garage as it
 would be very hard to do in a secure way. Maybe integrate with an external
 system such as Hashicorp Vault?
 # Adding data encryption using external tools
 ## Encrypting traffic between a Garage node and your client
 You have multiple options to have encryption between your client and a node:
  - Set up a reverse proxy with TLS / ACME / Let's Encrypt
  - Set up a Garage gateway locally, and only contact the Garage daemon on `localhost`
  - Only contact your Garage daemon over a secure, encrypted overlay network such as WireGuard
 ## Encrypting data at rest
 Protects against the following threats:
 - Stolen HDD
 Crucially, does not protect against malicious sysadmins or remote attackers that
 might gain access to your servers.
 Methods include full-disk encryption with tools such as LUKS.
 ## Encrypting data on the client side
 Protects against the following threats:
 - An honest-but-curious administrator
 - A malicious administrator that tries to corrupt your data
 - A remote attacker that can read your server's data
 Implementations are very specific to the various applications. Examples:
 - Matrix: uses the OLM protocol for E2EE of user messages. Media files stored
  in Matrix are probably encrypted using symmetric encryption, with a key that is
  distributed in the end-to-end encrypted message that contains the link to the object.
 - Aerogramme: uses the user's password as a key to decrypt data in the user's bucket
--- a/doc/book/cookbook/monitoring.md
+++ b/doc/book/cookbook/monitoring.md
@@ -49,9 +49,5 @@ add the following lines in your Prometheus scrape config:
 To visualize the scraped data in Grafana,
 you can either import our [Grafana dashboard for Garage](https://git.deuxfleurs.fr/Deuxfleurs/garage/raw/branch/main/script/telemetry/grafana-garage-dashboard-prometheus.json)
 or make your own.
 We detail below the list of exposed metrics and their meaning.
-
+The list of exported metrics is available on our [dedicated page](@/documentation/reference-manual/monitoring.md) in the Reference manual section.
 ## List of exported metrics
 See our [dedicated page](@/documentation/reference-manual/monitoring.md) in the Reference manual section.
--- a/doc/book/cookbook/real-world.md
+++ b/doc/book/cookbook/real-world.md
@@ -197,6 +197,12 @@ The `garage` binary has two purposes:
 Ensure an appropriate `garage` binary (the same version as your Docker image) is available in your path.
 If your configuration file is at `/etc/garage.toml`, the `garage` binary should work with no further change.
 You can also use an alias as follows to use the Garage binary inside your docker container:
 ```bash
 alias garage="docker exec -ti <container name> /garage"
 ```
 You can test your `garage` CLI utility by running a simple command such as:
 ```bash
@@ -339,7 +345,7 @@ garage layout apply
 ```
 **WARNING:** if you want to use the layout modification commands in a script,
-make sure to read [this page](@/documentation/reference-manual/layout.md) first.
+make sure to read [this page](@/documentation/operations/layout.md) first.
 ## Using your Garage cluster
--- a/doc/book/cookbook/systemd.md
+++ b/doc/book/cookbook/systemd.md
@@ -33,7 +33,20 @@ NoNewPrivileges=true
 WantedBy=multi-user.target
 ```
-*A note on hardening: garage will be run as a non privileged user, its user id is dynamically allocated by systemd. It cannot access (read or write) home folders (/home, /root and /run/user), the rest of the filesystem can only be read but not written, only the path seen as /var/lib/garage is writable as seen by the service (mapped to /var/lib/private/garage on your host). Additionnaly, the process can not gain new privileges over time.*
+**A note on hardening:** Garage will be run as a non-privileged user whose user
 id is dynamically allocated by systemd (set with `DynamicUser=true`). It cannot
 access (read or write) home folders (`/home`, `/root` and `/run/user`). The
 rest of the filesystem can only be read but not written; only the path seen as
 `/var/lib/garage` is writable as seen by the service. Additionally, the process
 cannot gain new privileges over time.
 For this to work correctly, your `garage.toml` must be set with
 `metadata_dir=/var/lib/garage/meta` and `data_dir=/var/lib/garage/data`. This
 is mandatory to use the DynamicUser hardening feature of systemd, which
 automatically creates these directories as a virtual mapping. If the directory
 `/var/lib/garage` already exists before starting the server for the first time,
 the systemd service might not start correctly.  Note that in your host
 filesystem, Garage data will be held in `/var/lib/private/garage`.
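
The directory requirement above can be checked before starting the service. The following is a minimal sketch, not part of Garage or systemd: `check_garage_dirs` is a hypothetical helper name, and it assumes each setting sits on a single line of the form `key = "value"` (quotes optional) in the file you pass to it.

```bash
# check_garage_dirs <path-to-garage.toml>   (hypothetical helper)
# Succeeds only if metadata_dir and data_dir both point under /var/lib/garage/.
check_garage_dirs() {
    conf="$1"
    for key in metadata_dir data_dir; do
        # Extract the value of a line such as: key = "value"
        # (leading spaces and surrounding quotes are tolerated).
        value=$(sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*[\"']\{0,1\}\([^\"'#]*\).*/\1/p" "$conf" | head -n 1)
        case "$value" in
            /var/lib/garage/*) ;;
            *) echo "$key is '$value', expected a path under /var/lib/garage/" >&2
               return 1 ;;
        esac
    done
}
```

For example, `check_garage_dirs /etc/garage.toml && echo "paths OK"` reports success only when both settings match the layout that `DynamicUser` expects.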
 To start the service then automatically enable it at boot:
--- a/doc/book/design/_index.md
+++ b/doc/book/design/_index.md
@@ -1,6 +1,6 @@
 +++
 title = "Design"
-weight = 6
+weight = 70
 sort_by = "weight"
 template = "documentation.html"
 +++
--- a/doc/book/design/goals.md
+++ b/doc/book/design/goals.md
@@ -42,15 +42,13 @@ locations. They use Garage themselves for the following tasks:
 - As a [Matrix media backend](https://github.com/matrix-org/synapse-s3-storage-provider)
- To store personal data and shared documents through [Bagage](https://git.deuxfleurs.fr/Deuxfleurs/bagage), a homegrown WebDav-to-S3 proxy
+- As a Nix binary cache
 - To store personal data and shared documents through [Bagage](https://git.deuxfleurs.fr/Deuxfleurs/bagage), a homegrown WebDav-to-S3 and SFTP-to-S3 proxy
 - As a backup target using `rclone` and `restic`
 - In the Drone continuous integration platform to store task logs
 - As a Nix binary cache
 - As a backup target using `rclone`
 The Deuxfleurs Garage cluster is a multi-site cluster currently composed of
-4 nodes in 2 physical locations. In the future it will be expanded to at
+9 nodes in 3 physical locations.
 least 3 physical locations to fully exploit Garage's potential for high
 availability.
--- a/doc/book/design/internals.md
+++ b/doc/book/design/internals.md
@@ -61,7 +61,7 @@ Garage prioritizes which nodes to query according to a few criteria:
 For further reading on the cluster structure look at the [gateway](@/documentation/cookbook/gateways.md) 
-and [cluster layout management](@/documentation/reference-manual/layout.md) pages.
+and [cluster layout management](@/documentation/operations/layout.md) pages.
 ## Garbage collection
--- a/doc/book/development/_index.md
+++ b/doc/book/development/_index.md
@@ -1,6 +1,6 @@
 +++
 title = "Development"
-weight = 7
+weight = 80
 sort_by = "weight"
 template = "documentation.html"
 +++
--- a/doc/book/operations/_index.md
+++ b/doc/book/operations/_index.md
@@ -0,0 +1,23 @@
 +++
 title = "Operations & Maintenance"
 weight = 50
 sort_by = "weight"
 template = "documentation.html"
 +++
 This section contains important information on how to best operate a Garage cluster,
 to ensure integrity and availability of your data:
 - **[Upgrading Garage](@/documentation/operations/upgrading.md):** General instructions on how to
  upgrade your cluster from one version to the next. Instructions specific to each version upgrade
  can be found in the [working documents](@/documentation/working-documents/_index.md) section.
 - **[Layout management](@/documentation/operations/layout.md):** Best practices for using the `garage layout`
  commands when adding or removing nodes from your cluster.
 - **[Durability and repairs](@/documentation/operations/durability-repairs.md):** How to check for small things
  that might be going wrong, and how to recover from such failures.
 - **[Recovering from failures](@/documentation/operations/recovering.md):** Garage's first selling point is resilience
  to hardware failures. This section explains how to recover from such a failure in the
  best possible way.
--- a/doc/book/operations/durability-repairs.md
+++ b/doc/book/operations/durability-repairs.md
@@ -0,0 +1,114 @@
 +++
 title = "Durability & Repairs"
 weight = 30
 +++
 To ensure the best durability of your data and to fix any inconsistencies that may
 pop up in a distributed system, Garage provides a series of repair operations.
 This guide will explain the meaning of each of them and when they should be applied.
 # General syntax of repair operations
 Repair operations described below are of the form `garage repair <repair_name>`.
 These repairs will not launch without the `--yes` flag, which should
 be added as follows: `garage repair --yes <repair_name>`.
 By default these repair procedures will only run on the Garage node your CLI is
 connecting to. To run on all nodes, add the `-a` flag as follows:
 `garage repair -a --yes <repair_name>`.
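
As an illustration of the syntax above, here is a small dry-run sketch. `repair_cmd` is a hypothetical helper, not part of the Garage CLI: it only prints the command line built from the flags described in this section and does not run anything, and it does not validate the repair name.

```bash
# repair_cmd <repair_name> [all]   (hypothetical helper, prints only)
#   <repair_name> is passed through unchecked (e.g. tables, versions, block_refs)
#   a second argument of "all" adds -a so the repair targets every node
repair_cmd() {
    name="$1"
    scope="${2:-local}"
    if [ -z "$name" ]; then
        echo "usage: repair_cmd <repair_name> [all]" >&2
        return 1
    fi
    if [ "$scope" = "all" ]; then
        printf 'garage repair -a --yes %s\n' "$name"
    else
        printf 'garage repair --yes %s\n' "$name"
    fi
}
```

Calling `repair_cmd tables` prints the single-node form, and `repair_cmd tables all` prints the cluster-wide form. Printing first makes it easy to review a command before pasting it into a terminal connected to a production cluster.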
 # Data block operations
 ## Data store scrub
 Scrubbing the data store means examining each individual data block to check that
 its content is correct, by verifying its hash. Any block found to be corrupted
 (e.g. by bitrot or by an accidental manipulation of the datastore) will be
 restored from another node that holds a valid copy.
 A scrub is run automatically by Garage every 30 days. It can also be launched
 manually using `garage repair scrub start`.
 To view the status of an ongoing scrub, first find the task ID of the scrub worker
 using `garage worker list`. Then, run `garage worker info <scrub_task_id>` to
 view detailed runtime statistics of the scrub. To gather cluster-wide information,
 this command has to be run on each individual node.
 A scrub is a very disk-intensive operation that might slow down your cluster.
 You may pause an ongoing scrub using `garage repair scrub pause`, but note that
 the scrub will resume automatically 24 hours later as Garage will not let your
 cluster run without a regular scrub. If the scrub procedure is too intensive
 for your servers and is slowing down your workload, the recommended solution
 is to increase the "scrub tranquility" using `garage repair scrub set-tranquility`.
 A higher tranquility value will make Garage take longer pauses between two block
 verifications. Of course, scrubbing the entire data store will also take longer.
 ## Block check and resync
 In some cases, nodes hold a reference to a block but do not actually have the block
 stored on disk. Conversely, they may also have on disk blocks that are not referenced
 any more. To fix both cases, a block repair may be run with `garage repair blocks`.
 This will scan the entire block reference counter table to check that the blocks
 exist on disk, and will scan the entire disk store to check that stored blocks
 are referenced.
 It is recommended to run this procedure when changing your cluster layout,
 after the metadata tables have finished synchronizing between nodes
 (usually a few hours after `garage layout apply`).
 ## Inspecting lost blocks
 In extremely rare situations, data blocks may be unavailable from the entire cluster.
 This means that even using `garage repair blocks`, some nodes may be unable
 to fetch data blocks for which they hold a reference.
 These errors are stored on each node in a list of "block resync errors", i.e.
 blocks for which the last resync operation failed.
 This list can be inspected using `garage block list-errors`.
 These errors usually fall into one of the following categories:
 1. a block is still referenced but the object was deleted; this is a case
   of metadata reference inconsistency (see below for the fix)
 2. a block is referenced by a non-deleted object, but could not be fetched due
   to a transient error such as a network failure
 3. a block is referenced by a non-deleted object, but could not be fetched due
   to a permanent error such as there not being any valid copy of the block on the
   entire cluster
 To help distinguish case 1 from cases 2 and 3, you may use the
 `garage block info` command to see which objects hold a reference to each block.
 In the second case (transient errors), Garage will try to fetch the block again
 after a certain time, so the error should disappear naturally. You can also
 request Garage to try to fetch the block immediately using `garage block retry-now`
 if you have fixed the transient issue.
 If you are confident that you are in the third scenario and that your data block
 is definitely lost, then there is no other choice than to declare your S3 objects
 as unrecoverable, and to delete them properly from the data store. This can be done
 using the `garage block purge` command.
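
The triage described in this section can be summarized as a lookup from error category to command. This is only a sketch: `suggest_action` is a hypothetical helper that prints commands named in this guide without executing them. The arguments that `garage block retry-now` and `garage block purge` take are not shown here, so consult the CLI's built-in help before running them. Listing both reference repairs for category 1 is an assumption based on the metadata reference fixes described further down.

```bash
# suggest_action <category>   (hypothetical helper, prints only)
# <category> is 1, 2 or 3, matching the numbered list of error categories.
suggest_action() {
    case "$1" in
        1)  # Metadata reference inconsistency: purge orphan references.
            echo "garage repair --yes versions"
            echo "garage repair --yes block_refs" ;;
        2)  # Transient error: ask Garage to retry once the cause is fixed.
            echo "garage block retry-now" ;;
        3)  # Block permanently lost: declare the objects unrecoverable.
            echo "garage block purge" ;;
        *)  echo "unknown category: $1" >&2
            return 1 ;;
    esac
}
```

Category 3 is destructive, so it is worth confirming with `garage block info` which objects reference the block before acting on that suggestion.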
 # Metadata operations
 ## Metadata table resync
 Garage automatically resyncs all entries stored in the metadata tables every hour,
 to ensure that all nodes have the most up-to-date version of all the information
 they should be holding.
 The resync procedure is based on a Merkle tree that makes it possible to efficiently find
 differences between nodes.
 In some special cases, e.g. before an upgrade, you might want to run a table
 resync manually. This can be done using `garage repair tables`.
 ## Metadata table reference fixes
 In some very rare cases where nodes are unavailable, some references between objects
 are broken. For instance, if an object is deleted, the underlying versions or data
 blocks may still be held by Garage. If you suspect that such corruption has occurred
 in your cluster, you can run one of the following repair procedures:
 - `garage repair versions`: checks that all versions belong to a non-deleted object, and purges any orphan version
 - `garage repair block_refs`: checks that all block references belong to a non-deleted object version, and purges any orphan block reference (this will then allow the blocks to be garbage-collected)
--- a/doc/book/reference-manual/layout.md
+++ b/doc/book/reference-manual/layout.md
@@ -1,6 +1,6 @@
 +++
 title = "Cluster layout management"
-weight = 50
+weight = 20
 +++
 The cluster layout in Garage is a table that assigns to each node a role in
--- a/doc/book/operations/recovering.md
+++ b/doc/book/operations/recovering.md
@@ -1,6 +1,6 @@
 +++
 title = "Recovering from failures"
-weight = 50
+weight = 40
 +++
 Garage is meant to work on old, second-hand hardware.
--- a/doc/book/operations/upgrading.md
+++ b/doc/book/operations/upgrading.md
@@ -1,6 +1,6 @@
 +++
 title = "Upgrading Garage"
-weight = 60
+weight = 10
 +++
 Garage is a stateful clustered application, where all nodes are communicating together and share data structures.
@@ -58,7 +58,7 @@ From a high level perspective, a major upgrade looks like this:
 ### Major upgrades with minimal downtime
-There is only one operation that has to be coordinated cluster-wide: the passage of one version of the internal RPC protocol to the next.
+There is only one operation that has to be coordinated cluster-wide: the switch from one version of the internal RPC protocol to the next.
 This means that an upgrade with very limited downtime can simply be performed from one major version to the next by restarting all nodes
 simultaneously in the new version.
 The downtime will simply be the time required for all nodes to stop and start again, which should be less than a minute.
--- a/doc/book/quick-start/_index.md
+++ b/doc/book/quick-start/_index.md
@@ -1,6 +1,6 @@
 +++
 title = "Quick Start"
-weight = 0
+weight = 10
 sort_by = "weight"
 template = "documentation.html"
 +++
--- a/doc/book/reference-manual/_index.md
+++ b/doc/book/reference-manual/_index.md
@@ -1,6 +1,6 @@
 +++
 title = "Reference Manual"
-weight = 5
+weight = 60
 sort_by = "weight"
 template = "documentation.html"
 +++
--- a/doc/book/reference-manual/features.md
+++ b/doc/book/reference-manual/features.md
@@ -35,7 +35,7 @@ This makes setting up and administering storage clusters, we hope, as easy as it
 A Garage cluster can very easily evolve over time, as storage nodes are added or removed.
 Garage will automatically rebalance data between nodes as needed to ensure the desired number of copies.
-Read about cluster layout management [here](@/documentation/reference-manual/layout.md).
+Read about cluster layout management [here](@/documentation/operations/layout.md).
 ### No RAFT slowing you down
--- a/doc/book/working-documents/_index.md
+++ b/doc/book/working-documents/_index.md
@@ -1,6 +1,6 @@
 +++
 title = "Working Documents"
-weight = 8
+weight = 90
 sort_by = "weight"
 template = "documentation.html"
 +++