Merge pull request 'updates to documentation for v0.8' (#385) from doc-0.8 into main

Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/385
2022-09-19 10:45:10 +02:00
commit 4fba06d62e
parent 89b8087ba8 1d0a610690
17 changed files with 242 additions and 101 deletions
--- a/doc/book/cookbook/exposing-websites.md
+++ b/doc/book/cookbook/exposing-websites.md
@@ -5,12 +5,14 @@ weight = 25

 ## Configuring a bucket for website access

-There are two methods to expose buckets as website:
+There are three methods to expose a bucket as a website:

 1. using the PutBucketWebsite S3 API call, which is allowed for access keys that have the owner permission bit set

 2. from the Garage CLI, by an administrator of the cluster

+3. using the Garage administration API
+
 The `PutBucketWebsite` API endpoint [is documented](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketWebsite.html) in the official AWS docs.
 This endpoint can also be called [using `aws s3api`](https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-website.html) on the command line.
 The website configuration supported by Garage is only a subset of the possibilities on Amazon S3: redirections are not supported, only the index document and error document can be specified.
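As an illustration of the restriction described above, here is a minimal Python sketch that builds a website configuration payload containing only an index document and an error document. The helper name `website_configuration` and the file names are hypothetical; the JSON keys (`IndexDocument`/`Suffix`, `ErrorDocument`/`Key`) are those of the AWS `PutBucketWebsite` documentation linked above.

```python
import json


def website_configuration(index_suffix="index.html", error_key=None):
    """Build a PutBucketWebsite payload restricted to the two fields
    mentioned above: the index document and the error document."""
    config = {"IndexDocument": {"Suffix": index_suffix}}
    if error_key is not None:
        # The error document is optional in this sketch.
        config["ErrorDocument"] = {"Key": error_key}
    return config


# Serialize the payload so it can be passed on the command line.
payload = json.dumps(website_configuration("index.html", "error.html"))
print(payload)
```

The printed JSON could then be given to `aws s3api put-bucket-website` through its `--website-configuration` option, together with `--bucket` and an `--endpoint-url` pointing at your Garage S3 endpoint (both values depend on your deployment).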
--- a/doc/book/cookbook/from-source.md
+++ b/doc/book/cookbook/from-source.md
@@ -20,57 +20,76 @@ sudo apt-get update
 sudo apt-get install build-essential
 ```

-## Using source from the Gitea repository (recommended)
+## Building from source from the Gitea repository

 The primary location for Garage's source code is the
-[Gitea repository](https://git.deuxfleurs.fr/Deuxfleurs/garage).
+[Gitea repository](https://git.deuxfleurs.fr/Deuxfleurs/garage),
+which contains all of the released versions as well as the code
+for the development of the next version.

-Clone the repository and build Garage with the following commands:
+Clone the repository and enter it as follows:

 ```bash
 git clone https://git.deuxfleurs.fr/Deuxfleurs/garage.git
 cd garage
-cargo build
 ```

-Be careful, as this will make a debug build of Garage, which will be extremely slow!
-To make a release build, invoke `cargo build --release` (this takes much longer).
-
-The binaries built this way are found in `target/{debug,release}/garage`.
-
-## Using source from `crates.io`
-
-Garage's source code is published on `crates.io`, Rust's official package repository.
-This means you can simply ask `cargo` to download and build this source code for you:
+If you wish to build a specific version of Garage, check out the corresponding tag. For instance:

 ```bash
-cargo install garage
+git tag  				# List available tags
+git checkout v0.8.0		# Replace v0.8.0 with the version you wish to build
 ```

-That's all, `garage` should be in `$HOME/.cargo/bin`.
+Otherwise, you will be building a development build from the `main` branch
+that includes all of the changes to be released in the next version.
+Be aware that such a build might be unstable or contain bugs,
+and could be incompatible with nodes that run stable versions of Garage.

-You can add this folder to your `$PATH` or copy the binary somewhere else on your system.
-For instance:
+Finally, build Garage with the following command:

 ```bash
-sudo cp $HOME/.cargo/bin/garage /usr/local/bin/garage
+cargo build --release
 ```

+The binary built this way can now be found in `target/release/garage`.
+You may simply copy this binary to somewhere in your `$PATH` in order to
+have the `garage` command available in your shell, for instance:

-## Selecting features to activate in your build
+```bash
+sudo cp target/release/garage /usr/local/bin/garage
+```

-Garage supports a number of compilation options in the form of Cargo features,
+If you are planning to develop Garage,
+you might be interested in producing debug builds, which compile faster but run slower:
+this can be done by removing the `--release` flag, and the resulting build can then
+be found in `target/debug/garage`.
+
+## List of available Cargo feature flags
+
+Garage supports a number of compilation options in the form of Cargo feature flags,
 which can be used to provide builds adapted to your system and your use case.
-The following features are available:
+To produce a build with a given set of features, invoke the `cargo build` command
+as follows:

-| Feature | Enabled | Description |
-| ------- | ------- | ----------- |
-| `bundled-libs` | BY DEFAULT | Use bundled version of sqlite3, zstd, lmdb and libsodium |
-| `system-libs` | optional | Use system version of sqlite3, zstd, lmdb and libsodium if available (exclusive with `bundled-libs`, build using `cargo build --no-default-features --features system-libs`) |
-| `k2v` | optional | Enable the experimental K2V API (if used, all nodes on your Garage cluster must have it enabled as well) |
-| `kubernetes-discovery` | optional | Enable automatic registration and discovery of cluster nodes through the Kubernetes API |
-| `metrics` | BY DEFAULT | Enable collection of metrics in Prometheus format on the admin API |
+```bash
+# This will build the default feature set plus feature1, feature2 and feature3
+cargo build --release --features feature1,feature2,feature3
+# This will build ONLY feature1, feature2 and feature3
+cargo build --release --no-default-features \
+            --features feature1,feature2,feature3
+```
+
+The following feature flags are available in v0.8.0:
+
+| Feature flag | Enabled | Description |
+| ------------ | ------- | ----------- |
+| `bundled-libs` | *by default* | Use bundled version of sqlite3, zstd, lmdb and libsodium |
+| `system-libs` | optional | Use system version of sqlite3, zstd, lmdb and libsodium<br>if available (exclusive with `bundled-libs`, build using<br>`cargo build --no-default-features --features system-libs`) |
+| `k2v` | optional | Enable the experimental K2V API (if used, all nodes on your<br>Garage cluster must have it enabled as well) |
+| `kubernetes-discovery` | optional | Enable automatic registration and discovery<br>of cluster nodes through the Kubernetes API |
+| `metrics` | *by default* | Enable collection of metrics in Prometheus format on the admin API |
 | `telemetry-otlp` | optional | Enable collection of execution traces using OpenTelemetry |
-| `sled` | BY DEFAULT | Enable using Sled to store Garage's metadata |
+| `sled` | *by default* | Enable using Sled to store Garage's metadata |
 | `lmdb` | optional | Enable using LMDB to store Garage's metadata |
 | `sqlite` | optional | Enable using Sqlite3 to store Garage's metadata |
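The rules stated in the table above can be sketched as a small Python helper that assembles the `cargo build` command line. This is only an illustration: the function `cargo_build_command` is hypothetical and not part of Garage, and rejecting `system-libs` when default features are kept is a choice made by this sketch, based on the table marking it as exclusive with the default `bundled-libs`.

```python
import shlex

BUNDLED = "bundled-libs"
SYSTEM = "system-libs"


def cargo_build_command(features, default_features=True, release=True):
    """Assemble a `cargo build` invocation for a set of feature flags."""
    features = list(features)
    if BUNDLED in features and SYSTEM in features:
        raise ValueError("bundled-libs and system-libs are mutually exclusive")
    if SYSTEM in features and default_features:
        # bundled-libs is enabled by default, so defaults must be disabled.
        raise ValueError("system-libs requires default_features=False")
    cmd = ["cargo", "build"]
    if release:
        cmd.append("--release")
    if not default_features:
        cmd.append("--no-default-features")
    if features:
        cmd += ["--features", ",".join(features)]
    return cmd


# With default features disabled, sled and metrics must be listed explicitly.
print(shlex.join(cargo_build_command(
    ["system-libs", "sled", "metrics"], default_features=False)))
```

Note that once `--no-default-features` is used, features that are normally enabled by default (such as `sled` and `metrics`) have to be requested explicitly if you still want them.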
--- a/doc/book/design/benchmarks/index.md
+++ b/doc/book/design/benchmarks/index.md
@@ -1,6 +1,6 @@
 +++
 title = "Benchmarks"
-weight = 10
+weight = 40
 +++

 With Garage, we wanted to build a software-defined storage service that follows the [KISS principle](https://en.wikipedia.org/wiki/KISS_principle),
--- a/doc/book/design/goals.md
+++ b/doc/book/design/goals.md
@@ -1,13 +1,13 @@
 +++
 title = "Goals and use cases"
-weight = 5
+weight = 10
 +++

 ## Goals and non-goals

 Garage is a lightweight geo-distributed data store that implements the
 [Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html)
-object storage protocole. It enables applications to store large blobs such
+object storage protocol. It enables applications to store large blobs such
 as pictures, video, images, documents, etc., in a redundant multi-node
 setting. S3 is versatile enough to also be used to publish a static
 website.
--- a/doc/book/design/internals.md
+++ b/doc/book/design/internals.md
@@ -20,6 +20,49 @@ In the meantime, you can find some information at the following links:
 - [an old design draft](@/documentation/working-documents/design-draft.md)


+## Request routing logic
+
+Data retrieval requests to Garage endpoints (S3 API and websites) are resolved
+to an individual object in a bucket. Since objects are replicated to multiple nodes,
+Garage must ensure consistency before answering the request.
+
+### Using quorum to ensure consistency
+
+Garage ensures consistency by attempting to establish a quorum with the
+data nodes responsible for the object. When a majority of the data nodes
+have provided metadata on an object, Garage can then answer the request.
+
+When a request arrives, Garage performs the following actions (assuming the recommended 3 replicas):
+
+- Make a request to the two preferred nodes for object metadata
+- Try the third node if one of the two initial requests fails
+- Check that the metadata from at least 2 nodes match
+- Check that the object hasn't been marked deleted
+- Answer the request with inline data from metadata if the object is small enough
+- Or get data blocks from the preferred nodes and answer using the assembled object
+
+Garage dynamically determines which nodes to query based on health, preference, and
+which nodes actually host the requested data. Garage has no concept of "primary", so any
+healthy node with the data can be used as long as a quorum is reached for the metadata.
+
+### Node health
+
+Garage keeps a TCP session open to each node in the cluster and periodically pings them. If a connection
+cannot be established, or a node fails to answer a number of pings, the target node is marked as failed.
+Failed nodes are not used for quorum or other internal requests.
+
+### Node preference
+
+Garage prioritizes which nodes to query according to a few criteria:
+
+- A node always prefers itself if it can answer the request
+- Then the node prioritizes nodes in the same zone
+- Finally the nodes with the lowest latency are prioritized 
+
+
+For further reading on the cluster structure, look at the [gateway](@/documentation/cookbook/gateways.md)
+and [cluster layout management](@/documentation/reference-manual/layout.md) pages.
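To make the routing logic described in this section more concrete, here is a simplified Python sketch of node preference ordering followed by a metadata quorum read. It is not Garage's actual implementation: the `Node` class, the `fetch` callback and the function names are hypothetical, and nodes are queried one after another rather than concurrently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    zone: str
    latency_ms: float
    healthy: bool = True


def order_by_preference(nodes, local):
    """Drop failed nodes, then prefer the local node, then nodes in the
    same zone, then nodes with the lowest latency."""
    candidates = [n for n in nodes if n.healthy]
    return sorted(
        candidates,
        key=lambda n: (n.name != local.name, n.zone != local.zone, n.latency_ms),
    )


def quorum_read(nodes, local, fetch, quorum=2):
    """Return the metadata value reported by at least `quorum` nodes.

    `fetch(node)` returns a hashable metadata value, or raises
    ConnectionError if the node does not answer."""
    counts = {}
    for node in order_by_preference(nodes, local):
        try:
            meta = fetch(node)
        except ConnectionError:
            continue  # fall back to the next preferred node
        counts[meta] = counts.get(meta, 0) + 1
        if counts[meta] >= quorum:
            return meta
    raise RuntimeError("quorum not reached")


nodes = [
    Node("a", "paris", 1.0),
    Node("b", "paris", 5.0),
    Node("c", "lyon", 20.0),
    Node("d", "lyon", 2.0, healthy=False),
]


def fetch(node):
    # Simulate node "b" failing to answer; the others agree on "v2".
    if node.name == "b":
        raise ConnectionError(node.name)
    return "v2"


print([n.name for n in order_by_preference(nodes, nodes[0])])
print(quorum_read(nodes, nodes[0], fetch))
```

In this example, node `d` is skipped because it is marked as failed, node `b` does not answer, and the third preferred node `c` is used to complete the quorum of 2 matching answers.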
+
 ## Garbage collection

 A faulty garbage collection procedure has been the cause of
--- a/doc/book/design/related-work.md
+++ b/doc/book/design/related-work.md
@@ -1,6 +1,6 @@
 +++
 title = "Related work"
-weight = 15
+weight = 50
 +++

 ## Context
--- a/doc/book/quick-start/_index.md
+++ b/doc/book/quick-start/_index.md
@@ -9,6 +9,15 @@ Let's start your Garage journey!
 In this chapter, we explain how to deploy Garage as a single-node server
 and how to interact with it.

+## What is Garage?
+
+Before jumping in, you might be interested in reading the following pages:
+
+- [Goals and use cases](@/documentation/design/goals.md)
+- [List of features](@/documentation/reference-manual/features.md)
+
+## Scope of this tutorial
+
 Our goal is to introduce you to Garage's workflows.
 Following this guide is recommended before moving on to
 [configuring a multi-node cluster](@/documentation/cookbook/real-world.md).
--- a/doc/book/reference-manual/admin-api.md
+++ b/doc/book/reference-manual/admin-api.md
@@ -1,6 +1,6 @@
 +++
 title = "Administration API"
-weight = 16
+weight = 60
 +++

 The Garage administration API is accessible through a dedicated server whose
--- a/doc/book/reference-manual/cli.md
+++ b/doc/book/reference-manual/cli.md
@@ -1,6 +1,6 @@
 +++
 title = "Garage CLI"
-weight = 15
+weight = 30
 +++

 The Garage CLI is mostly self-documented. Make use of the `help` subcommand
--- a/doc/book/reference-manual/configuration.md
+++ b/doc/book/reference-manual/configuration.md
@@ -1,6 +1,6 @@
 +++
 title = "Configuration file format"
-weight = 5
+weight = 20
 +++

 Here is an example `garage.toml` configuration file that illustrates all of the possible options:
@@ -10,7 +10,6 @@ metadata_dir = "/var/lib/garage/meta"
 data_dir = "/var/lib/garage/data"

 block_size = 1048576
-block_manager_background_tranquility = 2

 replication_mode = "3"

@@ -87,17 +86,6 @@ files will remain available. This however means that chunks from existing files
 will not be deduplicated with chunks from newly uploaded files, meaning you
 might use more storage space than is optimally possible.

-### `block_manager_background_tranquility`
-
-This parameter tunes the activity of the background worker responsible for
-resyncing data blocks between nodes. The higher the tranquility value is set,
-the more the background worker will wait between iterations, meaning the load
-on the system (including network usage between nodes) will be reduced. The
-minimal value for this parameter is `0`, where the background worker will
-allways work at maximal throughput to resynchronize blocks. The default value
-is `2`, where the background worker will try to spend at most 1/3 of its time
-working, and 2/3 sleeping in order to reduce system load.
-
 ### `replication_mode`

 Garage supports the following replication modes:
--- a/doc/book/reference-manual/features.md
+++ b/doc/book/reference-manual/features.md
@@ -0,0 +1,125 @@
++++
+title = "List of Garage features"
+weight = 10
++++
+
+
+### S3 API
+
+The main goal of Garage is to provide an object storage service that is compatible with the
+[S3 API](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html) from Amazon Web Services.
+We try to adhere as strictly as possible to the semantics of the API as implemented by Amazon
+and other vendors such as MinIO or Ceph.
+
+Of course Garage does not implement the full span of API endpoints that AWS S3 does;
+the exact list of S3 features implemented by Garage can be found [on our S3 compatibility page](@/documentation/reference-manual/s3-compatibility.md).
+
+### Geo-distribution
+
+Garage allows you to store copies of your data in multiple geographical locations in order to maximize resilience
+to adverse events, such as network/power outages or hardware failures.
+This allows Garage to run very well even at home, using consumer-grade Internet connectivity
+(such as FTTH) and power, as long as cluster nodes can be spawned at several physical locations.
+Garage exploits knowledge of the capacity and physical location of each storage node to design
+a storage plan that best exploits the available storage capacity while satisfying the geo-distributed replication constraint.
+
+To learn more about geo-distributed Garage clusters,
+read our documentation on [setting up a real-world deployment](@/documentation/cookbook/real-world.md).
+
+### Standalone/self-contained
+
+Garage is extremely simple to deploy, and does not depend on any external service to run.
+This makes setting up and administering storage clusters, we hope, as easy as it could be.
+
+### Flexible topology
+
+A Garage cluster can very easily evolve over time, as storage nodes are added or removed.
+Garage will automatically rebalance data between nodes as needed to ensure the desired number of copies.
+Read about cluster layout management [here](@/documentation/reference-manual/layout.md).
+
+### No Raft slowing you down
+
+It might seem strange to tout the absence of something as a desirable feature,
+but this is in fact a very important point! Garage does not use Raft or another
+consensus algorithm internally to order incoming requests: this means that all requests
+directed to a Garage cluster can be handled independently of one another instead
+of going through a central bottleneck (the leader node).
+As a consequence, requests can be handled much faster, even in cases where latency
+between cluster nodes is high (see our [benchmarks](@/documentation/design/benchmarks/index.md) for data on this).
+This is particularly useful when nodes are far from one another and talk to one another through standard Internet connections.
+
+### Several replication modes
+
+Garage supports a variety of replication modes, with 1 copy, 2 copies or 3 copies of your data,
+and with various levels of consistency, in order to adapt to a variety of usage scenarios.
+Read our reference page on [supported replication modes](@/documentation/reference-manual/configuration.md#replication-mode)
+to select the replication mode best suited to your use case (hint: in most cases, `replication_mode = "3"` is what you want).
+
+### Web server for static websites
+
+A storage bucket can easily be configured to be served directly by Garage as a static website.
+Domain names for multiple websites directly map to bucket names, making it easy to build
+a platform for your users to autonomously build and host their websites over Garage.
+Surprisingly, none of the other alternative S3 implementations we surveyed (such as MinIO
+or Ceph) support publishing static websites from S3 buckets, a feature that is however
+directly inherited from S3 on AWS.
+Read more on our [dedicated documentation page](@/documentation/cookbook/exposing-websites.md).
+
+### Bucket names as aliases
+
+In Garage, a bucket may have several names, known as aliases.
+Aliases can easily be added and removed on demand:
+this makes it easy to rename buckets if needed
+without having to copy all of their content, something that cannot be done on AWS.
+For buckets served as static websites, having multiple aliases for a bucket can allow
+exposing the same content under different domain names.
+
+Garage also supports bucket aliases which are local to a single user:
+this allows different users to have different buckets with the same name, thus avoiding naming collisions.
+This can be helpful, for instance, if you want to write an application that creates per-user buckets that always have the same name.
+
+This feature is totally invisible to S3 clients and does not break compatibility with AWS.
+
+### Cluster administration API
+
+Garage provides a fully-fledged REST API to administer your cluster programmatically.
+Functionality included in the admin API includes: setting up and monitoring
+cluster nodes, managing access credentials, and managing storage buckets and bucket aliases.
+A full reference of the administration API is available [here](@/documentation/reference-manual/admin-api.md).
+
+### Metrics and traces
+
+Garage makes some internal metrics available in the Prometheus data format,
+which allows you to build interactive dashboards to visualize the load and internal state of your storage cluster.
+
+For developers and performance-savvy administrators,
+Garage also supports exporting traces of what it does internally in OpenTelemetry format.
+This makes it possible to monitor the time spent at various steps of the processing of requests,
+in order to detect potential performance bottlenecks.
+
+### Kubernetes and Nomad integrations
+
+Garage can automatically discover other nodes in the cluster thanks to integration
+with orchestrators such as Kubernetes and Nomad (when used with Consul).
+This eases the configuration of your cluster as it removes one step where nodes need
+to be manually connected to one another.
+
+### Support for changing IP addresses
+
+As long as all of your nodes don't change their IP address at the same time,
+Garage should be able to tolerate nodes with changing/dynamic IP addresses,
+as nodes will regularly exchange the IP addresses of their peers and try to
+reconnect using newer addresses when existing connections are broken.
+
+### K2V API (experimental)
+
+As part of an ongoing research project, Garage can expose an experimental key/value storage API called K2V.
+K2V is made for the storage and retrieval of many small key/value pairs that need to be processed in bulk.
+This complements the S3 API with an alternative that can be used to easily store and access metadata
+related to objects stored in an S3 bucket.
+
+In the context of our research project, [Aérogramme](https://aerogramme.deuxfleurs.fr),
+K2V is used to provide metadata and log storage for operations on encrypted e-mail storage.
+
+Learn more about the specification of K2V [here](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/k2v/doc/drafts/k2v-spec.md)
+and about how to enable it in Garage [here](@/documentation/reference-manual/k2v.md).
--- a/doc/book/reference-manual/k2v.md
+++ b/doc/book/reference-manual/k2v.md
@@ -1,6 +1,6 @@
 +++
 title = "K2V"
-weight = 30
+weight = 70
 +++

 Starting with version 0.7.2, Garage introduces an optional feature, K2V,
--- a/doc/book/reference-manual/layout.md
+++ b/doc/book/reference-manual/layout.md
@@ -1,6 +1,6 @@
 +++
 title = "Cluster layout management"
-weight = 10
+weight = 50
 +++

 The cluster layout in Garage is a table that assigns to each node a role in
--- a/doc/book/reference-manual/routing.md
+++ b/doc/book/reference-manual/routing.md
@@ -1,45 +0,0 @@
-+++
-title = "Request routing logic"
-weight = 10
-+++
-
-Data retrieval requests to Garage endpoints (S3 API and websites) are resolved 
-to an individual object in a bucket. Since objects are replicated to multiple nodes 
-Garage must ensure consistency before answering the request.
-
-## Using quorum to ensure consistency
-
-Garage ensures consistency by attempting to establish a quorum with the
-data nodes responsible for the object. When a majority of the data nodes
-have provided metadata on a object Garage can then answer the request.
-
-When a request arrives Garage will, assuming the recommended 3 replicas, perform the following actions:
-
- Make a request to the two preferred nodes for object metadata
- Try the third node if one of the two initial requests fail
- Check that the metadata from at least 2 nodes match
- Check that the object hasn't been marked deleted
- Answer the request with inline data from metadata if object is small enough
- Or get data blocks from the preferred nodes and answer using the assembled object
-
-Garage dynamically determines which nodes to query based on health, preference, and 
-which nodes actually host a given data. Garage has no concept of "primary" so any 
-healthy node with the data can be used as long as a quorum is reached for the metadata.
-
-## Node health
-
-Garage keeps a TCP session open to each node in the cluster and periodically pings them. If a connection
-cannot be established, or a node fails to answer a number of pings, the target node is marked as failed.
-Failed nodes are not used for quorum or other internal requests.
-
-## Node preference
-
-Garage prioritizes which nodes to query according to a few criteria:
-
- A node always prefers itself if it can answer the request
- Then the node prioritizes nodes in the same zone
- Finally the nodes with the lowest latency are prioritized 
-
-
-For further reading on the cluster structure look at the [gateway](@/documentation/cookbook/gateways.md) 
-and [cluster layout management](@/documentation/reference-manual/layout.md) pages.
--- a/doc/book/reference-manual/s3-compatibility.md
+++ b/doc/book/reference-manual/s3-compatibility.md
@@ -1,6 +1,6 @@
 +++
 title = "S3 Compatibility status"
-weight = 20
+weight = 40
 +++

 ## DISCLAIMER
--- a/doc/book/working-documents/design-draft.md
+++ b/doc/book/working-documents/design-draft.md
@@ -1,6 +1,6 @@
 +++
-title = "Design draft"
-weight = 25
+title = "Design draft (obsolete)"
+weight = 50
 +++

 **WARNING: this documentation is a design draft which was written before Garage's actual implementation.
--- a/doc/book/working-documents/load-balancing.md
+++ b/doc/book/working-documents/load-balancing.md
@@ -1,6 +1,6 @@
 +++
-title = "Load balancing data"
-weight = 10
+title = "Load balancing data (obsolete)"
+weight = 60
 +++

 **This is still being improved in release 0.5. The working document has not been updated yet and still applies only to Garage 0.2 through 0.4.**