Work on structure + Getting started is reworked

This commit is contained in:
Quentin Dufour 2021-03-17 15:44:29 +01:00
parent 0afc701a69
commit c50113acf3
7 changed files with 151 additions and 103 deletions

View File

@ -8,19 +8,19 @@
- [Create buckets and keys](./getting_started/bucket.md)
- [Handle files](./getting_started/files.md)
# Cookbooks
- [Cookbooks]()
- [Host a website](./website.md)
- [Integrate as a media backend]()
- [Operate a cluster]()
- [Host a website](./website.md)
- [Reference Manual]()
- [Garage CLI]()
- [S3 API](./compatibility.md)
# Reference Manual
- [Design]()
- [Related Work](./related_work.md)
- [Internals](./internals.md)
- [S3 API](./compatibility.md)
# Design
- [Related Work](./related_work.md)
- [Internals](./internals.md)
# Development
- [Setup your environment](./devenv.md)
- [Development]()
- [Setup your environment](./devenv.md)
- [Your first contribution]()

View File

@ -1,4 +1,4 @@
# Gettin Started
# Getting Started
Let's start your Garage journey!
In this chapter, we explain how to deploy a simple garage cluster and start interacting with it.

View File

@ -1 +1,71 @@
# Create buckets and keys
First, chances are that your garage deployment is secured by TLS.
All your commands must be prefixed with their certificates.
I will define an alias once and for all to ease future commands.
Please adapt the path of the binary and certificates to your installation!
```
alias grg="/garage/garage --ca-cert /secrets/garage-ca.crt --client-cert /secrets/garage.crt --client-key /secrets/garage.key"
```
Now we can check that everything is going well by checking our cluster status:
```
grg status
```
Don't forget that `help` command and `--help` subcommands can help you anywhere, the CLI tool is self-documented! Two examples:
```
grg help
grg bucket allow --help
```
Fine, now let's create a bucket (we imagine that you want to deploy nextcloud):
```
grg bucket create nextcloud-bucket
```
Check that everything went well:
```
grg bucket list
grg bucket info nextcloud-bucket
```
Now we will generate an API key to access this bucket.
Note that API keys are independent of buckets: one key can access multiple buckets, multiple keys can access one bucket.
Now, let's start by creating a key only for our PHP application:
```
grg key new --name nextcloud-app-key
```
You will have the following output (this one is fake, `key_id` and `secret_key` were generated with the openssl CLI tool):
```
Key { key_id: "GK3515373e4c851ebaad366558", secret_key: "7d37d093435a41f2aab8f13c19ba067d9776c90215f56614adad6ece597dbb34", name: "nextcloud-app-key", name_timestamp: 1603280506694, deleted: false, authorized_buckets: [] }
```
Check that everything works as intended (be careful, info works only with your key identifier and not with its friendly name!):
```
grg key list
grg key info GK3515373e4c851ebaad366558
```
Now that we have a bucket and a key, we need to give permissions to the key on the bucket!
```
grg bucket allow --read --write nextcloud-bucket --key GK3515373e4c851ebaad366558
```
You can check at any times allowed keys on your bucket with:
```
grg bucket info nextcloud-bucket
```

View File

@ -1,3 +1,25 @@
# Installation
Currently, only two installations procedures are supported for Garage: from Docker (x86\_64 for Linux) and from source.
In the future, we plan to add a third one, by publishing a compiled binary (x86\_64 for Linux).
We did not test other architecture/operating system but, as long as your architecture/operating system is supported by Rust, you should be able to run Garage (feel free to report your tests!).
## From Docker
Garage is a software that can be run only in a cluster and requires at least 3 instances.
If you plan to run the 3 instances on your machine for test purposes, we recommend a **docker-compose** deployment.
If you have 3 independent machines (or 3 VM on independent machines) that can communite together, a **simple docker** deployment is enough.
In any case, you first need to pick a Docker image version.
Our docker image is currently named `lxpz/garage_amd64` and is stored on the [Docker Hub](https://hub.docker.com/r/lxpz/garage_amd64/tags?page=1&ordering=last_updated).
We encourage you to use a fixed tag (eg. `v0.1.1d`) and not the `latest` tag.
For this example, we will use the latest published version at the time of the writing which is `v0.1.1d` but it's up to you
to check [the most recent versions on the Docker Hub](https://hub.docker.com/r/lxpz/garage_amd64/tags?page=1&ordering=last_updated).
### Single machine deployment with docker-compose
### Multiple machine deployment with docker
## From source

View File

@ -10,17 +10,17 @@ To promote better data management policies, with focused on the following desira
- **Self-contained & lightweight**: works everywhere and integrates well in existing environments to target hyperconverged infrastructures
- **Highly resilient**: highly resilient to network failures, network latency, disk failures, sysadmin failures
- **Simple**: simple to understand, simple to operate, simple to debug
- **Internet enabled**: Made for multi-sites (eg. datacenter, offices, etc.) interconnected through a regular internet connection.
- **Internet enabled**: made for multi-sites (eg. datacenter, offices, etc.) interconnected through a regular internet connection.
We also noted that the pursuit of some other goals are detrimental to our initial goals.
The following have been identified has non-goals, if it matters to you, you should not use Garage:
- **Extreme performances**: high performances constrain a lot the design and the deployment. We always prioritize
- **Feature extensiveness**: Complete implementation of the S3 API
- **Storage optimizations**: Erasure coding (our replication model is simply to copy the data as is on several nodes, in different datacenters if possible)
- **POSIX/Filesystem compatibility**: We do not aim at being POSIX compatible or to emulate any kind of filesystem. Indeed, in a distributed environment, such syncronizations are translated in network messages that impose severe constraints on the deployment.
- **Extreme performances**: high performances constrain a lot the design and the infrastructure; we seek performances through minimalism only.
- **Feature extensiveness**: complete implementation of the S3 API or any other API to make garage a drop-in replacement is not targeted as it could lead to decisions impacting our desirable properties.
- **Storage optimizations**: erasure coding or any other coding technique both increase the difficulty of placing data and synchronizing; we limit ourselves to duplication.
- **POSIX/Filesystem compatibility**: we do not aim at being POSIX compatible or to emulate any kind of filesystem. Indeed, in a distributed environment, such syncronizations are translated in network messages that impose severe constraints on the deployment.
## Integration in environments
## Supported and planned protocols
Garage speaks (or will speak) the following protocols:
@ -36,9 +36,9 @@ We plan to add logic to Garage to make it a viable solution for email storage.
Especially, it is used to host their main website, this documentation and some of its members's blogs. Additionally,
Garage is used as a [backend for Nextcloud](https://docs.nextcloud.com/server/20/admin_manual/configuration_files/primary_storage.html). Deuxfleurs also plans to use Garage as their [Matrix's media backend](https://github.com/matrix-org/synapse-s3-storage-provider) and has the backend of [OCIS](https://github.com/owncloud/ocis).
*Are you using Garage? Open a pull request to add your organization here!*
*Are you using Garage? [Open a pull request](https://git.deuxfleurs.fr/Deuxfleurs/garage/) to add your organization here!*
## Comparisons to existing software
## Comparison to existing software
**[Minio](https://min.io/) :** Minio shares our *self-contained & lightweight* goal but selected two of our non-goals: *storage optimizations* through erasure coding and *POSIX/Filesystem compatibility* through strong consistency.
However, by pursuing these two non-goals, minio do not reach our desirable properties.
@ -46,17 +46,26 @@ First, it fails on the *simple* property: due to the erasure coding, minio has s
Second, it fails on the *interned enabled* property: due to its strong consistency, minio is latency sensitive.
Furthermore, minio has no knowledge of "sites" and thus can not distribute data to minimize the failure of a given site.
**[Openstack Swift](https://docs.openstack.org/swift/latest/)**
**[Openstack Swift](https://docs.openstack.org/swift/latest/) :**
OpenStack Swift at least fails on the *self-contained & lightweight* goal.
Starting it requires around 8Gb of RAM, which is too much especially in an hyperconverged infrastructure.
It seems also to be far from *Simple*.
**[Pithos](https://github.com/exoscale/pithos)**
Pithos has been abandonned and should probably not used yet, in the following we explain why we did not pick their design.
Pithos was relying as a S3 proxy in front of Cassandra (and was working with Scylla DB too).
From its designers' mouth, storing data in Cassandra has shown its limitations justifying the project abandonment.
They built a closed-source version 2 that does not store blobs in the database (only metadata) but did not communicate further on it.
We considered there v2's design but concluded that it does not fit both our *Self-contained & lightweight* and *Simple* properties. It makes the development, the deployment and the operations more complicated while reducing the flexibility.
**[Ceph](https://ceph.io/ceph-storage/object-storage/) :**
This review holds for the whole Ceph stack, including the RADOS paper, Ceph Object Storage module, the RADOS Gateway, etc.
At is core, Ceph has been designed to provide *POSIX/Filesystem compatibility* which requires strong consistency, which in turn
makes Ceph latency sensitive and fails our *Internet enabled* goal.
Due to its industry oriented design, Ceph is also far from being *Simple* to operate and from being *self-contained & lightweight* which makes it hard to integrate it in an hyperconverged infrastructure.
In a certain way, Ceph and Minio are closer togethers than they are from Garage or OpenStack Swift.
**[IPFS](https://ipfs.io/)**
*Not written yet*
*More comparisons are available in our [Related Work](design/related_work.md) chapter.*
## Community
If you want to discuss with us, you can join our Matrix channel at [#garage:deuxfleurs.fr](https://matrix.to/#/#garage:deuxfleurs.fr).
Our code and our issue tracker, which is the place where you should report bugs, are managed on [Deuxfleurs' Gitea](https://git.deuxfleurs.fr/Deuxfleurs/garage).
## License
Garage, all the source code, is released under the [AGPL v3 License](https://www.gnu.org/licenses/agpl-3.0.en.html).
Please note that if you patch Garage and then use it to provide any service over a network, you must share your code!

View File

@ -1,71 +0,0 @@
# Configuring a cluster
First, chances are that your garage deployment is secured by TLS.
All your commands must be prefixed with their certificates.
I will define an alias once and for all to ease future commands.
Please adapt the path of the binary and certificates to your installation!
```
alias grg="/garage/garage --ca-cert /secrets/garage-ca.crt --client-cert /secrets/garage.crt --client-key /secrets/garage.key"
```
Now we can check that everything is going well by checking our cluster status:
```
grg status
```
Don't forget that `help` command and `--help` subcommands can help you anywhere, the CLI tool is self-documented! Two examples:
```
grg help
grg bucket allow --help
```
Fine, now let's create a bucket (we imagine that you want to deploy nextcloud):
```
grg bucket create nextcloud-bucket
```
Check that everything went well:
```
grg bucket list
grg bucket info nextcloud-bucket
```
Now we will generate an API key to access this bucket.
Note that API keys are independent of buckets: one key can access multiple buckets, multiple keys can access one bucket.
Now, let's start by creating a key only for our PHP application:
```
grg key new --name nextcloud-app-key
```
You will have the following output (this one is fake, `key_id` and `secret_key` were generated with the openssl CLI tool):
```
Key { key_id: "GK3515373e4c851ebaad366558", secret_key: "7d37d093435a41f2aab8f13c19ba067d9776c90215f56614adad6ece597dbb34", name: "nextcloud-app-key", name_timestamp: 1603280506694, deleted: false, authorized_buckets: [] }
```
Check that everything works as intended (be careful, info works only with your key identifier and not with its friendly name!):
```
grg key list
grg key info GK3515373e4c851ebaad366558
```
Now that we have a bucket and a key, we need to give permissions to the key on the bucket!
```
grg bucket allow --read --write nextcloud-bucket --key GK3515373e4c851ebaad366558
```
You can check at any times allowed keys on your bucket with:
```
grg bucket info nextcloud-bucket
```

View File

@ -1,3 +1,5 @@
# Related Work
## Context
Data storage is critical: it can lead to data loss if done badly and/or on hardware failure.
@ -8,7 +10,7 @@ But here we consider non specialized off the shelf machines that can be as low p
Distributed storage may help to solve both availability and scalability problems on these machines.
Many solutions were proposed, they can be categorized as block storage, file storage and object storage depending on the abstraction they provide.
## Related work
## Overview
Block storage is the most low level one, it's like exposing your raw hard drive over the network.
It requires very low latencies and stable network, that are often dedicated.
@ -36,3 +38,19 @@ However Pithos is not maintained anymore. More precisely the company that publis
Some tests conducted by the [ACIDES project](https://acides.org/) have shown that Openstack Swift consumes way more resources (CPU+RAM) that we can afford. Furthermore, people developing Swift have not designed their software for geo-distribution.
There were many attempts in research too. I am only thinking to [LBFS](https://pdos.csail.mit.edu/papers/lbfs:sosp01/lbfs.pdf) that was used as a basis for Seafile. But none of them have been effectively implemented yet.
## Existing software
**[Pithos](https://github.com/exoscale/pithos) :**
Pithos has been abandonned and should probably not used yet, in the following we explain why we did not pick their design.
Pithos was relying as a S3 proxy in front of Cassandra (and was working with Scylla DB too).
From its designers' mouth, storing data in Cassandra has shown its limitations justifying the project abandonment.
They built a closed-source version 2 that does not store blobs in the database (only metadata) but did not communicate further on it.
We considered there v2's design but concluded that it does not fit both our *Self-contained & lightweight* and *Simple* properties. It makes the development, the deployment and the operations more complicated while reducing the flexibility.
**[IPFS](https://ipfs.io/) :**
*Not written yet*
## Specific research papers
*Not yet written*