Commit Graph

204 Commits

Author SHA1 Message Date
Jonathan Davies
cb07e6145c Changed all instances of assignation to assignment. 2023-01-05 11:09:25 +00:00
Alex Auvolat
570e5e5bbb
Merge branch 'main' into next 2023-01-04 11:34:43 +01:00
Alex Auvolat
1fc220886a
Fix Consul & Kubernetes discovery with new way of doing background things 2023-01-03 16:55:59 +01:00
Alex Auvolat
8d5505514f
Make it explicit when using nonversioned encoding 2023-01-03 15:27:36 +01:00
Alex Auvolat
cdb2a591e9
Refactor how things are migrated 2023-01-03 14:44:47 +01:00
Alex Auvolat
939a6d67e8
Merge branch 'main' into internals-rework 2023-01-02 15:07:44 +01:00
Alex Auvolat
6775569525
Bump everything to v0.8.1 2023-01-02 14:15:33 +01:00
Alex Auvolat
e6f14ab5cf
better error message handling 2022-12-14 16:11:19 +01:00
Alex Auvolat
510b620108
Get rid of background::spawn 2022-12-14 16:08:05 +01:00
Alex Auvolat
a19bfef508
Improve error message on rpc connection failure 2022-12-14 12:57:33 +01:00
Alex Auvolat
d56c472712
Refactor background runner and get rid of job worker 2022-12-14 12:51:42 +01:00
Alex
6e44369cbc Merge pull request 'Optimal layout assignation algorithm' (#296) from optimal-layout into next
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/296
2022-12-11 17:41:53 +00:00
Alex Auvolat
2c2e65ad8b
Merge commit 'ec12d6c' into next 2022-12-11 18:41:15 +01:00
Alex Auvolat
9d83364ad9
itertools .unique() doesn't require sorted items 2022-12-11 18:30:02 +01:00
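The point of this commit message, that a hash-based uniqueness filter needs no prior sort while consecutive-duplicate removal does, can be sketched with the standard library alone. This is an illustration, not the project's code: the function names and sample data are hypothetical, and a `HashSet` stands in for the "seen" tracking that `itertools`' `.unique()` performs.

```rust
use std::collections::HashSet;

/// Counts distinct items without sorting, by remembering what was already seen.
/// (Hypothetical helper; stands in for a hash-based `.unique()`.)
fn count_unique_unsorted(items: &[u32]) -> usize {
    let mut seen = HashSet::new();
    // `insert` returns true only the first time a value is added.
    items.iter().filter(|x| seen.insert(**x)).count()
}

/// Counts distinct items with `Vec::dedup`, which only removes *consecutive*
/// duplicates, so the input has to be sorted first.
fn count_unique_sorted(items: &[u32]) -> usize {
    let mut v = items.to_vec();
    v.sort();
    v.dedup();
    v.len()
}

fn main() {
    let zones = [3, 1, 3, 2, 1]; // hypothetical sample data
    println!("hash-based: {}", count_unique_unsorted(&zones));
    println!("sort + dedup: {}", count_unique_sorted(&zones));
}
```

Both helpers agree on any input; only the `dedup`-based one depends on the sort.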
Alex Auvolat
280d1be7b1
Refactor health check and add ability to return it in json 2022-12-05 15:28:57 +01:00
Alex Auvolat
54e800ef8d
Tentative fix for issue #414 2022-11-21 17:13:41 +01:00
Alex Auvolat
ec12d6c8dd
Slightly simplify code at places 2022-11-08 16:15:45 +01:00
Alex Auvolat
d75b37b018
Return more info when layout's .check() fails, fix compilation, fix test 2022-11-08 14:58:39 +01:00
Alex Auvolat
73a4ca8b15
Use bytes as capacity units 2022-11-07 21:12:11 +01:00
Alex Auvolat
fd5bc142b5
Ensure .sort() is called before counting unique items 2022-11-07 20:29:25 +01:00
Alex Auvolat
ea5afc2511
Style improvements 2022-11-07 20:11:30 +01:00
Alex Auvolat
28d7a49f63
Merge branch 'main' into optimal-layout 2022-11-07 12:20:59 +01:00
Alex Auvolat
57b5c2c754
Change reqwest rustls features 2022-10-18 22:11:27 +02:00
Alex Auvolat
8bc5caf7aa
Fix issue with 'http(s)://' prefix 2022-10-18 21:17:11 +02:00
Alex Auvolat
2da8786f54
move things around 2022-10-18 19:13:52 +02:00
Alex Auvolat
5d8d393054
Load TLS certificates only once 2022-10-18 19:11:16 +02:00
Alex Auvolat
002b9fc50c
Add TLS support for Consul discovery + refactoring 2022-10-18 18:38:20 +02:00
Alex Auvolat
fcaee3bea0
definitively expunge openssl from dependencies everywhere 2022-10-14 18:10:36 +02:00
Mendes
bcdd1e0c33 Added some comment 2022-10-11 18:29:21 +02:00
Mendes
e5664c9822 Improved the statistics displayed in layout show
corrected a few bugs
2022-10-11 17:17:13 +02:00
Mendes
4abab246f1 cargo fmt 2022-10-10 17:21:13 +02:00
Mendes
fcf9ac674a Tests written in layout.rs
added staged_parameters to ClusterLayout
removed the serde(default) -> will need a migration function
2022-10-10 17:19:25 +02:00
Mendes
911eb17bd9 corrected warnings of cargo clippy 2022-10-06 14:53:57 +02:00
Mendes
9407df60cc Corrected two bugs:
- self.node_id_vec was not properly updated when the previous ring was empty
- ClusterLayout::merge was not considering changes in the layout parameters
2022-10-06 12:54:51 +02:00
Mendes
ceac3713d6 modifications in several files to :
- have consistent error return types
- store the zone redundancy in a Lww
- print the error and message in the CLI (TODO: for the server Api, should msg be returned in the body response?)
2022-10-05 15:29:48 +02:00
Mendes
829f815a89 Merge remote-tracking branch 'origin/main' into optimal-layout 2022-10-04 18:14:49 +02:00
Mendes
99f96b9564 deleted zone_redundancy from System struct 2022-10-04 18:09:24 +02:00
Alex Auvolat
ad917ffd3f
Fix instant subtractions that might have panicked 2022-09-29 15:53:54 +02:00
Mendes
bd842e1388 Correction of a few bugs in the tests, modification of ClusterLayout::check 2022-09-22 19:30:01 +02:00
Mendes
7f3249a237 New version of the algorithm that calculates the layout.
It takes as parameters the replication factor and the zone redundancy, computes the
largest partition size reachable with these constraints, and among the possible
assignation with this partition size, it computes the one that moves the least number
of partitions compared to the previous assignation.
This computation uses graph algorithms defined in graph_algo.rs
2022-09-21 14:39:59 +02:00
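The quantity that the commit message above says is minimized, the number of partitions moved relative to the previous assignation, can be made concrete with a small sketch. It only illustrates that cost. It is not the flow-based computation in graph_algo.rs, and `NodeId`, `moved_replicas`, and the sample layouts are hypothetical names and data.

```rust
use std::collections::HashSet;

type NodeId = u32; // hypothetical stand-in for a node identifier

/// For each partition, counts the replicas placed on a node that did not
/// hold that partition in the previous assignation, and sums over partitions.
fn moved_replicas(old: &[Vec<NodeId>], new: &[Vec<NodeId>]) -> usize {
    old.iter()
        .zip(new.iter())
        .map(|(o, n)| {
            let prev: HashSet<NodeId> = o.iter().copied().collect();
            n.iter().filter(|id| !prev.contains(*id)).count()
        })
        .sum()
}

fn main() {
    // Two partitions, replication factor 3 (sample data only).
    let old = vec![vec![1, 2, 3], vec![1, 2, 4]];
    let candidate_a = vec![vec![1, 2, 3], vec![1, 3, 4]];
    let candidate_b = vec![vec![2, 3, 4], vec![1, 3, 4]];
    println!("candidate A moves {}", moved_replicas(&old, &candidate_a));
    println!("candidate B moves {}", moved_replicas(&old, &candidate_b));
}
```

Among candidates that reach the same partition size, the one with the smaller count would be preferred under the criterion described in the commit message.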
Alex Auvolat
ded444f6c9
Ability to have custom timeouts in request strategy (not used) 2022-09-20 16:01:41 +02:00
Alex Auvolat
56592e1853
RPC performance changes
- configurable ping timeout
- single, much higher, configurable RPC timeout
- no more concurrency semaphore
2022-09-19 20:31:00 +02:00
Alex Auvolat
e46dc2a8ef
Allow for hostnames in bootstrap_peers and rpc_public_addr (fix #353) 2022-09-14 16:09:38 +02:00
Alex Auvolat
ab722cb40f
Add checks on replication_factor of layouts we use (fix #363, fix #364) 2022-09-13 16:22:23 +02:00
Alex Auvolat
44733474bb
Remove/change println! in server code (fix #358) 2022-09-13 16:01:55 +02:00
Alex Auvolat
28a4af73ca
Use netapp 0.5 published from crates.io 2022-09-13 13:11:44 +02:00
Alex Auvolat
7f54706b95
Merge branch 'lx-perf-improvements' into netapp-stream-body 2022-09-08 15:50:56 +02:00
Alex Auvolat
d9d199a6c9
Merge branch 'main' into lx-perf-improvements 2022-09-08 15:49:17 +02:00
Alex Auvolat
db61f41030
Move GIT_VERSION injection later in build chain to reduce build times 2022-09-07 11:59:56 +02:00
Alex Auvolat
6b958979bd
Merge branch 'lx-perf-improvements' into netapp-stream-body 2022-09-06 22:13:01 +02:00
Alex Auvolat
48ffaaadfc
Bump versions to 0.8.0 (compatibility is broken already) 2022-09-06 16:47:56 +02:00
Jakub Jirutka
a6e40b75ea Add feature "system-libs" to enable linking against system libraries
If this feature is enabled, libsodium-sys and zstd-sys will link
dynamically against system-provided libraries instead of building
and linking statically the bundled (possibly outdated and vulnerable)
copies of them. This feature is intended mainly for linux package
maintainers.
2022-09-03 18:44:34 +02:00
Alex Auvolat
6226f5ceca
Update to netapp 0.4.5 - fixed ping 2022-09-02 14:33:12 +02:00
Alex Auvolat
1ef87ac4cb
cargo fmt 2022-09-02 13:38:29 +02:00
Alex Auvolat
99b532b85b
Apply PRIO_SECONDARY to block data transfers 2022-09-01 16:35:43 +02:00
Alex Auvolat
df094bd807
Less strict timeouts 2022-09-01 16:30:44 +02:00
Alex Auvolat
bc977f9a7a
Update to Netapp with OrderTag support and exploit OrderTags 2022-09-01 12:58:20 +02:00
Alex Auvolat
322dafc761
Try to fix clippy 2022-08-29 17:32:45 +02:00
Alex Auvolat
1921f4f7e6
Merge branch 'lx-perf-improvements' into netapp-stream-body 2022-08-29 16:45:05 +02:00
Quentin Dufour
2c7bae935a
Configure structopt to report the right version
By default, structopt reports the value provided by
the env var CARGO_PKG_VERSION, set by Cargo when reading
Cargo.toml. However for Garage we use a versioning based on git,
so we often report a version that is behind the real version.
In this commit, we create garage_util::version::garage() that
reports the right version and configure all structopt subcommands
to call this function instead of using the env var.
2022-08-11 10:21:45 +02:00
Alex Auvolat
e935861854
Factor out node request order selection logic & use in manager 2022-07-29 12:25:03 +02:00
Alex Auvolat
605a630333
Use streaming in block manager 2022-07-29 12:25:02 +02:00
Alex Auvolat
a35d4da721
update netapp to 0.5 2022-07-29 12:25:02 +02:00
Alex Auvolat
8e7e680afe
First adaptation to WIP netapp with streaming body 2022-07-29 12:25:02 +02:00
Alex
4f38cadf6e Background task manager (#332)
- [x] New background worker trait
- [x] Adapt all current workers to use new API
- [x] Command to list currently running workers, and whether they are active, idle, or dead
- [x] Error reporting
- Optimizations
  - [x] Merkle updater: several items per iteration
  - [ ] Use `tokio::task::spawn_blocking` where appropriate so that CPU-intensive tasks don't block other things going on
- scrub:
  - [x] have only one worker with a channel to start/pause/cancel
  - [x] automatic scrub
  - [x] ability to view and change tranquility from CLI
  - [x] persistence of some info
- [ ] Testing

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/332
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-07-08 13:30:26 +02:00
Alex
382e74c798 First version of admin API (#298)
**Spec:**

- [x] Start writing
- [x] Specify all layout endpoints
- [x] Specify all endpoints for operations on keys
- [x] Specify all endpoints for operations on key/bucket permissions
- [x] Specify all endpoints for operations on buckets
- [x] Specify all endpoints for operations on bucket aliases

View rendered spec at <https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/admin-api/doc/drafts/admin-api.md>

**Code:**

- [x] Refactor code for admin api to use common api code that was created for K2V

**General endpoints:**

- [x] Metrics
- [x] GetClusterStatus
- [x] ConnectClusterNodes
- [x] GetClusterLayout
- [x] UpdateClusterLayout
- [x] ApplyClusterLayout
- [x] RevertClusterLayout

**Key-related endpoints:**

- [x] ListKeys
- [x] CreateKey
- [x] ImportKey
- [x] GetKeyInfo
- [x] UpdateKey
- [x] DeleteKey

**Bucket-related endpoints:**

- [x] ListBuckets
- [x] CreateBucket
- [x] GetBucketInfo
- [x] DeleteBucket
- [x] PutBucketWebsite
- [x] DeleteBucketWebsite

**Operations on key/bucket permissions:**

- [x] BucketAllowKey
- [x] BucketDenyKey

**Operations on bucket aliases:**

- [x] GlobalAliasBucket
- [x] GlobalUnaliasBucket
- [x] LocalAliasBucket
- [x] LocalUnaliasBucket

**And also:**

- [x] Separate error type for the admin API (this PR includes a quite big refactoring of error handling)
- [x] Add management of website access
- [ ] Check that nothing is missing wrt what can be done using the CLI
- [ ] Improve formatting of the spec
- [x] Make sure everyone is cool with the API design

Fix #231
Fix #295

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/298
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-05-24 12:16:39 +02:00
Alex
5768bf3622 First implementation of K2V (#293)
**Specification:**

View spec at [this URL](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/k2v/doc/drafts/k2v-spec.md)

- [x] Specify the structure of K2V triples
- [x] Specify the DVVS format used for causality detection
- [x] Specify the K2V index (just a counter of number of values per partition key)
- [x] Specify single-item endpoints: ReadItem, InsertItem, DeleteItem
- [x] Specify index endpoint: ReadIndex
- [x] Specify multi-item endpoints: InsertBatch, ReadBatch, DeleteBatch
- [x] Move to JSON objects instead of tuples
- [x] Specify endpoints for polling for updates on single values (PollItem)

**Implementation:**

- [x] Table for K2V items, causal contexts
- [x] Indexing mechanism and table for K2V index
- [x] Make API handlers a bit more generic
- [x] K2V API endpoint
- [x] K2V API router
- [x] ReadItem
- [x] InsertItem
- [x] DeleteItem
- [x] PollItem
- [x] ReadIndex
- [x] InsertBatch
- [x] ReadBatch
- [x] DeleteBatch

**Testing:**

- [x] Just a simple Python script that does some requests to check visually that things are going right (does not contain parsing of results or assertions on returned values)
- [x] Actual tests:
  - [x] Adapt testing framework
  - [x] Simple test with InsertItem + ReadItem
  - [x] Test with several Insert/Read/DeleteItem + ReadIndex
  - [x] Test all combinations of return formats for ReadItem
  - [x] Test with ReadBatch, InsertBatch, DeleteBatch
  - [x] Test with PollItem
  - [x] Test error codes
- [ ] Fix most broken stuff
  - [x] test PollItem broken randomly
  - [x] when invalid causality tokens are given, errors should be 4xx not 5xx

**Improvements:**

- [x] Descending range queries
  - [x] Specify
  - [x] Implement
  - [x] Add test
- [x] Batch updates to index counter
- [x] Put K2V behind `k2v` feature flag

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/293
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-05-10 13:16:57 +02:00
Alex Auvolat
def78c5e6f
Update netapp to 0.4.4, fix #300 2022-05-09 12:08:47 +02:00
Alex Auvolat
617f28bfa4
Correct small formatting issue 2022-05-05 14:21:57 +02:00
Mendes
948ff93cf1 Corrected the warnings and errors issued by cargo clippy 2022-05-01 16:05:39 +02:00
Alex Auvolat
2aeaddd5e2
Apply cargo fmt 2022-05-01 09:57:05 +02:00
Alex Auvolat
c1d1646c4d
Change the way new layout assignations are computed.
The function now computes an optimal assignation (with respect to partition size) that minimizes the distance to the former assignation, using flow algorithms.

This commit was written by Mendes Oulamara <mendes.oulamara@pm.me>
2022-05-01 09:54:19 +02:00
Alex Auvolat
94f1e48fff Update to netapp 0.4.2 (a tiny fix) 2022-04-07 11:50:03 +02:00
Alex Auvolat
9d0ed78887 Add feature flag for Kubernetes discovery 2022-03-24 16:57:43 +01:00
Alex Auvolat
509d256c58
Make layout optimization work in relative terms 2022-03-24 15:27:14 +01:00
Alex Auvolat
7e0e2ffda2
Slight change and add comment to layout assignation algo 2022-03-24 15:27:13 +01:00
Alex Auvolat
413ab0eaed
Small change to partition assignation algorithm
This change helps ensure that nodes for each partition are spread
over all datacenters, a property that wasn't ensured previously
when going from a 2 DC deployment to a 3 DC deployment
2022-03-24 15:27:10 +01:00
Alex Auvolat
db46cdef79
Update netapp to v0.4.1 2022-03-15 17:09:57 +01:00
Alex Auvolat
ba6b56ae68
Fix some new clippy lints 2022-03-14 12:27:49 +01:00
Alex Auvolat
2377a92f6b
Add wrapper over sled tree to count items (used for big queues) 2022-03-14 10:54:25 +01:00
Alex Auvolat
203e8d2c34
Bump version to 0.7 because of incompatible Netapp 2022-03-14 10:54:24 +01:00
Alex Auvolat
f869ca625d
Add spans to table calls, change span names in RPC 2022-03-14 10:54:12 +01:00
Alex Auvolat
0cc31ee169
add missing netapp telemetry feature 2022-03-14 10:54:11 +01:00
Alex Auvolat
dc8d0496cc
Refactoring: rename config files, make modifications less invasive 2022-03-14 10:53:51 +01:00
Alex Auvolat
2a5609b292
Add metrics to API endpoint 2022-03-14 10:53:36 +01:00
Alex Auvolat
818daa5c78
Refactor how durations are measured 2022-03-14 10:53:35 +01:00
Alex Auvolat
bb04d94fa9
Update to Netapp 0.4 which supports distributed tracing 2022-03-14 10:52:30 +01:00
Alex Auvolat
8c2fb0c066
Add tracing integration with opentelemetry 2022-03-14 10:52:13 +01:00
Alex Auvolat
2cab84b1fe
Add many metrics in table/ and rpc/ 2022-03-14 10:51:50 +01:00
Max Audron
9d44127245
add support for kubernetes service discovery
This commit adds support to discover garage instances running in
kubernetes.

Once enabled by setting `kubernetes_namespace` and
`kubernetes_service_name`, garage will create a Custom Resource
`garagenodes.deuxfleurs.fr` with the node's public key as the resource name
and IP and port information as spec in the namespace configured by
`kubernetes_namespace`.

For discovering nodes the resources are filtered with the optionally set
`kubernetes_service_name` which sets a label
`garage.deuxfleurs.fr/service` on the resources.

This makes it possible to separate multiple garage deployments in a single
namespace.

The `kubernetes_skip_crd` variable disables the creation of the
CRD by garage itself. In that case the user must deploy it manually.
2022-03-12 13:05:52 +01:00
Alex Auvolat
beeef4758e
Some movement of helper code and refactoring of error handling 2022-01-04 12:52:46 +01:00
Alex Auvolat
5b1117e582
New model for buckets 2022-01-04 12:45:46 +01:00
Alex Auvolat
c94406f428
Improve how node roles are assigned in Garage
- change the terminology: the network configuration becomes the role
  table, the configuration of a node becomes a node's role
- the modification of the role table takes place in two steps: first,
  changes are staged in a CRDT data structure. Then, once the user is
  happy with the changes, they can commit them all at once (or revert
  them).
- update documentation
- fix tests
- implement smarter partition assignation algorithm

This patch breaks the format of the network configuration: when
migrating, the cluster will be in a state where no roles are assigned.
All roles must be re-assigned and committed at once. This migration
should not pose an issue.
2021-11-16 16:05:53 +01:00
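The two-step workflow described in the commit message above (stage changes, then commit them all at once or revert them) can be sketched as follows. This is a minimal single-process illustration under stated assumptions: it uses a plain `HashMap` rather than the CRDT data structure the commit mentions, and `Role`, `RoleTable`, and all field and method names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical role description for one node.
#[derive(Clone, Debug, PartialEq)]
struct Role {
    zone: String,
    capacity: u32,
}

/// Hypothetical role table: committed roles plus staged, not yet applied changes.
#[derive(Default)]
struct RoleTable {
    roles: HashMap<String, Role>,
    // `None` stages the removal of a node's role.
    staging: HashMap<String, Option<Role>>,
}

impl RoleTable {
    /// Records a change without touching the committed roles.
    fn stage(&mut self, node: &str, role: Option<Role>) {
        self.staging.insert(node.to_string(), role);
    }

    /// Applies every staged change at once and empties the staging area.
    fn commit(&mut self) {
        for (node, change) in self.staging.drain() {
            match change {
                Some(role) => {
                    self.roles.insert(node, role);
                }
                None => {
                    self.roles.remove(&node);
                }
            }
        }
    }

    /// Discards every staged change.
    fn revert(&mut self) {
        self.staging.clear();
    }
}

fn main() {
    let mut table = RoleTable::default();
    table.stage("node-a", Some(Role { zone: "dc1".into(), capacity: 100 }));
    table.commit();
    table.stage("node-a", None); // staged removal...
    table.revert(); // ...discarded before it takes effect
    println!("{:?}", table.roles.get("node-a"));
}
```

Keeping staged changes separate from committed roles is what lets an operator review a batch of edits before any of them takes effect.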
Alex Auvolat
e8811f7c9d
Request strategy: don't launch all 3 requests if not needed 2021-11-04 16:19:27 +01:00
Alex Auvolat
6f13d083ab
Add semaphore to limit RAM used by buffered outgoing requests 2021-11-03 18:02:57 +01:00
Alex Auvolat
8c4f418fe8
Fix peer list persistence: do not forget previous peers 2021-11-03 17:34:44 +01:00
Alex Auvolat
43e13a501d
Use published netapp crate instead of git repo 2021-10-26 10:36:57 +02:00
Alex Auvolat
ada7899b24
Fix clippy lints (fix #121) 2021-10-26 10:20:05 +02:00
Alex Auvolat
de4276202a
Improve CLI, adapt tests, update documentation 2021-10-25 14:21:48 +02:00
Alex Auvolat
1b450c4b49
Improvements to CLI and various fixes for netapp version
Discovery via consul, persist peer list to file
2021-10-22 16:55:24 +02:00