Commit Graph

205 Commits

Author SHA1 Message Date
Alex Auvolat
2183518edc
Spawn all background workers in a separate step 2022-12-14 12:28:07 +01:00
Alex Auvolat
83c8467e23
Proper queueing for delayed inserts, now backed to disk 2022-12-14 11:58:06 +01:00
Alex Auvolat
f8e528c15d
Small refactor of tables internals 2022-12-14 10:48:49 +01:00
Alex Auvolat
de9d6cddf7
Prettier worker list table; remove useless CLI log messages 2022-12-12 17:17:05 +01:00
Alex Auvolat
280d1be7b1
Refactor health check and add ability to return it in json 2022-12-05 15:28:57 +01:00
Alex Auvolat
2065f011ca
Implement /health admin API endpoint to check node health 2022-12-05 14:59:15 +01:00
Alex Auvolat
c1fb65194c
Add sled default in garage_model also 2022-11-21 14:25:54 +01:00
Alex Auvolat
5b18fd8201
Add garage bucket cleanup-incomplete-uploads command 2022-11-04 11:55:59 +01:00
Alex Auvolat
ded444f6c9
Ability to have custom timeouts in request strategy (not used) 2022-09-20 16:01:41 +02:00
Alex Auvolat
56592e1853
RPC performance changes
- configurable ping timeout
- single, much higher, configurable RPC timeout
- no more concurrency semaphore
2022-09-19 20:31:00 +02:00
Alex Auvolat
ab722cb40f
Add checks on replication_factor of layouts we use (fix #363, fix #364) 2022-09-13 16:22:23 +02:00
Alex Auvolat
38be811b1c
Fix clippy lint that says we should implement Eq 2022-09-13 16:08:00 +02:00
Alex Auvolat
07febd3ecd
Ensure data dir is created immediately when Garage starts (fix #349) 2022-09-13 15:57:27 +02:00
Alex Auvolat
28a4af73ca
Use netapp 0.5 published from crates.io 2022-09-13 13:11:44 +02:00
Alex Auvolat
7f54706b95
Merge branch 'lx-perf-improvements' into netapp-stream-body 2022-09-08 15:50:56 +02:00
Alex Auvolat
d9d199a6c9
Merge branch 'main' into lx-perf-improvements 2022-09-08 15:49:17 +02:00
Alex Auvolat
ceb1f0229a
Move version back into util 2022-09-07 18:36:46 +02:00
Alex Auvolat
f310fce34b
Inject GIT_VERSION even later 2022-09-07 18:30:15 +02:00
Alex Auvolat
8adc654713
Merge branch 'main' into improve-deps 2022-09-07 18:13:27 +02:00
Alex Auvolat
28d86e7602
Report build features in garage --help 2022-09-07 17:05:21 +02:00
Alex Auvolat
db61f41030
Move GIT_VERSION injection later in build chain to reduce build times 2022-09-07 11:59:56 +02:00
Alex Auvolat
6b958979bd
Merge branch 'lx-perf-improvements' into netapp-stream-body 2022-09-06 22:13:01 +02:00
Alex Auvolat
6f02c36a89
cargo fmt 2022-09-06 17:59:41 +02:00
Alex Auvolat
0f5689c169
Include code from v0.5.1 directly to remove dependencies 2022-09-06 17:52:50 +02:00
Alex Auvolat
b886c75450
Make all DB engines optional build features 2022-09-06 17:09:43 +02:00
Alex Auvolat
48ffaaadfc
Bump versions to 0.8.0 (compatibility is broken already) 2022-09-06 16:47:56 +02:00
Alex Auvolat
07e6bcde85
Merge branch 'main' into lx-perf-improvements 2022-09-05 12:40:17 +02:00
Alex Auvolat
943d76c583
Ability to dynamically set resync tranquility 2022-09-02 15:34:21 +02:00
Alex Auvolat
a35d4da721
update netapp to 0.5 2022-07-29 12:25:02 +02:00
Alex Auvolat
8e7e680afe
First adaptation to WIP netapp with streaming body 2022-07-29 12:25:02 +02:00
Alex Auvolat
2f111e6b3d
Performance improvements:
- reduce contention on mutation_lock by having 256 of them
- better lmdb defaults
2022-07-29 12:24:48 +02:00
Alex
4f38cadf6e Background task manager (#332)
- [x] New background worker trait
- [x] Adapt all current workers to use new API
- [x] Command to list currently running workers, and whether they are active, idle, or dead
- [x] Error reporting
- Optimizations
  - [x] Merkle updater: several items per iteration
  - [ ] Use `tokio::task::spawn_blocking` where appropriate so that CPU-intensive tasks don't block other things going on
- scrub:
  - [x] have only one worker with a channel to start/pause/cancel
  - [x] automatic scrub
  - [x] ability to view and change tranquility from CLI
  - [x] persistence of a few info
- [ ] Testing

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/332
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-07-08 13:30:26 +02:00
Alex
77e3fd6db2 improve internal item counter mechanisms and implement bucket quotas (#326)
- [x] Refactoring of internal counting API
- [x] Repair procedure for counters (it's an offline procedure!!!)
- [x] New counter for objects in buckets
- [x] Add quotas to buckets struct
- [x] Add CLI to manage bucket quotas
- [x] Add admin API to manage bucket quotas
- [x] Apply quotas by adding checks on put operations
- [x] Proof-read

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/326
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-06-15 20:20:28 +02:00
Alex
b44d3fc796 Abstract database behind generic interface and implement alternative drivers (#322)
- [x] Design interface
- [x] Implement Sled backend
  - [x] Re-implement the SledCountedTree hack ~~on Sled backend~~ on all backends (i.e. over the abstraction)
- [x] Convert Garage code to use generic interface
- [x] Proof-read converted Garage code
- [ ] Test everything well
- [x] Implement sqlite backend
- [x] Implement LMDB backend
- [ ] (Implement Persy backend?)
- [ ] (Implement other backends? (like RocksDB, ...))
- [x] Implement backend choice in config file and garage server module
- [x] Add CLI for converting between DB formats
- Exploit the new interface to put more things in transactions
  - [x] `.updated()` trigger on Garage tables

Fix #284

**Bugs**

- [x] When exporting sqlite, trees iterate empty??
- [x] LMDB doesn't work

**Known issues for various back-ends**

- Sled:
  - Eats all my RAM and also all my disk space
  - `.len()` has to traverse the whole table
  - Is actually quite slow on some operations
  - And is actually pretty bad code...
- Sqlite:
  - Requires a lock to be taken on all operations. The lock is also taken when iterating on a table with `.iter()`, and the lock isn't released until the iterator is dropped. This means that we must be VERY carefull to not do anything else inside a `.iter()` loop or else we will have a deadlock! Most such cases have been eliminated from the Garage codebase, but there might still be some that remain. If your Garage-over-Sqlite seems to hang/freeze, this is the reason.
  - (adapter uses a bunch of unsafe code)
- Heed (LMDB):
  - Not suited for 32-bit machines as it has to map the whole DB in memory.
  - (adpater uses a tiny bit of unsafe code)

**My recommendation:** avoid 32-bit machines and use LMDB as much as possible.

**Converting databases** is actually quite easy. For example from Sled to LMDB:

```bash
cd src/db
cargo run --features cli --bin convert -- -i path/to/garage/meta/db -a sled -o path/to/garage/meta/db.lmdb -b lmdb
```

Then, just add this to your `config.toml`:

```toml
db_engine = "lmdb"
```

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/322
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-06-08 10:01:44 +02:00
Alex
382e74c798 First version of admin API (#298)
**Spec:**

- [x] Start writing
- [x] Specify all layout endpoints
- [x] Specify all endpoints for operations on keys
- [x] Specify all endpoints for operations on key/bucket permissions
- [x] Specify all endpoints for operations on buckets
- [x] Specify all endpoints for operations on bucket aliases

View rendered spec at <https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/admin-api/doc/drafts/admin-api.md>

**Code:**

- [x] Refactor code for admin api to use common api code that was created for K2V

**General endpoints:**

- [x] Metrics
- [x] GetClusterStatus
- [x] ConnectClusterNodes
- [x] GetClusterLayout
- [x] UpdateClusterLayout
- [x] ApplyClusterLayout
- [x] RevertClusterLayout

**Key-related endpoints:**

- [x] ListKeys
- [x] CreateKey
- [x] ImportKey
- [x] GetKeyInfo
- [x] UpdateKey
- [x] DeleteKey

**Bucket-related endpoints:**

- [x] ListBuckets
- [x] CreateBucket
- [x] GetBucketInfo
- [x] DeleteBucket
- [x] PutBucketWebsite
- [x] DeleteBucketWebsite

**Operations on key/bucket permissions:**

- [x] BucketAllowKey
- [x] BucketDenyKey

**Operations on bucket aliases:**

- [x] GlobalAliasBucket
- [x] GlobalUnaliasBucket
- [x] LocalAliasBucket
- [x] LocalUnaliasBucket

**And also:**

- [x] Separate error type for the admin API (this PR includes a quite big refactoring of error handling)
- [x] Add management of website access
- [ ] Check that nothing is missing wrt what can be done using the CLI
- [ ] Improve formatting of the spec
- [x] Make sure everyone is cool with the API design

Fix #231
Fix #295

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/298
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-05-24 12:16:39 +02:00
Alex
5768bf3622 First implementation of K2V (#293)
**Specification:**

View spec at [this URL](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/k2v/doc/drafts/k2v-spec.md)

- [x] Specify the structure of K2V triples
- [x] Specify the DVVS format used for causality detection
- [x] Specify the K2V index (just a counter of number of values per partition key)
- [x] Specify single-item endpoints: ReadItem, InsertItem, DeleteItem
- [x] Specify index endpoint: ReadIndex
- [x] Specify multi-item endpoints: InsertBatch, ReadBatch, DeleteBatch
- [x] Move to JSON objects instead of tuples
- [x] Specify endpoints for polling for updates on single values (PollItem)

**Implementation:**

- [x] Table for K2V items, causal contexts
- [x] Indexing mechanism and table for K2V index
- [x] Make API handlers a bit more generic
- [x] K2V API endpoint
- [x] K2V API router
- [x] ReadItem
- [x] InsertItem
- [x] DeleteItem
- [x] PollItem
- [x] ReadIndex
- [x] InsertBatch
- [x] ReadBatch
- [x] DeleteBatch

**Testing:**

- [x] Just a simple Python script that does some requests to check visually that things are going right (does not contain parsing of results or assertions on returned values)
- [x] Actual tests:
  - [x] Adapt testing framework
  - [x] Simple test with InsertItem + ReadItem
  - [x] Test with several Insert/Read/DeleteItem + ReadIndex
  - [x] Test all combinations of return formats for ReadItem
  - [x] Test with ReadBatch, InsertBatch, DeleteBatch
  - [x] Test with PollItem
  - [x] Test error codes
- [ ] Fix most broken stuff
  - [x] test PollItem broken randomly
  - [x] when invalid causality tokens are given, errors should be 4xx not 5xx

**Improvements:**

- [x] Descending range queries
  - [x] Specify
  - [x] Implement
  - [x] Add test
- [x] Batch updates to index counter
- [x] Put K2V behind `k2v` feature flag

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/293
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-05-10 13:16:57 +02:00
Alex Auvolat
94f1e48fff Update to netapp 0.4.2 (a tiny fix) 2022-04-07 11:50:03 +02:00
Alex Auvolat
077dd1cde9
Clippy 2022-03-23 10:25:39 +01:00
Alex Auvolat
e480aaf338
Make background tranquility a configurable parameter 2022-03-23 10:25:19 +01:00
Alex Auvolat
c3982a90b6
Move DataBlock out of manager.rs 2022-03-23 10:25:19 +01:00
Alex Auvolat
c1d9854d2c
Move block manager to separate module 2022-03-23 10:25:15 +01:00
Alex Auvolat
db46cdef79
Update netapp to v0.4.1 2022-03-15 17:09:57 +01:00
Alex Auvolat
ba6b56ae68
Fix some new clippy lints 2022-03-14 12:27:49 +01:00
Alex Auvolat
0af314b295
Add comment for fsync 2022-03-14 11:54:00 +01:00
Alex Auvolat
d78bf379fb
Fix resync queue to not drop items 2022-03-14 11:51:37 +01:00
Alex Auvolat
f7e6f4616f
Spawn a single resync worker 2022-03-14 11:51:37 +01:00
Alex Auvolat
dc5ec4ecf9
Add appropriate fsync() calls in write_block
to ensure that data is persisted properly
2022-03-14 11:51:32 +01:00
Alex Auvolat
fe62d01b7e
Implement exponential backoff for resync retries 2022-03-14 11:41:20 +01:00
Alex Auvolat
2377a92f6b
Add wrapper over sled tree to count items (used for big queues) 2022-03-14 10:54:25 +01:00
Alex Auvolat
203e8d2c34
Bump version to 0.7 because of incompatible Netapp 2022-03-14 10:54:24 +01:00
Alex Auvolat
2a5609b292
Add metrics to API endpoint 2022-03-14 10:53:36 +01:00
Alex Auvolat
818daa5c78
Refactor how durations are measured 2022-03-14 10:53:35 +01:00
Alex Auvolat
bb04d94fa9
Update to Netapp 0.4 which supports distributed tracing 2022-03-14 10:52:30 +01:00
Alex Auvolat
8c2fb0c066
Add tracing integration with opentelemetry 2022-03-14 10:52:13 +01:00
Alex Auvolat
2cab84b1fe
Add many metrics in table/ and rpc/ 2022-03-14 10:51:50 +01:00
Alex Auvolat
ea7fb901eb
Implement {Put,Get,Delete}BucketCors and CORS in general
- OPTIONS request against API endpoint
- Returning corresponding CORS headers on API calls
- Returning corresponding CORS headers on website GET's
2022-01-24 11:58:00 +01:00
trinity-1686a
e55fa38c99 Add date verification to presigned urls (#196)
fix #96
fix #162 by returning Forbidden instead Bad Request

Co-authored-by: Trinity Pointard <trinity.pointard@gmail.com>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/196
Co-authored-by: trinity-1686a <trinity.pointard@gmail.com>
Co-committed-by: trinity-1686a <trinity.pointard@gmail.com>
2022-01-18 12:22:31 +01:00
Alex Auvolat
6617a72220
Implement UploadPartCopy 2022-01-13 13:58:47 +01:00
Quentin
b4592a00fe Implement ListMultipartUploads (#171)
Implement ListMultipartUploads, also refactor ListObjects and ListObjectsV2.

It took me some times as I wanted to propose the following things:
  - Using an iterator instead of the loop+goto pattern. I find it easier to read and it should enable some optimizations. For example, when consuming keys of a common prefix, we do many [redundant checks](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/src/api/s3_list.rs#L125-L156) while the only thing to do is to [check if the following key is still part of the common prefix](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/feature/s3-multipart-compat/src/api/s3_list.rs#L476).
  - Try to name things (see ExtractionResult and RangeBegin enums) and to separate concerns (see ListQuery and Accumulator)
  - An IO closure to make unit tests possibles.
  - Unit tests, to track regressions and document how to interact with the code
  - Integration tests with `s3api`. In the future, I would like to move them in Rust with the aws rust SDK.

Merging of the logic of ListMultipartUploads and ListObjects was not a goal but a consequence of the previous modifications.

Some points that we might want to discuss:
  - ListObjectsV1, when using pagination and delimiters, has a weird behavior (it lists multiple times the same prefix) with `aws s3api` due to the fact that it can not use our optimization to skip the whole prefix. It is independant from my refactor and can be tested with the commented `s3api` tests in `test-smoke.sh`. It probably has the same weird behavior on the official AWS S3 implementation.
  - Considering ListMultipartUploads, I had to "abuse" upload id marker to support prefix skipping. I send an `upload-id-marker` with the hardcoded value `include` to emulate your "including" token.
  - Some ways to test ListMultipartUploads with existing software (my tests are limited to s3api for now).

Co-authored-by: Quentin Dufour <quentin@deuxfleurs.fr>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/171
Co-authored-by: Quentin <quentin@dufour.io>
Co-committed-by: Quentin <quentin@dufour.io>
2022-01-12 19:04:55 +01:00
Alex Auvolat
8395030e48
Implement CreateBucket 2022-01-05 15:56:48 +01:00
Alex Auvolat
677ab60cc1
Small changes in key model and refactoring 2022-01-04 18:59:17 +01:00
Alex Auvolat
df35feba18
New buckets for 0.6.0: make bucket id a SK and not a HK, CLI updates 2022-01-04 12:53:14 +01:00
Alex Auvolat
1bcd6fabbd
New buckets for 0.6.0: small changes
- Fix bucket delete

- fix merge of bucket creation date

- Replace deletable with option in aliases
    Rationale: if two aliases point to conflicting bucket, resolving
    by making an arbitrary choice risks making data accessible when it
    shouldn't be. We'd rather resolve to deleting the alias until
    someone puts it back.
2022-01-04 12:52:47 +01:00
Alex Auvolat
ba7f268b99
Rename and change query filters 2022-01-04 12:52:46 +01:00
Alex Auvolat
e59c23a69d
Refactor logic for setting/unsetting aliases 2022-01-04 12:52:46 +01:00
Alex Auvolat
2140cd7205
Remove website redirects 2022-01-04 12:52:46 +01:00
Alex Auvolat
beeef4758e
Some movement of helper code and refactoring of error handling 2022-01-04 12:52:46 +01:00
Alex Auvolat
d8ab5bdc3e
New buckets for 0.6.0: fix model and migration 2022-01-04 12:47:28 +01:00
Alex Auvolat
b1cfd16913
New buckets for 0.6.0: small fixes, including:
- ensure bucket names are correct aws s3 names
- when making aliases, ensure timestamps of links in both ways are the
  same
- fix small remarks by trinity
- don't have a separate website_access field
2022-01-04 12:46:41 +01:00
Alex Auvolat
4d30e62db4
New buckets for 0.6.0: migration code and build files 2022-01-04 12:46:13 +01:00
Alex Auvolat
0bbb6673e7
Model changes 2022-01-04 12:45:52 +01:00
Alex Auvolat
53f71b3a57
Implement bucket alias and bucket unalias 2022-01-04 12:45:51 +01:00
Alex Auvolat
5b1117e582
New model for buckets 2022-01-04 12:45:46 +01:00
Alex Auvolat
8f6026de5e
Make table name a const in trait 2021-12-15 15:39:10 +01:00
trinity-1686a
1eb972b1ac Add compression using zstd (#173)
fix #27

Co-authored-by: Trinity Pointard <trinity.pointard@gmail.com>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/173
Co-authored-by: trinity-1686a <trinity.pointard@gmail.com>
Co-committed-by: trinity-1686a <trinity.pointard@gmail.com>
2021-12-15 11:26:43 +01:00
Alex Auvolat
c94406f428
Improve how node roles are assigned in Garage
- change the terminology: the network configuration becomes the role
  table, the configuration of a nodes becomes a node's role
- the modification of the role table takes place in two steps: first,
  changes are staged in a CRDT data structure. Then, once the user is
  happy with the changes, they can commit them all at once (or revert
  them).
- update documentation
- fix tests
- implement smarter partition assignation algorithm

This patch breaks the format of the network configuration: when
migrating, the cluster will be in a state where no roles are assigned.
All roles must be re-assigned and commited at once. This migration
should not pose an issue.
2021-11-16 16:05:53 +01:00
Alex Auvolat
ad7ab31411
Implement GC delay for table data 2021-11-08 15:47:47 +01:00
Alex Auvolat
74a7a550eb
Safety: never voluntarily delete block in 10min interval after RC reaches zero 2021-11-08 15:47:47 +01:00
Alex Auvolat
2090a6187f
Add tranquilizer mechanism to improve on token bucket mechanism 2021-11-04 13:26:59 +01:00
Alex Auvolat
69b89fb46d
Fix race in block resync 2021-10-27 12:01:12 +02:00
Alex Auvolat
6b47c294f5
Refactoring on repair commands 2021-10-27 11:14:55 +02:00
Trinity Pointard
28c015d9ff
add cli parameter to verify local bloc integrity
reuse code for listing local blocks
add disk i/o speed limit on integrity check
2021-10-27 10:31:03 +02:00
Alex Auvolat
43e13a501d
Use published netapp crate instead of git repo 2021-10-26 10:36:57 +02:00
Alex Auvolat
ada7899b24
Fix clippy lints (fix #121) 2021-10-26 10:20:05 +02:00
Alex Auvolat
df8a4068d9
Refactor block manager code, and hopefully fix deadlock 2021-10-25 14:21:51 +02:00
Alex Auvolat
de4276202a
Improve CLI, adapt tests, update documentation 2021-10-25 14:21:48 +02:00
Alex Auvolat
1b450c4b49
Improvements to CLI and various fixes for netapp version
Discovery via consul, persist peer list to file
2021-10-22 16:55:24 +02:00
Alex Auvolat
4067797d01
First port of Garage to Netapp 2021-10-22 15:55:18 +02:00
Alex Auvolat
b9127dd6f8
Prepare for v0.3.0 and add migration path from v0.2.1.x 2021-05-28 15:29:58 +02:00
Alex Auvolat
b490ebc7f6
Many improvements on ring/replication and its configuration:
- Explicit "replication_mode" configuration parameters that takes
  either "none", "2" or "3" as values, instead of letting user configure
  replication factor themselves. These are presets whose corresponding
  replication/quorum values can be found in replication/mode.rs

- Explicit support for single-node and two-node deployments
  (number of nodes must be at least "replication_mode", with "none"
  we can have only one node)

- Ring is now stored much more compactly with 256*8 + n*32 bytes,
  instead of 256*32 bytes

- Support for gateway-only nodes that do not store data
  (these nodes still need a metadata_directory to store the list
  of bucket and keys since those are stored on all nodes; it also
  technically needs a data_directory to start but it will stay
  empty unless we have bugs)
2021-05-28 14:07:36 +02:00
Trinity Pointard
e4b9e4e24d
rename types to CamelCase 2021-05-03 22:15:09 +02:00
Trinity Pointard
4a1e079e8f
fix clippy warnings on model 2021-05-03 22:11:41 +02:00
Alex Auvolat
119217f9f6
change a few comments 2021-04-27 16:53:47 +02:00
Trinity Pointard
2812a027ea
change some more comments and revert changes on TableSchema 2021-04-27 16:49:07 +02:00
Trinity Pointard
74373aebcf
make most requested changes 2021-04-27 16:47:08 +02:00
Trinity Pointard
1e3df189d0
document api crate 2021-04-27 16:37:10 +02:00
Trinity Pointard
67585a4ffa
attempt at documenting model crate 2021-04-27 16:37:10 +02:00
Trinity Pointard
b437610812
attempt at documenting table crate 2021-04-27 16:37:10 +02:00
Alex Auvolat
9ced9f78dc
Improve bootstraping: do it regularly; persist peer list 2021-04-27 16:37:08 +02:00
Alex Auvolat
f859d15062 update to v0.2.1 2021-03-19 13:39:18 +01:00