Compare commits

...

40 Commits
main ... jepsen

Author SHA1 Message Date
Alex Auvolat  92dd2bbe15  jepsen: nlnet task3a seems to fix things  2023-11-16 18:09:13 +01:00
Alex Auvolat  18e5811159  jepsen: add patch and use more complete names  2023-11-16 12:57:21 +01:00
Alex Auvolat  5b1f50be65  jepsen: testing  2023-10-25 14:43:24 +02:00
Alex Auvolat  9df7fa0bcd  jepsen: use 7 nodes  2023-10-25 14:04:39 +02:00
Alex Auvolat  fd85010a40  jepsen: failures with set2 test in --scenario r  2023-10-25 12:13:27 +02:00
Alex Auvolat  cfbfa09d24  jepsen: fix set2 test omg finally this is so stupid  2023-10-25 11:50:16 +02:00
Alex Auvolat  db921cc05f  jepsen: reconfigure nemesis + add db nemesis  2023-10-25 11:41:34 +02:00
Alex Auvolat  4fa2646a75  jepsen: got a failure with set1  2023-10-24 17:45:22 +02:00
Alex Auvolat  d7ab2c639e  jepsen: fix nemesis to actually generate many operations  2023-10-24 16:39:50 +02:00
Alex Auvolat  d13bde5e26  jepsen: set1 and set2 don't fail anymore ??  2023-10-24 15:44:05 +02:00
Alex Auvolat  d2c365767b  jepsen: more testing  2023-10-24 11:39:45 +02:00
Alex Auvolat  fb6c9a1243  jepsen: update readme  2023-10-20 15:55:09 +02:00
Alex Auvolat  9030c1eef8  jepsen: code path for nemesis final generator  2023-10-20 15:53:46 +02:00
Alex Auvolat  654775308e  jepsen: add cluster reconfiguration nemesis  2023-10-20 15:48:37 +02:00
Alex Auvolat  f5b0972781  jepsen: register crdt read-after-write is fixed with deleteobject patch  2023-10-20 15:00:10 +02:00
Alex Auvolat  d148b83d4f  jepsen: reg2 failure seems to happen only with deleteobject  2023-10-20 13:36:48 +02:00
Alex Auvolat  4b93ce179a  jepsen: errors in reg2 workload under investigation  2023-10-20 12:56:55 +02:00
Alex Auvolat  4ba18ce9cc  jepsen: wip checker for register-like behavior  2023-10-20 12:13:11 +02:00
Alex Auvolat  ef662822c9  jepsen: fix the list-objects call (?)  2023-10-19 23:40:55 +02:00
Alex Auvolat  da8b170748  jepsen: investigating listobjects error  2023-10-19 16:45:24 +02:00
Alex Auvolat  74e50edddd  jepsen: refactoring  2023-10-19 14:34:19 +02:00
Alex Auvolat  b3bf16ee27  make jepsen test more robust: handle errors and timeouts, fixed access key  2023-10-18 17:51:34 +02:00
Alex Auvolat  ddd3de7fce  refactor jepsen code  2023-10-18 16:30:45 +02:00
Alex Auvolat  84d43501ce  refactor jepsen setup logic  2023-10-18 15:34:12 +02:00
Alex Auvolat  012ade5d4b  jepsen: update jepsen and fix garage key info  2023-10-18 14:06:32 +02:00
Alex Auvolat  ef5ca86dfc  jepsen: update to garage 0.9.0  2023-10-18 14:01:18 +02:00
Alex Auvolat  9ec4cca334  reformatting  2023-10-18 12:03:12 +02:00
Alex Auvolat  18ee8efb5f  Check read-after-write property for sets  2023-10-18 12:03:12 +02:00
Alex Auvolat  55eb4e87c4  set tests with independant tests together  2023-10-18 12:03:11 +02:00
Alex Auvolat  0bb1577ae1  two set workloads with different checkers  2023-10-18 12:03:11 +02:00
Alex Auvolat  6eb26be548  Add garage set test (this one works :p)  2023-10-18 12:03:11 +02:00
Alex Auvolat  eb86eaa6d2  refactor jepsen test  2023-10-18 12:03:11 +02:00
Alex Auvolat  80d7b7d858  remove useless files  2023-10-18 12:03:11 +02:00
Alex Auvolat  93a7132b4c  the fix for increasing timestamps does not make things linearizable  2023-10-18 12:03:11 +02:00
Alex Auvolat  dc5245ce65  even without nemesis, s3 get/put/delete is not linearizable (is this normal?)  2023-10-18 12:03:11 +02:00
Alex Auvolat  70c1d3db46  better match exceptions  2023-10-18 12:03:11 +02:00
Alex Auvolat  bc11701999  jepsen: s3 gets and puts  2023-10-18 12:03:11 +02:00
Alex Auvolat  ca4cc7e44f  jepsen connects to vagrant vms  2023-10-18 12:03:11 +02:00
Alex Auvolat  17ebb65273  jepsen ssh into containers seem to work ?  2023-10-18 12:03:11 +02:00
Alex Auvolat  7011b71fbd  jepsen: wip  2023-10-18 12:03:11 +02:00
15 changed files with 980 additions and 0 deletions

@@ -0,0 +1 @@
use nix

script/jepsen.garage/.gitignore vendored Normal file
@@ -0,0 +1,16 @@
/target
/classes
/checkouts
profiles.clj
pom.xml
pom.xml.asc
*.jar
*.class
/.lein-*
/.nrepl-port
/.prepl-port
.hgignore
.hg/
.direnv
/store
.vagrant

@@ -0,0 +1,157 @@
# jepsen.garage
Jepsen checking of Garage consistency properties.
## Usage
Requirements:
- vagrant
- VirtualBox, configured so that nodes can take an IP in a private network `192.168.56.0/24`
- a user that can create VirtualBox VMs
- leiningen
- gnuplot
Set up VMs:
```
vagrant up
```
Run tests (this one should fail):
```
lein run test --nodes-file nodes.vagrant --time-limit 64 --concurrency 50 --rate 50 --workload reg1
```
These ones are working:
```
lein run test --nodes-file nodes.vagrant --time-limit 64 --rate 50 --concurrency 50 --workload set1
lein run test --nodes-file nodes.vagrant --time-limit 64 --rate 50 --concurrency 50 --workload set2
```
## Results
### Register linear (`reg1` workload)
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 20 --workload reg1 --ops-per-key 100`
Results without timestamp patch:
- Fails with a simple clock-scramble nemesis (`--scenario c`).
Explanation: without the timestamp patch, nodes timestamp new objects using only their
local clock, so the resulting ordering of writes is all over the place when clocks are
scrambled (sketched below).
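For illustration only (this is not Garage's actual code): if conflicting writes are ordered by the writer's wall-clock timestamp, a node whose clock has been pushed into the future can shadow a write that physically happened later:
```
;; Hypothetical sketch of timestamp-based ordering of conflicting writes.
(defn newest [a b]
  (max-key :ts a b))

(def put-1 {:value "old", :ts 2000})   ; written first, on a node whose clock runs fast
(def put-2 {:value "new", :ts 1005})   ; written later, on a node with a correct clock

(newest put-1 put-2)
;; => {:value "old", :ts 2000}   the physically later write "new" is shadowed
```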
Results with timestamp patch (`--patch tsfix2`):
- No failure with clock-scramble nemesis
- Fails with clock-scramble nemesis + partition nemesis (`--scenario cp`).
**This test is expected to fail.**
Indeed, S3 objects are not meant to behave like linearizable registers.
TODO explain using a counter-example
### Read-after-write CRDT register model
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 100 --workload reg2 --ops-per-key 100`
Results without timestamp patch:
- Fails with a simple clock-scramble nemesis (`--scenario c`).
Explanation: old values are not overwritten correctly when their timestamps are in the future.
Results with timestamp patch (`--patch tsfix2`):
- No failures with clock-scramble nemesis + partition nemesis (`--scenario cp`).
This proves that `tsfix2` (PR#543) does improve consistency.
- **Fails with layout reconfiguration nemesis** (`--scenario r`).
Example of a failed run: `garage reg2/20231024T120806.899+0200`.
This is the failure mode we are looking for and trying to fix for NLnet task 3.
- Changes brought by NLnet task 3 code (commit 707442f5de):
no failures with `--scenario r` (0 of 10 runs), `--scenario pr` (0 of 10 runs),
`--scenario cpr` (0 of 10 runs) and `--scenario dpr` (0 of 10 runs).
### Set, basic test (write some items, then read)
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 200 --concurrency 200 --workload set1 --ops-per-key 100 --patch tsfix2`
Results:
- So far, no failures with clock-scramble nemesis + partition nemesis (TODO: longer test run)
- Does not seem to fail with only the layout reconfiguration nemesis (<10 runs), although theoretically it could
- **Fails with the partition + layout reconfiguration nemesis** (`--scenario pr`).
Example of a failed run: `garage set1/20231024T172214.488+0200` (1 failure in 4 runs).
TODO: investigate.
This is the failure mode we are looking for and trying to fix for NLnet task 3.
### Set, continuous test (interspersed reads and writes)
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 100 --workload set2 --ops-per-key 100 --patch tsfix2`
Results:
- No failures with clock-scramble nemesis + db nemesis + partition nemesis (`--scenario cdp`) (0 failures in 10 runs).
- **Fails with just layout reconfiguration nemesis** (`--scenario r`).
Example of a failed run: `garage set2/20231025T141940.198+0200` (10 failures in 10 runs).
This is the failure mode we are looking for and trying to fix for NLnet task 3.
- Changes brought by NLnet task 3 code (commit 707442f5de):
no failures with `--scenario r` (0 of 10 runs), `--scenario pr` (0 of 10 runs),
`--scenario cpr` (0 of 10 runs) and `--scenario dpr` (0 of 10 runs).
## Investigating (and fixing) errors
### Segfaults
They are due to the download being interrupted in the middle (e.g. ^C during the first launch on clean VMs), which leaves a truncated `garage` binary.
Add `:force?` to the `cached-wget!` call in `daemon.clj` to re-download the binary.
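Concretely, a sketch of what that change in `install!` could look like, reusing the bindings already present in `daemon.clj` and assuming `cached-wget!` accepts an options map with a `:force?` flag (as suggested above):
```
;; Sketch of the change in install! (daemon.clj): force a fresh download.
(let [url (str "https://garagehq.deuxfleurs.fr/_releases/" version
               "/x86_64-unknown-linux-musl/garage")
      cache (cu/cached-wget! url {:force? true})]
  (c/exec :cp cache binary))
```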
### In `jepsen.garage`: prefix weirdness
In `store/garage set1/20231019T163358.615+0200`:
```
INFO [2023-10-19 16:35:20,977] clojure-agent-send-off-pool-207 - jepsen.garage.set list results for prefix set20/ : (set13/0 set13/1 set13/10 set13/11 set13/12 set13/13 set13/14 set13/15 set13/16 set13/17 set13/18 set13/19 set13/2 set13/20 set13/21 set13/22 set13/23 set13/24 set13/25 set13/26 set13/27 set13/28 set13/29 set13/3 set13/30 set13/31 set13/32 set13/33 set13/34 set13/35 set13/36 set13/37 set13/38 set13/39 set13/4 set13/40 set13/41 set13/42 set13/43 set13/44 set13/45 set13/46 set13/47 set13/48 set13/49 set13/5 set13/50 set13/51 set13/52 set13/53 set13/54 set13/55 set13/56 set13/57 set13/58 set13/59 set13/6 set13/60 set13/61 set13/62 set13/63 set13/64 set13/65 set13/66 set13/67 set13/68 set13/69 set13/7 set13/70 set13/71 set13/72 set13/73 set13/74 set13/75 set13/76 set13/77 set13/78 set13/79 set13/8 set13/80 set13/81 set13/82 set13/83 set13/84 set13/85 set13/86 set13/87 set13/88 set13/89 set13/9 set13/90 set13/91 set13/92 set13/93 set13/94 set13/95 set13/96 set13/97 set13/98 set13/99) (node: http://192.168.56.25:3900 )
```
After inspection, the actual S3 call was made with prefix "set13/", so at least this is not an error in Garage itself but in the Jepsen code.
It finally turned out to be caused by closures not correctly capturing their context in the `list` function of `s3api.clj` (wtf clojure?).
The exact root cause is still unclear, but the problem went away after making `list-inner` a separate top-level function instead of an inner function,
and passing everything it previously captured from the enclosing context (creds and prefix) as explicit arguments, as sketched below.
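Roughly, the shape of the change (the "before" version is reconstructed from this description, not taken from the old code):
```
;; "Before" (reconstructed; assumes the same requires as s3api.clj,
;; i.e. [amazonica.aws.s3 :as s3]): list-inner was an inner function that
;; relied on creds and prefix being captured from the enclosing scope.
(defn list [creds prefix]
  (letfn [(list-inner [ct accum]
            (let [result (s3/list-objects-v2 creds
                                             {:bucket-name (:bucket creds)
                                              :prefix prefix
                                              :continuation-token ct})
                  objects (concat (map :key (:object-summaries result)) accum)]
              (if (:truncated? result)
                (list-inner (:next-continuation-token result) objects)
                objects)))]
    (list-inner nil [])))
;; "After": list-inner is a separate top-level function that takes creds and
;; prefix explicitly on every call -- see s3api.clj in this patch.
```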
### `reg2` test inconsistency, even with timestamp fix
The reg2 test is our custom checker for CRDT read-after-write on individual object keys, acting as registers which can be updated.
The test fails without the timestamp fix, which is expected as the clock scrambler will prevent nodes from having a correct ordering of objects.
With the timestamp fix (`--patch tsfix1`), the happened-before relationship should at least be respected, meaning that when a PutObject call starts
after another PutObject call has ended, the second call should overwrite the value written by the first call, and that first value should not be
readable by later GetObject calls.
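For instance, in this made-up history fragment (Jepsen op maps on key `0`, not from an actual run), the final read is a violation: the write of `"2"` completed before the read began, yet `"1"` is still returned:
```
[{:process 0, :type :invoke, :f :write, :value [0 "1"]}
 {:process 0, :type :ok,     :f :write, :value [0 "1"]}
 {:process 1, :type :invoke, :f :write, :value [0 "2"]}   ; starts after write "1" ended
 {:process 1, :type :ok,     :f :write, :value [0 "2"]}
 {:process 2, :type :invoke, :f :read,  :value [0 nil]}   ; starts after write "2" ended
 {:process 2, :type :ok,     :f :read,  :value [0 "1"]}]  ; violation: "1" was overwritten by "2"
```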
However, we observed inconsistencies even with the timestamp fix.
The inconsistencies always seemed to happen after writing a nil value, which translates to a DeleteObject call
instead of a PutObject. When nil writes are removed, so that only PutObject calls are made, the issue disappears,
so there is an issue to fix in DeleteObject.
The issue in DeleteObject seems to have been fixed by commit `c82d91c6bccf307186332b6c5c6fc0b128b1b2b1`, which can be selected with `--patch tsfix2`.
## License
Copyright © 2023 Alex Auvolat
This program and the accompanying materials are made available under the
terms of the GNU Affero General Public License v3.0.

script/jepsen.garage/Vagrantfile vendored Normal file
@@ -0,0 +1,32 @@
# -*- mode: ruby -*-
# vi: set ft=ruby :
#
def vm(config, hostname, ip)
config.vm.hostname = hostname
config.vm.network "private_network", ip: ip
end
Vagrant.configure("2") do |config|
config.vm.box = "generic/debian10"
config.vm.provider "virtualbox" do |vb|
vb.gui = false
vb.memory = "512"
vb.customize ["modifyvm", :id, "--vram=12"]
end
config.vm.provision "shell", inline: <<-SHELL
echo "root:root" | chpasswd
mkdir -p /root/.ssh
echo "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJpaBZdYxHqMxhv2RExAOa7nkKhPBOHupMP3mYaZ73w9 lx@lindy" >> /root/.ssh/authorized_keys
SHELL
config.vm.define "n1" do |config| vm(config, "n1", "192.168.56.21") end
config.vm.define "n2" do |config| vm(config, "n2", "192.168.56.22") end
config.vm.define "n3" do |config| vm(config, "n3", "192.168.56.23") end
config.vm.define "n4" do |config| vm(config, "n4", "192.168.56.24") end
config.vm.define "n5" do |config| vm(config, "n5", "192.168.56.25") end
config.vm.define "n6" do |config| vm(config, "n6", "192.168.56.26") end
config.vm.define "n7" do |config| vm(config, "n7", "192.168.56.27") end
end

@@ -0,0 +1,13 @@
docker stop jaeger
docker rm jaeger
# UI is on localhost:16686
# otel-grpc collector is on localhost:4317
# otel-http collector is on localhost:4318
docker run -d --name jaeger \
-e COLLECTOR_OTLP_ENABLED=true \
-p 4317:4317 \
-p 4318:4318 \
-p 16686:16686 \
jaegertracing/all-in-one:1.50

@@ -0,0 +1,7 @@
192.168.56.21
192.168.56.22
192.168.56.23
192.168.56.24
192.168.56.25
192.168.56.26
192.168.56.27

@@ -0,0 +1,10 @@
(defproject jepsen.garage "0.1.0-SNAPSHOT"
:description "Jepsen testing for Garage"
:url "https://git.deuxfleurs.fr/Deuxfleurs/garage"
:license {:name "AGPLv3"
:url "https://www.gnu.org/licenses/agpl-3.0.en.html"}
:main jepsen.garage
:dependencies [[org.clojure/clojure "1.11.1"]
[jepsen "0.3.3-SNAPSHOT"]
[amazonica "0.3.163"]]
:repl-options {:init-ns jepsen.garage})

@@ -0,0 +1,18 @@
{ pkgs ? import <nixpkgs> {
overlays = [
(self: super: {
jdk = super.jdk11;
jre = super.jre11;
})
];
} }:
pkgs.mkShell {
nativeBuildInputs = with pkgs; [
leiningen
jdk
jna
vagrant
gnuplot
graphviz
];
}

@@ -0,0 +1,101 @@
(ns jepsen.garage
(:require
[clojure.string :as str]
[jepsen
[checker :as checker]
[cli :as cli]
[generator :as gen]
[nemesis :as nemesis]
[tests :as tests]]
[jepsen.os.debian :as debian]
[jepsen.garage
[daemon :as grg]
[nemesis :as grgNemesis]
[reg :as reg]
[set :as set]]))
(def workloads
"A map of workload names to functions that construct workloads, given opts."
{"reg1" reg/workload1
"reg2" reg/workload2
"set1" set/workload1
"set2" set/workload2})
(def scenari
"A map of scenari to the associated nemesis"
{"c" grgNemesis/scenario-c
"cp" grgNemesis/scenario-cp
"r" grgNemesis/scenario-r
"pr" grgNemesis/scenario-pr
"cpr" grgNemesis/scenario-cpr
"cdp" grgNemesis/scenario-cdp
"dpr" grgNemesis/scenario-dpr})
(def patches
"A map of patch names to Garage builds"
{"default" "v0.9.0"
"tsfix1" "d146cdd5b66ca1d3ed65ce93ca42c6db22defc09"
"tsfix2" "c82d91c6bccf307186332b6c5c6fc0b128b1b2b1"
"task3a" "707442f5de416fdbed4681a33b739f0a787b7834"})
(def cli-opts
"Additional command line options."
[["-p" "--patch NAME" "Garage patch to use"
:default "default"
:validate [patches (cli/one-of patches)]]
["-s" "--scenario NAME" "Nemesis scenario to run"
:default "cp"
:validate [scenari (cli/one-of scenari)]]
["-r" "--rate HZ" "Approximate number of requests per second, per thread."
:default 10
:parse-fn read-string
:validate [#(and (number? %) (pos? %)) "Must be a positive number"]]
[nil "--ops-per-key NUM" "Maximum number of operations on any given key."
:default 100
:parse-fn parse-long
:validate [pos? "Must be a positive integer."]]
["-w" "--workload NAME" "Workload of test to run"
:default "reg1"
:validate [workloads (cli/one-of workloads)]]])
(defn garage-test
"Given an options map from the command line runner (e.g. :nodes, :ssh,
:concurrency, ...), constructs a test map."
[opts]
(let [garage-version (get patches (:patch opts))
db (grg/db garage-version)
workload ((get workloads (:workload opts)) opts)
scenario ((get scenari (:scenario opts)) (assoc opts :db db))]
(merge tests/noop-test
opts
{:pure-generators true
:name (str "garage " (name (:workload opts)) " " (name (:scenario opts)) " " (name (:patch opts)))
:os debian/os
:db db
:client (:client workload)
:generator (gen/phases
(->>
(:generator workload)
(gen/stagger (/ (:rate opts)))
(gen/nemesis (:generator scenario))
(gen/time-limit (:time-limit opts)))
(gen/log "Healing cluster")
(gen/nemesis (:final-generator scenario))
(gen/log "Waiting for recovery")
(gen/sleep 10)
(gen/clients (:final-generator workload)))
:nemesis (:nemesis scenario)
:checker (checker/compose
{:perf (checker/perf (:perf scenario))
:workload (:checker workload)})
})))
(defn -main
"Handles command line arguments. Can either run a test, or a web server for
browsing results."
[& args]
(cli/run! (merge (cli/single-test-cmd {:test-fn garage-test
:opt-spec cli-opts})
(cli/serve-cmd))
args))

@@ -0,0 +1,152 @@
(ns jepsen.garage.daemon
(:require [clojure.tools.logging :refer :all]
[jepsen [control :as c]
[core :as jepsen]
[db :as db]]
[jepsen.control.util :as cu]))
; CONSTANTS -- HOW GARAGE IS SET UP
(def base-dir "/opt/garage")
(def data-dir (str base-dir "/data"))
(def meta-dir (str base-dir "/meta"))
(def binary (str base-dir "/garage"))
(def logfile (str base-dir "/garage.log"))
(def pidfile (str base-dir "/garage.pid"))
(def admin-token "icanhazadmin")
(def access-key-id "GK8bfb6a51286071c6c9cd8bc3")
(def secret-access-key "b0be95f71c1c6f16858a9edf395078b75c12ecb6b1c03385c4ae92076e4994a3")
(def bucket-name "jepsen")
; THE GARAGE DB
(defn install!
"Download and install Garage"
[node version]
(c/su
(c/trace
(info node "installing garage" version)
(c/exec :mkdir :-p base-dir)
(let [url (str "https://garagehq.deuxfleurs.fr/_releases/" version "/x86_64-unknown-linux-musl/garage")
cache (cu/cached-wget! url)]
(c/exec :cp cache binary))
(c/exec :chmod :+x binary))))
(defn configure!
"Configure Garage"
[node]
(c/su
(c/trace
(cu/write-file!
(str "rpc_secret = \"0fffabe52542c2b89a56b2efb7dfd477e9dafb285c9025cbdf1de7ca21a6b372\"\n"
"rpc_bind_addr = \"0.0.0.0:3901\"\n"
"rpc_public_addr = \"" node ":3901\"\n"
"db_engine = \"lmdb\"\n"
"replication_mode = \"2\"\n"
"data_dir = \"" data-dir "\"\n"
"metadata_dir = \"" meta-dir "\"\n"
"[s3_api]\n"
"s3_region = \"us-east-1\"\n"
"api_bind_addr = \"0.0.0.0:3900\"\n"
"[k2v_api]\n"
"api_bind_addr = \"0.0.0.0:3902\"\n"
"[admin]\n"
"api_bind_addr = \"0.0.0.0:3903\"\n"
"admin_token = \"" admin-token "\"\n"
"trace_sink = \"http://192.168.56.1:4317\"\n")
"/etc/garage.toml"))))
(defn connect-node!
"Connect a Garage node to the rest of the cluster"
[test node]
(c/trace
(let [node-id (c/exec binary :node :id :-q)]
(info node "node id:" node-id)
(c/on-many (:nodes test)
(c/exec binary :node :connect node-id)))))
(defn configure-node!
"Configure a Garage node to be part of a cluster layout"
[test node]
(c/trace
(let [node-id (c/exec binary :node :id :-q)]
(c/on (jepsen/primary test)
(c/exec binary :layout :assign (subs node-id 0 16) :-c :1G :-z :dc1 :-t node)))))
(defn finalize-config!
"Apply the layout and create a key/bucket pair in the cluster"
[node]
(c/trace
(c/exec binary :layout :apply :--version 1)
(info node "garage status:" (c/exec binary :status))
(c/exec binary :key :import access-key-id secret-access-key :--yes)
(c/exec binary :bucket :create bucket-name)
(c/exec binary :bucket :allow :--read :--write bucket-name :--key access-key-id)
(info node "key info: " (c/exec binary :key :info access-key-id))))
(defn db
"Garage DB for a particular version"
[version]
(reify db/DB
(setup! [_ test node]
(install! node version)
(configure! node)
(cu/start-daemon!
{:logfile logfile
:pidfile pidfile
:chdir base-dir
:env {:RUST_LOG "garage=debug,garage_api=trace"}}
binary
:server)
(c/exec :sleep 3)
(jepsen/synchronize test)
(connect-node! test node)
(jepsen/synchronize test)
(configure-node! test node)
(jepsen/synchronize test)
(when (= node (jepsen/primary test))
(finalize-config! node)))
(teardown! [_ test node]
(info node "tearing down garage" version)
(c/su
(cu/stop-daemon! binary pidfile)
(c/exec :rm :-rf logfile)
(c/exec :rm :-rf data-dir)
(c/exec :rm :-rf meta-dir)))
db/Pause
(pause! [_ test node]
(cu/grepkill! :stop binary))
(resume! [_ test node]
(cu/grepkill! :cont binary))
db/Kill
(kill! [_ test node]
(cu/stop-daemon! binary pidfile))
(start! [_ test node]
(cu/start-daemon!
{:logfile logfile
:pidfile pidfile
:chdir base-dir
:env {:RUST_LOG "garage=debug,garage_api=trace"}}
binary
:server))
db/LogFiles
(log-files [_ test node]
[logfile])))
(defn creds
"Obtain Garage credentials for node"
[node]
{:access-key access-key-id
:secret-key secret-access-key
:endpoint (str "http://" node ":3900")
:bucket bucket-name
:client-config {:path-style-access-enabled true}})

@@ -0,0 +1,142 @@
(ns jepsen.garage.nemesis
(:require [clojure.tools.logging :refer :all]
[jepsen [control :as c]
[core :as jepsen]
[generator :as gen]
[nemesis :as nemesis]]
[jepsen.nemesis.combined :as combined]
[jepsen.garage.daemon :as grg]
[jepsen.control.util :as cu]))
; ---- reconfiguration nemesis ----
(defn configure-present!
"Configure node to be active in new cluster layout"
[test nodes]
(info "configure-present!" nodes)
(let [node-ids (c/on-many nodes (c/exec grg/binary :node :id :-q))
node-id-strs (map (fn [[_ v]] (subs v 0 16)) node-ids)]
(c/on
(jepsen/primary test)
(apply c/exec (concat [grg/binary :layout :assign :-c :1G] node-id-strs)))))
(defn configure-absent!
"Configure nodes to be active in new cluster layout"
[test nodes]
(info "configure-absent!" nodes)
(let [node-ids (c/on-many nodes (c/exec grg/binary :node :id :-q))
node-id-strs (map (fn [[_ v]] (subs v 0 16)) node-ids)]
(c/on
(jepsen/primary test)
(apply c/exec (concat [grg/binary :layout :assign :-g] node-id-strs)))))
(defn finalize-config!
"Apply the proposed cluster layout"
[test]
(let [layout-show (c/on (jepsen/primary test) (c/exec grg/binary :layout :show))
[_ layout-next-version] (re-find #"apply --version (\d+)\n" layout-show)]
(if layout-next-version
(do
(info "layout show: " layout-show "; next-version: " layout-next-version)
(c/on (jepsen/primary test)
(c/exec grg/binary :layout :apply :--version layout-next-version)))
(info "no layout changes to apply"))))
(defn reconfigure-subset
"Reconfigure cluster with only a subset of nodes"
[cnt]
(reify nemesis/Nemesis
(setup! [this test] this)
(invoke! [this test op]
(case (:f op)
:start
(let [[keep-nodes remove-nodes]
(->> (:nodes test)
shuffle
(split-at cnt))]
(info "layout split: keep " keep-nodes ", remove " remove-nodes)
(configure-present! test keep-nodes)
(configure-absent! test remove-nodes)
(finalize-config! test)
(assoc op :value keep-nodes))
:stop
(do
(info "layout un-split: all nodes=" (:nodes test))
(configure-present! test (:nodes test))
(finalize-config! test)
(assoc op :value (:nodes test)))))
(teardown! [this test] this)))
; ---- nemesis scenari ----
(defn nemesis-op
"A generator for a single nemesis operation"
[op]
(fn [_ _] {:type :info, :f op}))
(defn reconfiguration-package
"Cluster reconfiguration nemesis package"
[opts]
{:generator (->>
(gen/mix [(nemesis-op :reconfigure-start)
(nemesis-op :reconfigure-stop)])
(gen/stagger (:interval opts 5)))
:final-generator {:type :info, :f :reconfigure-stop}
:nemesis (nemesis/compose
{{:reconfigure-start :start
:reconfigure-stop :stop} (reconfigure-subset 3)})
:perf #{{:name "reconfigure"
:start #{:reconfigure-start}
:stop #{:reconfigure-stop}
:color "#A197E9"}}})
(defn scenario-c
"Clock modifying scenario"
[opts]
(combined/clock-package {:db (:db opts), :interval 1, :faults #{:clock}}))
(defn scenario-cp
"Clock modifying + partition scenario"
[opts]
(combined/compose-packages
[(combined/clock-package {:db (:db opts), :interval 1, :faults #{:clock}})
(combined/partition-package {:db (:db opts), :interval 1, :faults #{:partition}})]))
(defn scenario-r
"Cluster reconfiguration scenario"
[opts]
(reconfiguration-package {:interval 1}))
(defn scenario-pr
"Partition + cluster reconfiguration scenario"
[opts]
(combined/compose-packages
[(combined/partition-package {:db (:db opts), :interval 1, :faults #{:partition}})
(reconfiguration-package {:interval 1})]))
(defn scenario-cpr
"Clock scramble + partition + cluster reconfiguration scenario"
[opts]
(combined/compose-packages
[(combined/clock-package {:db (:db opts), :interval 1, :faults #{:clock}})
(combined/partition-package {:db (:db opts), :interval 1, :faults #{:partition}})
(reconfiguration-package {:interval 1})]))
(defn scenario-cdp
"Clock modifying + db + partition scenario"
[opts]
(combined/compose-packages
[(combined/clock-package {:db (:db opts), :interval 1, :faults #{:clock}})
(combined/db-package {:db (:db opts), :interval 1, :faults #{:db :pause :kill}})
(combined/partition-package {:db (:db opts), :interval 1, :faults #{:partition}})]))
(defn scenario-dpr
"Db + partition + cluster reconfiguration scenario"
[opts]
(combined/compose-packages
[(combined/db-package {:db (:db opts), :interval 1, :faults #{:db :pause :kill}})
(combined/partition-package {:db (:db opts), :interval 1, :faults #{:partition}})
(reconfiguration-package {:interval 1})]))

@@ -0,0 +1,143 @@
(ns jepsen.garage.reg
(:require [clojure.tools.logging :refer :all]
[clojure.string :as str]
[clojure.set :as set]
[jepsen [checker :as checker]
[cli :as cli]
[client :as client]
[control :as c]
[db :as db]
[generator :as gen]
[independent :as independent]
[nemesis :as nemesis]
[util :as util]
[tests :as tests]]
[jepsen.checker.timeline :as timeline]
[jepsen.control.util :as cu]
[jepsen.os.debian :as debian]
[jepsen.garage.daemon :as grg]
[jepsen.garage.s3api :as s3]
[knossos.model :as model]
[slingshot.slingshot :refer [try+]]))
(defn op-get [_ _] {:type :invoke, :f :read, :value nil})
(defn op-put [_ _] {:type :invoke, :f :write, :value (str (rand-int 99))})
(defn op-del [_ _] {:type :invoke, :f :write, :value nil})
(defrecord RegClient [creds]
client/Client
(open! [this test node]
(assoc this :creds (grg/creds node)))
(setup! [this test])
(invoke! [this test op]
(try+
(let [[k v] (:value op)]
(case (:f op)
:read
(util/timeout
10000
(assoc op :type :fail, :error ::timeout)
(let [value (s3/get (:creds this) k)]
(assoc op :type :ok, :value (independent/tuple k value))))
:write
(util/timeout
10000
(assoc op :type :info, :error ::timeout)
(do
(s3/put (:creds this) k v)
(assoc op :type :ok)))))
(catch (re-find #"Unavailable" (.getMessage %)) ex
(assoc op :type :info, :error ::unavailable))
(catch (re-find #"Broken pipe" (.getMessage %)) ex
(assoc op :type :info, :error ::broken-pipe))
(catch (re-find #"Connection refused" (.getMessage %)) ex
(assoc op :type :info, :error ::connection-refused))))
(teardown! [this test])
(close! [this test]))
(defn reg-read-after-write
"Read-after-Write checker for register operations"
[]
(reify checker/Checker
(check [this test history opts]
(let [init {:put-values {-1 nil}
:put-done #{-1}
:put-in-progress {}
:read-can-contain {}
:bad-reads #{}}
final (reduce
(fn [state op]
(let [current-values (set/union
(set (map (fn [idx] (get (:put-values state) idx)) (:put-done state)))
(set (map (fn [[_ [idx _]]] (get (:put-values state) idx)) (:put-in-progress state))))
read-can-contain (reduce
(fn [rcc [idx v]] (assoc rcc idx (set/union current-values v)))
{} (:read-can-contain state))]
(info "--------")
(info "state: " state)
(info "current-values: " current-values)
(info "read-can-contain: " read-can-contain)
(info "op: " op)
(case [(:type op) (:f op)]
([:invoke :write])
(assoc state
:read-can-contain read-can-contain
:put-values (assoc (:put-values state) (:index op) (:value op))
:put-in-progress (assoc (:put-in-progress state) (:process op) [(:index op) (:put-done state)]))
([:ok :write])
(let [[index overwrites] (get (:put-in-progress state) (:process op))]
(assoc state
:read-can-contain read-can-contain
:put-in-progress (dissoc (:put-in-progress state) (:process op))
:put-done
(conj
(set/difference (:put-done state) overwrites)
index)))
([:invoke :read])
(assoc state
:read-can-contain (assoc read-can-contain (:process op) current-values))
([:ok :read])
(let [this-read-can-contain (get read-can-contain (:process op))
bad-reads (if (contains? this-read-can-contain (:value op))
(:bad-reads state)
(conj (:bad-reads state) [(:process op) (:index op) (:value op) this-read-can-contain]))]
(info "this-read-can-contain: " this-read-can-contain)
(assoc state
:read-can-contain (dissoc read-can-contain (:process op))
:bad-reads bad-reads))
state)))
init history)
valid? (empty? (:bad-reads final))]
(assoc final :valid? valid?)))))
(defn workload-common
"Common parts of workload"
[opts]
{:client (RegClient. nil)
:generator (independent/concurrent-generator
10
(range)
(fn [k]
(->>
(gen/mix [op-get op-put op-del])
(gen/limit (:ops-per-key opts)))))})
(defn workload1
"Tests linearizable reads and writes"
[opts]
(assoc (workload-common opts)
:checker (independent/checker
(checker/compose
{:linear (checker/linearizable
{:model (model/register)
:algorithm :linear})
:timeline (timeline/html)}))))
(defn workload2
"Tests CRDT reads and writes"
[opts]
(assoc (workload-common opts)
:checker (independent/checker
(checker/compose
{:reg-read-after-write (reg-read-after-write)
:timeline (timeline/html)}))))

@@ -0,0 +1,48 @@
(ns jepsen.garage.s3api
(:require [clojure.tools.logging :refer :all]
[jepsen [control :as c]]
[amazonica.aws.s3 :as s3]
[slingshot.slingshot :refer [try+]]))
; GARAGE S3 HELPER FUNCTIONS
(defn get
"Helper for GetObject"
[creds k]
(try+
(-> (s3/get-object creds (:bucket creds) k)
:input-stream
slurp)
(catch (re-find #"Key not found" (.getMessage %)) ex
nil)))
(defn put
"Helper for PutObject or DeleteObject (is a delete if value is nil)"
[creds k v]
(if (= v nil)
(s3/delete-object creds
:bucket-name (:bucket creds)
:key k)
(let [some-bytes (.getBytes v "UTF-8")
bytes-stream (java.io.ByteArrayInputStream. some-bytes)]
(s3/put-object creds
:bucket-name (:bucket creds)
:key k
:input-stream bytes-stream
:metadata {:content-length (count some-bytes)}))))
(defn list-inner [creds prefix ct accum]
(let [list-result (s3/list-objects-v2 creds
{:bucket-name (:bucket creds)
:prefix prefix
:continuation-token ct})
new-object-summaries (:object-summaries list-result)
new-objects (map (fn [d] (:key d)) new-object-summaries)
objects (concat new-objects accum)]
(if (:truncated? list-result)
(list-inner creds prefix (:next-continuation-token list-result) objects)
objects)))
(defn list
"Helper for ListObjects -- just lists everything in the bucket"
[creds prefix]
(list-inner creds prefix nil []))

@@ -0,0 +1,133 @@
(ns jepsen.garage.set
(:require [clojure.tools.logging :refer :all]
[clojure.string :as str]
[clojure.set :as set]
[jepsen [checker :as checker]
[cli :as cli]
[client :as client]
[control :as c]
[db :as db]
[generator :as gen]
[independent :as independent]
[nemesis :as nemesis]
[util :as util]
[tests :as tests]]
[jepsen.checker.timeline :as timeline]
[jepsen.control.util :as cu]
[jepsen.os.debian :as debian]
[jepsen.garage.daemon :as grg]
[jepsen.garage.s3api :as s3]
[knossos.model :as model]
[slingshot.slingshot :refer [try+]]))
(defn op-add-rand100 [_ _] {:type :invoke, :f :add, :value (rand-int 100)})
(defn op-read [_ _] {:type :invoke, :f :read, :value nil})
(defrecord SetClient [creds]
client/Client
(open! [this test node]
(assoc this :creds (grg/creds node)))
(setup! [this test])
(invoke! [this test op]
(try+
(let [[k v] (:value op)
prefix (str "set" k "/")]
(case (:f op)
:add
(util/timeout
10000
(assoc op :type :info, :error ::timeout)
(do
(s3/put (:creds this) (str prefix v) "present")
(assoc op :type :ok)))
:read
(util/timeout
10000
(assoc op :type :fail, :error ::timeout)
(do
(let [items (s3/list (:creds this) prefix)]
(let [items-stripped (map (fn [o]
(assert (str/starts-with? o prefix))
(str/replace-first o prefix "")) items)
items-set (set (map parse-long items-stripped))]
(assoc op :type :ok, :value (independent/tuple k items-set))))))))
(catch (re-find #"Unavailable" (.getMessage %)) ex
(assoc op :type :info, :error ::unavailable))
(catch (re-find #"Broken pipe" (.getMessage %)) ex
(assoc op :type :info, :error ::broken-pipe))
(catch (re-find #"Connection refused" (.getMessage %)) ex
(assoc op :type :info, :error ::connection-refused))))
(teardown! [this test])
(close! [this test]))
(defn set-read-after-write
"Read-after-Write checker for set operations"
[]
(reify checker/Checker
(check [this test history opts]
(let [init {:add-started #{}
:add-done #{}
:read-must-contain {}
:missed #{}
:unexpected #{}}
final (reduce
(fn [state op]
(case [(:type op) (:f op)]
([:invoke :add])
(assoc state :add-started (conj (:add-started state) (:value op)))
([:ok :add])
(assoc state :add-done (conj (:add-done state) (:value op)))
([:invoke :read])
(assoc-in state [:read-must-contain (:process op)] (:add-done state))
([:ok :read])
(let [read-must-contain (get (:read-must-contain state) (:process op))
new-missed (set/difference read-must-contain (:value op))
new-unexpected (set/difference (:value op) (:add-started state))]
(assoc state
:read-must-contain (dissoc (:read-must-contain state) (:process op))
:missed (set/union (:missed state) new-missed),
:unexpected (set/union (:unexpected state) new-unexpected)))
state))
init history)
valid? (and (empty? (:missed final)) (empty? (:unexpected final)))]
(assoc final :valid? valid?)))))
(defn workload1
"Tests insertions and deletions"
[opts]
{:client (SetClient. nil)
:checker (independent/checker
(checker/compose
{:set (checker/set)
:timeline (timeline/html)}))
:generator (independent/concurrent-generator
10
(range 100)
(fn [k]
(->> (range)
(map (fn [x] {:type :invoke, :f :add, :value x}))
(gen/limit (:ops-per-key opts)))))
:final-generator (gen/phases
(independent/sequential-generator
(range 100)
(fn [k] (gen/once op-read)))
(gen/sleep 5))})
(defn workload2
"Tests insertions and deletions"
[opts]
{:client (SetClient. nil)
:checker (independent/checker
(checker/compose
{:set-read-after-write (set-read-after-write)
; :set-full (checker/set-full {:linearizable? false})
:timeline (timeline/html)}))
:generator (independent/concurrent-generator
10
(range)
(fn [k]
(->> (gen/mix [op-add-rand100 op-read])
(gen/limit (:ops-per-key opts)))))})

@@ -0,0 +1,7 @@
(ns jepsen.garage-test
(:require [clojure.test :refer :all]
[jepsen.garage :refer :all]))
(deftest a-test
(testing "FIXME, I fail."
(is (= 0 1))))