jepsen: reg2 failure seems to happen only with deleteobject

Alex Auvolat 2023-10-20 13:36:48 +02:00
parent 4b93ce179a
commit d148b83d4f
3 changed files with 30 additions and 7 deletions


@ -69,6 +69,8 @@ Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 -
Results: Results:
- Failures with clock-scramble nemesis + partition nemesis ???? TODO INVESTIGATE - Failures with clock-scramble nemesis + partition nemesis ???? TODO INVESTIGATE
-> the issue seems to occur only after DeleteObject (deletions are not always taken into account);
it does not appear if we use only PutObject with actual object content
- TODO: layout reconfiguration nemesis - TODO: layout reconfiguration nemesis
@ -86,7 +88,7 @@ Results:
TODO TODO
## Investigating (and fixing) wierd behavior ## Investigating (and fixing) errors
### Segfaults ### Segfaults
@ -107,6 +109,22 @@ Finally found out that this was due to closures not correctly capturing their co
Not sure exactly where it came from but it seems to have been fixed by making list-inner a separate function and not a sub-function, Not sure exactly where it came from but it seems to have been fixed by making list-inner a separate function and not a sub-function,
and passing all values that were previously in the context (creds and prefix) as additional arguments. and passing all values that were previously in the context (creds and prefix) as additional arguments.
### `reg2` test inconsistency, even with timestamp fix
The reg2 test is our custom checker for CRDT read-after-write on individual object keys, acting as registers which can be updated.
The test fails without the timestamp fix, which is expected as the clock scrambler will prevent nodes from having a correct ordering of objects.
With the timestamp fix, the happened-before relationship should at least be respected, meaning that when a PutObject call starts
after another PutObject call has ended, the second call should overwrite the value of the first call, and the first call's value should
no longer be readable by future GetObject calls.
However, we observed inconsistencies even with the timestamp fix.
The inconsistencies seemed to always happen after writing a nil value, which translates to a DeleteObject call
instead of a PutObject. By removing the possibility of writing nil values, therefore only doing
PutObject calls, the issue disappears. There is therefore an issue to fix in DeleteObject.
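To make the happened-before rule above concrete, here is a minimal sketch of how a stale read on a single key could be detected. It is not the reg2 checker from this repository: the operation maps and the names `happened-before?`, `superseded?` and `stale-read?` are hypothetical, and a nil value is assumed to stand for a deletion.

```clojure
;; Hypothetical operation shape (an assumption, not the Jepsen history format):
;;   {:type :write|:read, :value v, :start t0, :end t1}
;; A nil :value models a deleted object (DeleteObject).

(defn happened-before?
  "True when operation a ended strictly before operation b started."
  [a b]
  (< (:end a) (:start b)))

(defn superseded?
  "True when some write began after w ended and itself ended before r started."
  [writes w r]
  (boolean (some #(and (happened-before? w %)
                       (happened-before? % r))
                 writes)))

(defn stale-read?
  "True when every write that could explain the value returned by read r
  was overwritten before r started."
  [writes r]
  (let [candidates (filter #(= (:value %) (:value r)) writes)]
    (and (boolean (seq candidates))
         (every? #(superseded? writes % r) candidates))))

;; A put followed by a delete, then two possible outcomes for a later read.
(def example-writes
  [{:type :write :value "a" :start 0 :end 1}
   {:type :write :value nil :start 2 :end 3}])

(def read-seeing-old-value {:type :read :value "a" :start 4 :end 5})
(def read-seeing-deletion  {:type :read :value nil :start 4 :end 5})
```

In this sketch, `(stale-read? example-writes read-seeing-old-value)` is true, which is the shape of the anomaly described above: the deletion completed before the read started, yet the read still returns the earlier value.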
## License ## License
Copyright © 2023 Alex Auvolat Copyright © 2023 Alex Auvolat


@ -20,10 +20,16 @@
"set1" set/workload1 "set1" set/workload1
"set2" set/workload2}) "set2" set/workload2})
(def patches
"A map of patch names to Garage builds"
{"default" "v0.9.0"
"tsfix1" "d146cdd5b66ca1d3ed65ce93ca42c6db22defc09"})
(def cli-opts (def cli-opts
"Additional command line options." "Additional command line options."
[["-I" "--increasing-timestamps" "Garage version with increasing timestamps on PutObject" [["-p" "--patch NAME" "Garage patch to use"
:default false] :default "default"
:validate [patches (cli/one-of patches)]]
["-r" "--rate HZ" "Approximate number of requests per second, per thread." ["-r" "--rate HZ" "Approximate number of requests per second, per thread."
:default 10 :default 10
:parse-fn read-string :parse-fn read-string
@ -41,9 +47,7 @@
:concurrency, ...), constructs a test map." :concurrency, ...), constructs a test map."
[opts] [opts]
(let [workload ((get workloads (:workload opts)) opts) (let [workload ((get workloads (:workload opts)) opts)
garage-version (if (:increasing-timestamps opts) garage-version (get patches (:patch opts))]
"d146cdd5b66ca1d3ed65ce93ca42c6db22defc09"
"v0.9.0")]
(merge tests/noop-test (merge tests/noop-test
opts opts
{:pure-generators true {:pure-generators true


@ -112,7 +112,8 @@
(range) (range)
(fn [k] (fn [k]
(->> (->>
(gen/mix [op-get op-put op-del]) ; (gen/mix [op-get op-put op-del])
(gen/mix [op-get op-put])
(gen/limit (:ops-per-key opts)))))}) (gen/limit (:ops-per-key opts)))))})
(defn workload1 (defn workload1