Commit Graph

10 Commits

Author SHA1 Message Date
IndeedNotJames
c229a6463e
nixos/tests/consul: stop consul cleanly
This should fix the flakiness of the test.

Forcefully killing the consul process can lead to
a broken `/var/lib/consul/node-id` file, which
will prevent consul from starting on that node again.
See https://github.com/hashicorp/consul/issues/3489

So instead of crashing the whole node, which leads to
this corruption from time to time, we kill the
networking instead, preventing any cluster
communication and then cleanly stop consul.
2023-03-22 19:18:34 +01:00
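
The ordering described in the message above (cut off cluster communication first, then stop consul gracefully) can be sketched as follows. This is a minimal illustration, not the code from the commit: `RecordingMachine` is a hypothetical stand-in that only records what it is asked to do, and its `block` and `systemctl` method names are assumptions made for this sketch.

```python
class RecordingMachine:
    """Hypothetical stand-in for a test machine; it only records requested actions."""

    def __init__(self, name):
        self.name = name
        self.actions = []

    def block(self):
        # Assumption: this would cut the node off from the cluster network.
        self.actions.append("block-network")

    def systemctl(self, args):
        # Assumption: this would run `systemctl <args>` on the node.
        self.actions.append("systemctl " + args)


def stop_consul_cleanly(machine):
    """Isolate the node, then stop consul gracefully instead of crashing the node."""
    machine.block()
    machine.systemctl("stop consul")
    return machine.actions


if __name__ == "__main__":
    server = RecordingMachine("server1")
    for action in stop_consul_cleanly(server):
        print(action)
```

The point of the ordering is that the other nodes still observe the server as unreachable, while consul itself gets the chance to shut down without leaving a broken `/var/lib/consul/node-id` behind.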
Niklas Hambüchen
b3b27ed008 consul.passthru.tests: Add 2 more tests 2020-06-18 03:06:24 +02:00
Niklas Hambüchen
bcdac2e2fd consul.passthru.tests: Refactor: Extract function 2020-06-18 03:05:54 +02:00
Niklas Hambüchen
811bcbe74a consul.passthru.tests: Use correct server health test.
From: https://github.com/hashicorp/consul/issues/8118#issuecomment-645330040
2020-06-18 02:49:27 +02:00
Niklas Hambüchen
701c0eb489 consul.passthru.tests: Refactor into functions.
For better naming and commentary.
2020-06-18 02:49:27 +02:00
Niklas Hambüchen
a59a972413 consul.passthru.tests: Fix failure on current consul. Fixes #90613.
Done by setting `autopilot.min_quorum = 3`.

Technically, this would have been required to keep the test correct since
Consul's "autopilot" "Dead Server Cleanup" was enabled by default (I believe
that was in Consul 0.8). Practically, the issue only occurred with our NixOS
test with releases >= `1.7.0-beta2` (see #90613). The setting itself is
available since Consul 1.6.2.

However, this setting was not documented clearly enough for anybody to notice,
and only the upstream issue https://github.com/hashicorp/consul/issues/8118
I filed brought that to light.

As explained there, the test could also have been made to pass by applying the
more correct rolling reboot procedure

    -m.wait_until_succeeds("[ $(consul members | grep -o alive | wc -l) == 5 ]")
    +m.wait_until_succeeds(
    +    "[ $(consul operator raft list-peers | grep true | wc -l) == 3 ]"
    +)

but we also intend to test that Consul can regain consensus even if
the quorum gets temporarily broken.
2020-06-18 02:22:31 +02:00
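
The raft-based check in the diff above counts the lines of `consul operator raft list-peers` output that contain `true`. The sketch below mirrors that `grep true | wc -l` pipeline in plain Python so the logic can be tried without a cluster. The `SAMPLE_OUTPUT` text, the node names, and the helper function names are illustrative assumptions, not output captured from a real run.

```python
def count_voters(list_peers_output):
    """Count lines containing 'true', like `grep true | wc -l`."""
    return sum(1 for line in list_peers_output.splitlines() if "true" in line)


def all_servers_are_voters(list_peers_output, expected_servers=3):
    """Return True once every expected server shows up as a raft voter."""
    return count_voters(list_peers_output) == expected_servers


# Illustrative text in the general shape of `consul operator raft list-peers`.
SAMPLE_OUTPUT = """\
Node     ID    Address        State     Voter  RaftProtocol
server1  id-1  10.0.0.1:8300  leader    true   3
server2  id-2  10.0.0.2:8300  follower  true   3
server3  id-3  10.0.0.3:8300  follower  true   3
"""

if __name__ == "__main__":
    print(count_voters(SAMPLE_OUTPUT))
    print(all_servers_are_voters(SAMPLE_OUTPUT))
```

Unlike counting `alive` entries in `consul members`, this check only succeeds once each server has rejoined the raft peer set as a voter, which is the condition a rolling reboot needs to wait for.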
Niklas Hambüchen
25d665634a consul.passthru.tests: Refactor: Extract variable 2020-06-18 02:22:29 +02:00
Niklas Hambüchen
777d1c0944 consul.passthru.tests: Refactor let bindings 2020-06-18 02:22:26 +02:00
Niklas Hambüchen
f795df26cf consul.passthru.tests: Refactor: Extract variable 2020-06-18 02:22:23 +02:00
Niklas Hambüchen
dfa7042eaf nixosTests.consul: init 2019-12-06 03:39:28 +01:00