-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update consensus protocol spec #9607
Conversation
d96c133
to
900d77e
Compare
7018 tests run: 6710 passed, 0 failed, 308 skipped (full report)Flaky tests (3)Postgres 17
Postgres 14
Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
96b7d0c at 2024-12-02T16:10:08.553Z :recycle: |
bcbff08
to
396b908
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read through everything reachable from Spec
and Next
.
Impressive work, very clearly readable and especially TypeOk
is super helpful to understand what's going on.
This model doesn't include the membership change stuff, right? I suppose that'll be a separate PR on top of this one.
The modelcheck.sh
doesn't work out of the box on Debian 12 with the toolbox .deb 1.7.4~20240808-1326
I had to comment out ALIAS
.
--- a/safekeeper/spec/models/MCProposerAcceptorStatic_p2_a3_t2_l2.cfg
+++ b/safekeeper/spec/models/MCProposerAcceptorStatic_p2_a3_t2_l2.cfg
@@ -15,5 +15,5 @@ LogSafety
CommittedNotTruncated
SYMMETRY ProposerAcceptorSymmetry
CHECK_DEADLOCK FALSE
-ALIAS Alias
+\* ALIAS Alias
Disclaimers:
- I have very limited TLA+ experience. Reading is ok, but I wouldn't be able to write a spec like this one (yet :) )
- I never trust myself with the relation of Lsn and WAL record.
- Does it mean "up to the byte offset in the WAL"?
- Does it mean "including the record that starts at byte offset X"
- Does it mean "including the record that ends at byte offset X"
- In this model,
Lsn
means justNat
index into thewal
sequence of proposer and acceptor. So, it's side-stepping that question. Plus, TLA+ sequences start at 1. - So, overall, I wouldn't trust myself to certify that we're free of off-by-one errors in this spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the invariants, all look correct to me, left some comments.
Do I understand correctly that we don't check liveness?
Regarding the playing/debugging invariants: I would expect that the core spec file only contains
- actions that are actually reachable by
Spec
(e.g. remove unreachable "code" likeBadQuorum
) - invariants that are universally true (e.g. remove
CommittedNotTruncated
)
Can we somehow turn BadQuorum into a unit test for the spec? Like, make Quorum
a constant. Have all current configs use the correct implementation. And add a "negative test" config that uses BadQuorum
and somewhere check that an invariant will be violated.
Yes, working on it.
Right, it is available only since pre-release 1.8.0; added comment on this.
I wouldn't say it is side stepping. Indeed, when we are talking about 'record LSN' it might either mean beginning of that record or end of it (which is the same as beginning of the next except alignment, and alignment is indeed side stepped here). So we must be careful to always specify which is of these is meant (and explain in comments), and that's important in the spec. For instance, FlushLsn is end of last entry, while term history entry contain LSN of the first record with this term. |
Yes, I'm curious to do it, but in our case it is fairly trivial (when a proposer + majority of acceptors is alive and network is good), so not a high priority. |
I'm not aware of options which wouldn't need a lot of boilerplate. Generally I think the idea is that spec should be small / lightweight enough to be comfortably working with without unit tests, it's a bit too much. |
The spec was written for the buggy protocol which we had before the one more similar to Raft was implemented. Update the spec with what we currently have. ref #8699
The spec was written for the buggy protocol which we had before the one more similar to Raft was implemented. Update the spec with what we currently have.
ref #8699