Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update consensus protocol spec #9607

Merged
merged 1 commit into from
Dec 2, 2024
Merged

Update consensus protocol spec #9607

merged 1 commit into from
Dec 2, 2024

Conversation

arssher
Copy link
Contributor

@arssher arssher commented Nov 1, 2024

The spec was written for the buggy protocol which we had before the one more similar to Raft was implemented. Update the spec with what we currently have.

ref #8699

@arssher arssher force-pushed the sk-tla branch 2 times, most recently from d96c133 to 900d77e Compare November 1, 2024 11:39
Copy link

github-actions bot commented Nov 1, 2024

7018 tests run: 6710 passed, 0 failed, 308 skipped (full report)


Flaky tests (3)

Postgres 17

  • test_physical_replication_config_mismatch_too_many_known_xids: release-arm64
  • test_scrubber_physical_gc_ancestors[2]: debug-x86-64

Postgres 14

Code coverage* (full report)

  • functions: 30.4% (8268 of 27227 functions)
  • lines: 47.7% (65191 of 136583 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
96b7d0c at 2024-12-02T16:10:08.553Z :recycle:

@arssher arssher marked this pull request as ready for review November 7, 2024 08:58
@arssher arssher requested a review from a team as a code owner November 7, 2024 08:58
@arssher arssher requested review from arpad-m and problame November 7, 2024 08:58
@arssher arssher force-pushed the sk-tla branch 2 times, most recently from bcbff08 to 396b908 Compare November 15, 2024 13:33
@problame problame self-assigned this Nov 15, 2024
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read through everything reachable from Spec and Next.

Impressive work, very clearly readable and especially TypeOk is super helpful to understand what's going on.


This model doesn't include the membership change stuff, right? I suppose that'll be a separate PR on top of this one.


The modelcheck.sh doesn't work out of the box on Debian 12 with the toolbox .deb 1.7.4~20240808-1326

I had to comment out ALIAS.

--- a/safekeeper/spec/models/MCProposerAcceptorStatic_p2_a3_t2_l2.cfg
+++ b/safekeeper/spec/models/MCProposerAcceptorStatic_p2_a3_t2_l2.cfg
@@ -15,5 +15,5 @@ LogSafety
 CommittedNotTruncated
 SYMMETRY ProposerAcceptorSymmetry
 CHECK_DEADLOCK FALSE
-ALIAS Alias
+\* ALIAS Alias

Disclaimers:

  • I have very limited TLA+ experience. Reading is ok, but I wouldn't be able to write a spec like this one (yet :) )
  • I never trust myself with the relation of Lsn and WAL record.
    • Does it mean "up to the byte offset in the WAL"?
    • Does it mean "including the record that starts at byte offset X"
    • Does it mean "including the record that ends at byte offset X"
    • In this model, Lsn means just Nat index into the wal sequence of proposer and acceptor. So, it's side-stepping that question. Plus, TLA+ sequences start at 1.
    • So, overall, I wouldn't trust myself to certify that we're free of off-by-one errors in this spec.

safekeeper/spec/ProposerAcceptorStatic.tla Outdated Show resolved Hide resolved
safekeeper/spec/ProposerAcceptorStatic.tla Outdated Show resolved Hide resolved
safekeeper/spec/ProposerAcceptorStatic.tla Outdated Show resolved Hide resolved
safekeeper/spec/ProposerAcceptorStatic.tla Outdated Show resolved Hide resolved
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the invariants, all look correct to me, left some comments.


Do I understand correctly that we don't check liveness?


Regarding the playing/debugging invariants: I would expect that the core spec file only contains

  • actions that are actually reachable by Spec (e.g. remove unreachable "code" like BadQuorum)
  • invariants that are universally true (e.g. remove CommittedNotTruncated)

Can we somehow turn BadQuorum into a unit test for the spec? Like, make Quorum a constant. Have all current configs use the correct implementation. And add a "negative test" config that uses BadQuorum and somewhere check that an invariant will be violated.

safekeeper/spec/MCProposerAcceptorStatic.tla Show resolved Hide resolved
safekeeper/spec/ProposerAcceptorStatic.tla Outdated Show resolved Hide resolved
safekeeper/spec/ProposerAcceptorStatic.tla Outdated Show resolved Hide resolved
safekeeper/spec/ProposerAcceptorStatic.tla Outdated Show resolved Hide resolved
safekeeper/spec/ProposerAcceptorStatic.tla Outdated Show resolved Hide resolved
@arssher
Copy link
Contributor Author

arssher commented Nov 25, 2024

This model doesn't include the membership change stuff, right? I suppose that'll be a separate PR on top of this one.

Yes, working on it.

I had to comment out ALIAS.

Right, it is available only since pre-release 1.8.0; added comment on this.

In this model, Lsn means just Nat index into the wal sequence of proposer and acceptor. So, it's side-stepping that question.

I wouldn't say it is side stepping. Indeed, when we are talking about 'record LSN' it might either mean beginning of that record or end of it (which is the same as beginning of the next except alignment, and alignment is indeed side stepped here). So we must be careful to always specify which is of these is meant (and explain in comments), and that's important in the spec. For instance, FlushLsn is end of last entry, while term history entry contain LSN of the first record with this term.

@arssher
Copy link
Contributor Author

arssher commented Nov 25, 2024

Do I understand correctly that we don't check liveness?

Yes, I'm curious to do it, but in our case it is fairly trivial (when a proposer + majority of acceptors is alive and network is good), so not a high priority.

@arssher
Copy link
Contributor Author

arssher commented Nov 25, 2024

Can we somehow turn BadQuorum into a unit test for the spec?

I'm not aware of options which wouldn't need a lot of boilerplate. Generally I think the idea is that spec should be small / lightweight enough to be comfortably working with without unit tests, it's a bit too much.

@problame problame removed their assignment Nov 25, 2024
@arssher arssher enabled auto-merge December 2, 2024 14:46
@arssher arssher added this pull request to the merge queue Dec 2, 2024
Merged via the queue into main with commit fa909c2 Dec 2, 2024
80 checks passed
@arssher arssher deleted the sk-tla branch December 2, 2024 16:11
awarus pushed a commit that referenced this pull request Dec 5, 2024
The spec was written for the buggy protocol which we had before the one
more similar to Raft was implemented. Update the spec with what we
currently have.

ref #8699
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants