
Implement key synchronization. #32

Merged · 99 commits into master from key-sync · Sep 25, 2023
Conversation

@NullHypothesis (Contributor) commented Aug 8, 2023

Resolves #10

@rillian (Contributor) commented Aug 8, 2023

Looks like a good start. Initial thoughts:

  • The attester interface seems generally useful. It could land separately to reduce the size of this PR.

  • In addition to reporting update failures, the leader should probably drop workers from its list if it can't update them, to handle instances that have failed. We probably also want some guards against stale nodes, especially since key rotation may happen no more often than every few weeks. Maybe workers should re-register periodically, as a keep-alive against expiry from the leader's list. Likewise, workers should periodically check their key material against the leader and terminate if they haven't received an update. Maybe the two could be combined into some sort of keep-alive ping? (A rough sketch of leader-side expiry follows below.)
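To make the expiry idea concrete, here is a rough sketch of leader-side pruning of stale workers. The `workerSet` type, the `maxHeartbeatAge` constant, and the five-minute threshold are illustrative assumptions, not code from this PR:

```go
package main

import (
	"log"
	"sync"
	"time"
)

// maxHeartbeatAge is a hypothetical threshold: a worker that hasn't sent a
// heartbeat within this window is considered stale.
const maxHeartbeatAge = 5 * time.Minute

// workerSet tracks registered workers by address and when each was last seen.
type workerSet struct {
	sync.Mutex
	lastSeen map[string]time.Time
}

// pruneStale drops every worker whose last heartbeat is older than
// maxHeartbeatAge, so failed instances don't linger in the leader's list.
func (s *workerSet) pruneStale() {
	s.Lock()
	defer s.Unlock()
	for addr, seen := range s.lastSeen {
		if time.Since(seen) > maxHeartbeatAge {
			log.Printf("Dropping stale worker %s (last seen %v ago).", addr, time.Since(seen).Round(time.Second))
			delete(s.lastSeen, addr)
		}
	}
}

func main() {
	s := &workerSet{lastSeen: map[string]time.Time{
		"worker-1:8443": time.Now().Add(-10 * time.Minute), // Stale; will be dropped.
		"worker-2:8443": time.Now(),                        // Fresh; stays registered.
	}}
	s.pruneStale()
}
```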

@NullHypothesis (Contributor, Author) commented Aug 16, 2023

Quick summary of where we are with the PR:

  • From the leader's PoV:
    • Upon receiving a worker's registration, the leader immediately initiates key synchronization.
    • Upon receiving a worker's heartbeat, the leader updates the worker's "last seen" timestamp and lets the worker know whether its key material is up to date.
    • Upon receiving new keys from star-randsrv, the leader immediately re-synchronizes with all registered workers.
    • If a given worker hasn't sent a heartbeat in X minutes, the leader logs an error and removes it from the worker pool.
  • From the worker's PoV:
    • Immediately after bootstrapping, the worker registers itself with the leader.
    • Every X minutes, the worker sends a heartbeat to the leader, containing a hash over its key material. If the leader signals that the key material is outdated, the worker re-registers itself. (A sketch of this loop follows after the list.)
    • If key synchronization fails, the worker terminates.
    • If the leader is temporarily unavailable for the heartbeat, the worker logs an error.
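For illustration, here is a minimal sketch of the worker-side heartbeat loop described above. The endpoint path, the non-200 "keys outdated" signal, and the `register` helper are assumptions for the sketch, not necessarily what sync_worker.go implements:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"log"
	"net/http"
	"time"
)

type heartbeat struct {
	HashedKeys string `json:"hashed_keys"`
}

// sendHeartbeat posts the Base64-encoded SHA-256 hash of the worker's key
// material to the leader.  A non-200 response stands in for the leader
// signalling "your keys are outdated", in which case the worker re-registers.
func sendHeartbeat(leaderURL string, keyMaterial []byte) {
	digest := sha256.Sum256(keyMaterial)
	body, _ := json.Marshal(heartbeat{
		HashedKeys: base64.StdEncoding.EncodeToString(digest[:]),
	})

	resp, err := http.Post(leaderURL+"/enclave/heartbeat", "application/json", bytes.NewReader(body))
	if err != nil {
		// Leader temporarily unavailable: log and try again on the next tick.
		log.Printf("Failed to reach leader: %v", err)
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		log.Print("Leader says our key material is outdated; re-registering.")
		register(leaderURL)
	}
}

// register is a stand-in for the worker's registration (and key sync) step.
func register(leaderURL string) {}

func main() {
	keys := []byte("placeholder enclave key material")
	register("https://leader.example.com")
	for range time.Tick(time.Minute) { // "Every X minutes" from the summary above.
		sendHeartbeat("https://leader.example.com", keys)
	}
}
```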

What remains to be done:

  • Test key sync in the context of k8s.
  • Log important errors via Prometheus, so we know when there are sync issues.
  • Write more tests.
  • Provide a mechanism that lets star-randsrv know when its keys were updated.

Also, the scripts/ directory contains a few shell scripts that help with testing key synchronization locally.

@rillian (Contributor) commented Aug 16, 2023

Are there advantages to having separate registration and heartbeat endpoints? If the initial registration request contained an empty body (or a hash of null key material), the leader could use the same logic to schedule a key exchange. When workers send subsequent registration requests with a current key hash, that could work the same as a heartbeat, updating the leader's list of workers without triggering an immediate keysync.
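For what it's worth, a minimal sketch of what such a single endpoint could look like, purely to illustrate the suggestion above; the `leader` type, the `/enclave/register` path, and the helper methods are hypothetical, not taken from this PR:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// leader is a stand-in for the real leader state; the method names below are
// made up for this sketch.
type leader struct{}

func (l *leader) markSeen(worker string)     {}
func (l *leader) currentKeyHash() string     { return "" }
func (l *leader) syncKeysWith(worker string) {}

// registerHandler handles both initial registrations and subsequent
// heartbeats on a single endpoint: an empty or stale key hash schedules a
// key sync, while a current hash only refreshes the worker's last-seen time.
func (l *leader) registerHandler(w http.ResponseWriter, r *http.Request) {
	var req struct {
		HashedKeys string `json:"hashed_keys"`
	}
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}

	l.markSeen(r.RemoteAddr) // Same bookkeeping for new and known workers.

	if req.HashedKeys == "" || req.HashedKeys != l.currentKeyHash() {
		go l.syncKeysWith(r.RemoteAddr) // New worker or outdated keys: kick off a sync.
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.Handle("/enclave/register", http.HandlerFunc((&leader{}).registerHandler))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```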

@NullHypothesis marked this pull request as ready for review on September 11, 2023 at 12:39
@NullHypothesis changed the title from "WIP: Implement key synchronization." to "Implement key synchronization." on Sep 11, 2023
@NullHypothesis (Contributor, Author) commented

Removing the "work in progress" marker because this is no longer a work in progress. (cc @rillian, @DJAndries)

@kdenhartog (Member) left a comment

All in all LGTM. A few more non-blocking questions, but the leader concern I had during the original feedback has been addressed, in my opinion.

> `hashed_keys` contains the Base64-encoded SHA-256 hash over the worker's enclave key material.
> If all goes well, the leader responds with status code `200 OK`.
>
> * `GET /enclave/leader?nonce={nonce}` Exposed by all enclaves, this endpoint
@kdenhartog (Member) commented Sep 12, 2023

Non-blocker: for the FQDN, is there an internal DNS service running that's configured to point at the leader node as well?

@NullHypothesis (Contributor, Author) replied
There's no internal DNS service. For now, we assume that the domains for both leader and workers are public. That may change though, depending on how the Kubernetes tests are going to go.

```go
	return nil
}

// setupLeader performs necessary setup tasks like starting the worker event
// loop and installing leader-specific HTTP handlers.
func (e *Enclave) setupLeader() {
```
@kdenhartog (Member) commented

Non-blocker: what's the expected method of handing leadership over if the leader node goes down? My current understanding is that if the leader falls over and then restarts properly, a worker will contact it during the next heartbeat check, notice that its key material is wrong, and resync.

Wouldn't we run into a sync issue between the leader restarting and that heartbeat recheck? That intermediate state seems like a concern, but the window is short enough (heartbeats look to be once a minute) that it's probably not worth worrying about. Is that aligned with your view?

@NullHypothesis (Contributor, Author) replied

> Is that aligned with your view?

Yes. Heartbeats are cheap and we can increase their frequency if this turns out to be a concern.

@rillian previously approved these changes Sep 14, 2023

@rillian (Contributor) left a comment
Seems ready to land.

@rillian (Contributor) left a comment

New commits also look good.

@NullHypothesis merged commit ea48d48 into master on Sep 25, 2023 · 4 checks passed
@NullHypothesis (Contributor, Author) commented
Let's merge and address future issues in subsequent PRs.

@NullHypothesis deleted the key-sync branch on September 25, 2023 at 21:57
Linked issue: Specify mechanism for enclave synchronization (#10) · 5 participants