Implement key synchronization. #32
Conversation
Looks like a good start. Initial thoughts: In addition to reporting update failures, the leader should probably drop workers from its list if it can't update them, to handle instances that have failed. We probably also want some guards against stale nodes, especially since key rotation might happen no more than once every few weeks. Maybe workers should re-register periodically, as a keep-alive against expiry from the leader's list. Likewise, workers should check their key material against the leader periodically and terminate if they haven't received an update. Maybe that could be combined into some sort of keep-alive ping?
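To make the keep-alive idea concrete, here's a rough Go sketch of a leader-side registry in which re-registration doubles as a keep-alive and stale workers get pruned. None of these names come from the PR; they're placeholders for illustration.

```go
package keysync

import (
	"sync"
	"time"
)

// workerRegistry tracks known workers and when they last registered.
// All identifiers here are hypothetical.
type workerRegistry struct {
	sync.Mutex
	workers map[string]time.Time // worker address -> time of last registration
	maxAge  time.Duration        // registrations older than this count as stale
}

func newWorkerRegistry(maxAge time.Duration) *workerRegistry {
	return &workerRegistry{
		workers: make(map[string]time.Time),
		maxAge:  maxAge,
	}
}

// register records a (re-)registration, which doubles as a keep-alive.
func (r *workerRegistry) register(addr string) {
	r.Lock()
	defer r.Unlock()
	r.workers[addr] = time.Now()
}

// pruneStale drops workers that have not re-registered within maxAge, so that
// failed instances eventually disappear from the leader's list.
func (r *workerRegistry) pruneStale() {
	r.Lock()
	defer r.Unlock()
	for addr, seen := range r.workers {
		if time.Since(seen) > r.maxAge {
			delete(r.workers, addr)
		}
	}
}
```

pruneStale could run from a ticker inside the leader's worker event loop; the right maxAge would depend on how often workers re-register.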
Quick summary of where we are with the PR:
What remains to be done:
Also, the scripts/ directory contains a few shell scripts that help with testing key synchronization locally.
Are there advantages to having separate registration and heartbeat endpoints? If the initial registration request contained an empty body (or a hash of null key material), the leader could use the same logic to schedule a key exchange. When workers send subsequent registration requests with a current key hash, that could work the same as a heartbeat, updating the leader's list of workers without triggering an immediate keysync.
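Roughly what I have in mind, as a sketch of a single endpoint doing both jobs. All names here (leaderKeyHash, refreshWorker, scheduleKeySync, and the /enclave/register path) are hypothetical placeholders, not the PR's identifiers.

```go
package main

import (
	"io"
	"log"
	"net/http"
)

var leaderKeyHash = "..." // Base64-encoded SHA-256 hash of the leader's key material (placeholder)

func refreshWorker(addr string)   { /* record a keep-alive timestamp for addr */ }
func scheduleKeySync(addr string) { /* enqueue a key exchange with addr */ }

// registerHandler serves both registration and heartbeat: every request
// refreshes the worker's entry, and only a missing or mismatching key hash
// triggers an immediate key sync.
func registerHandler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	worker := r.RemoteAddr
	refreshWorker(worker) // acts as the keep-alive

	// An empty body (or a hash of null key material) means the worker has no
	// keys yet; a mismatch means it is out of sync. Either way, schedule a sync.
	if hash := string(body); hash == "" || hash != leaderKeyHash {
		scheduleKeySync(worker)
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/enclave/register", registerHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```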
Removing the "work-in-progress" because it no longer is. (cc @rillian, @DJAndries)
All in all LGTM. A few more non-blocking questions below, but the leader concerns I had during the original feedback have been addressed, in my opinion.
`hashed_keys` contains the Base64-encoded SHA-256 hash over the worker's enclave key material.
If all goes well, the leader responds with status code `200 OK`.
* `GET /enclave/leader?nonce={nonce}` Exposed by all enclaves, this endpoint
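For illustration, a minimal sketch of how a worker could compute the `hashed_keys` value described above. How the enclave key material is serialized before hashing, and whether standard or URL-safe Base64 is used, are assumptions here rather than something this excerpt specifies.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// hashKeyMaterial computes a Base64-encoded SHA-256 hash over the given bytes,
// the kind of value a worker would report as hashed_keys. The serialization of
// the key material and the Base64 variant are assumptions.
func hashKeyMaterial(keyMaterial []byte) string {
	digest := sha256.Sum256(keyMaterial)
	return base64.StdEncoding.EncodeToString(digest[:])
}

func main() {
	fmt.Println(hashKeyMaterial([]byte("example key material")))
}
```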
non-blocker: For the FQDN, is there an internal DNS service running that's configured to point at the leader node as well?
There's no internal DNS service. For now, we assume that the domains for both leader and workers are public. That may change, though, depending on how the Kubernetes tests go.
	return nil
}

// setupLeader performs necessary setup tasks like starting the worker event
// loop and installing leader-specific HTTP handlers.
func (e *Enclave) setupLeader() {
non-blocker: What's the expected method of handing leadership over if the leader node goes down? From my current understanding, if the leader falls over and properly restarts, then the next time a worker contacts it during the heartbeat check, the worker determines it's using the wrong key and resyncs.
Wouldn't we run into a sync issue in the window between the leader restarting and the next heartbeat check? That intermediate state seems like a concern, but the window is short enough (heartbeats look to run once a minute) that it's probably not worth worrying about. Is that aligned with your view?
> Is that aligned with your view?
Yes. Heartbeats are cheap and we can increase their frequency if this turns out to be a concern.
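As a sketch of what a configurable worker-side heartbeat could look like (all names are hypothetical, not the PR's identifiers):

```go
package keysync

import (
	"log"
	"time"
)

// Hypothetical stand-ins for the worker's actual key-management calls.
func currentKeyHash() string              { return "" }
func fetchLeaderKeyHash() (string, error) { return "", nil }
func requestKeySync()                     {}

// heartbeatLoop periodically compares the worker's key hash against the
// leader's and requests a resync on mismatch. The interval is a parameter, so
// it can be shortened if the post-restart window turns out to matter.
func heartbeatLoop(interval time.Duration, done <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			return
		case <-ticker.C:
			leaderHash, err := fetchLeaderKeyHash()
			if err != nil {
				log.Printf("heartbeat failed: %v", err)
				continue
			}
			if leaderHash != currentKeyHash() {
				requestKeySync()
			}
		}
	}
}
```

Increasing the frequency would then just mean passing a smaller interval to the loop.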
Seems ready to land.
The new commits also look good.
Let's merge and address future issues in subsequent PRs.
Resolves #10