From 515a9fd665e9a214054c2cc892f14d7226baf755 Mon Sep 17 00:00:00 2001
From: Allen Ray
Date: Fri, 16 Sep 2022 13:30:17 -0400
Subject: [PATCH 1/2] ETCD-318: Adding MicroShift etcd enhancement

---
 .../etcd-as-a-transient-systemd-unit.md | 208 ++++++++++++++++++
 1 file changed, 208 insertions(+)
 create mode 100644 enhancements/microshift/etcd-as-a-transient-systemd-unit.md

diff --git a/enhancements/microshift/etcd-as-a-transient-systemd-unit.md b/enhancements/microshift/etcd-as-a-transient-systemd-unit.md
new file mode 100644
index 0000000000..de83861761
--- /dev/null
+++ b/enhancements/microshift/etcd-as-a-transient-systemd-unit.md
@@ -0,0 +1,208 @@
+---
+title: etcd-as-a-transient-systemd-unit
+authors:
+  - dusk125
+  - hasbro17
+reviewers:
+  - "@tjungblu, etcd Team"
+  - "@Elbehery, etcd Team"
+  - "@fzdarsky"
+  - "@deads2k"
+  - "@derekwaynecarr"
+  - "@mangelajo"
+  - "@pmtk"
+approvers:
+  - "@dhellmann"
+api-approvers:
+  - None
+creation-date: 2022-09-16
+last-updated: 2022-10-19
+tracking-link:
+  - https://issues.redhat.com/browse/ETCD-318
+#see-also:
+#  - "/enhancements/this-other-neat-thing.md"
+---
+
+# Decoupling etcd from the MicroShift binary
+
+## Summary
+
+This enhancement proposes moving etcd out of the MicroShift binary and into a transient systemd unit that is launched and managed by MicroShift.
+
+## Motivation
+
+Currently, MicroShift [bundles upstream etcd](https://github.com/openshift/microshift/blob/f7f2260e4fbff61654f478fc149cb7052261f87f/go.mod#L21) into its binary and runs it as a goroutine.
+This couples etcd's dependencies with MicroShift's, which makes it harder to maintain and upgrade etcd independently of the platform, as is done with the separate openshift/etcd repo in the OpenShift Container Platform (OCP).
+
+This enhancement outlines an update to MicroShift to change how etcd is bundled and deployed with MicroShift in order to minimize the risk of shared dependencies between etcd and OpenShift/Kubernetes leading to hard-to-trace bugs.
+
+Changing the deployment of etcd, as outlined below, would alleviate concerns about building etcd against versions of shared dependencies - mainly grpc - that etcd is not currently built against, and therefore not tested against.
+
+Additionally, moving the etcd deployment into its own process would make it easier to debug and to process logs via journalctl.
+
+### User Stories
+
+1. "As an etcd developer, I want to maintain etcd versions independently of MicroShift dependencies."
+
+2. "As a MicroShift Device Administrator, I want to observe and debug the etcd server logs separately from other MicroShift processes."
+
+### Goals
+
+- The etcd binary is built as a separate binary to ease dependency management, and shipped together with the MicroShift binary in a single RPM to ease the deployment process.
+- The etcd server logs are easier to observe.
+
+### Non-Goals
+
+- Certificate management for etcd client authentication needs to be updated regardless of the delivery/execution implementation, so it is out of scope for this enhancement.
+- Creating an RPM build root specifically for MicroShift-etcd; this will eventually be needed so that etcd is built with the Golang version it expects (1.16) instead of the version MicroShift is built with (1.19).
+  - This enhancement discusses this further in the 'etcd Binary from an RPM' section.
+  - This work should be done in the future, but is out of scope for this enhancement.
+
+## Proposal
+
+### Running etcd as a transient systemd unit
+In order to address the concerns brought up in the Motivation section, the MicroShift binary will need to be updated to remove the currently bundled etcd server and replace it with a forked process running under systemd, using the `systemd-run` command.
+Only the execution of etcd will change - from goroutine to external process - MicroShift's general usage of etcd will not change.
+
+Additionally, this external process will need to be shut down when MicroShift exits. In an ungraceful termination of MicroShift, it is possible that the etcd systemd unit will continue execution, so it may be necessary to add code to detect this on MicroShift startup so that another etcd launch is not attempted: this would be a port conflict and cause the second etcd instance to fail to start.
+Further investigation is needed to learn if a transient systemd unit's lifecycle can be tied to another non-transient systemd unit (MicroShift) so that if the latter exits ungracefully, the former will be gracefully shut down.
+
+This etcd systemd unit will also need to be provided with the necessary CA certs and signed key pairs for both server and client authentication.
+
+There should be a mechanism to allow for running the binary more directly in a development environment; one that potentially bypasses the scenarios below.
+
+The actual execution of etcd in the transient systemd unit is also considered in this enhancement; there are currently two scenarios (detailed below): running the etcd container image under podman, and running the etcd binary installed via an RPM.
+
+#### etcd Binary from an RPM
+This will lay the etcd binary down in the same directory as the MicroShift binary so that in a development environment a local build of etcd will be used.
+
+The RPM will be built in the same build root as MicroShift as a Go submodule. This will resolve the dependency issues, but will cause etcd to be built with an unexpected version of Golang - upstream/OpenShift etcd expects Go 1.16, while MicroShift is built with Go 1.19.
+While this is not ideal, decoupling etcd into its own process to resolve the dependency conflict is the first step, and getting the etcd Go build version aligned back to 1.16 can be a future goal.
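To make the transient-unit launch concrete, here is a rough sketch of assembling the `systemd-run` invocation (Python is used purely for illustration - MicroShift itself is Go; the unit name, unit properties, and etcd flags shown are assumptions for this sketch, not the final implementation):

```python
def build_systemd_run_args(etcd_bin, data_dir, microshift_unit="microshift.service"):
    """Assemble a hypothetical `systemd-run` command line that launches etcd
    as a transient unit whose lifecycle is bound to the MicroShift service."""
    return [
        "systemd-run",
        "--unit=microshift-etcd",
        # BindsTo + After tie the transient unit to MicroShift so that
        # stopping MicroShift also stops etcd (pending the lifecycle
        # investigation noted above for ungraceful exits).
        f"--property=BindsTo={microshift_unit}",
        f"--property=After={microshift_unit}",
        etcd_bin,
        f"--data-dir={data_dir}",
    ]

print(" ".join(build_systemd_run_args("/usr/bin/etcd", "/var/lib/microshift/etcd")))
```

MicroShift would then spawn this command and pass the CA certificates and signed key pairs mentioned above as additional etcd flags.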
+
+##### RPM Pros
+* This scenario should incur the least amount of non-etcd-related overhead since the binary is run directly.
+* Integration is straightforward; the etcd RPM is another MicroShift dependency, embedded into the rpm-ostree and distributed the same way as other dependencies.
+* Clear, consistent architecture: there is only RPM-installed content or content hosted on the cluster - no third way of installing/running things.
+
+##### RPM Cons
+* New plumbing will need to be created (or existing plumbing updated/refurbished) to build etcd and package it into an RPM; addressing this issue is outside the scope of this enhancement.
+* etcd will need to be built with a newer version of Golang (1.19) than is currently expected upstream (1.16); etcd will be built under the MicroShift build root, locking the Golang version.
+
+### Workflow Description
+
+For the end user, starting and stopping MicroShift (and etcd) won't change due to this enhancement. They will run MicroShift with `systemctl start microshift`; as part of the MicroShift boot up, etcd will be automatically brought up as well (using `systemd-run etcd`).
+The execution lifetime of the etcd service will be tied to that of MicroShift, so it will be automatically stopped if MicroShift is stopped by the user (`systemctl stop microshift`) or if MicroShift has an unexpected shutdown.
+The execution of etcd should be completely transparent to the user; they would be able to see it running under systemd (`systemctl status microshift-etcd`) and collect logs from it (`journalctl -u microshift-etcd`).
+
+#### Variation [optional]
+
+For the developer who wishes to run and debug MicroShift, microshift-etcd will detect that MicroShift is being run locally (not from a systemd unit) and change its execution of etcd to a direct binary execution; it will expect to find the etcd binary in the same directory as the MicroShift binary.
+This will allow the developer to build and debug both MicroShift and etcd locally.
+
+### API Extensions
+
+N/A
+
+### Implementation Details/Notes/Constraints [optional]
+
+We have agreed on a multi-step approach, starting with the process management change (this enhancement's change), then adding etcd, as a Go submodule, to the MicroShift repository. Once this is in place, we might move the etcd build out to its own RPM, but that is outside the scope of this enhancement.
+This stepped approach will help keep these fundamental changes more manageable for reviewers, and will help decrease the likelihood of disruption due to these changes.
+
+### Risks and Mitigations
+
+The etcd execution lifetime should be bound to that of MicroShift (through the systemd 'BindsTo' property); however, it may be possible that MicroShift exits and etcd is left running.
+If this occurs, the user can also bring down etcd with `systemctl stop microshift-etcd`; if we find instances of this happening, we can add a check to the microshift-etcd startup to shut down a running instance of microshift-etcd, if it exists.
+
+### Drawbacks
+
+We will have a more complicated build and an extra package to manage. We consider those acceptable tradeoffs to be able to deliver etcd fixes quickly and to decouple the dependencies that are causing MicroShift build issues.
+
+## Design Details
+
+We will need to update MicroShift's automated rebase process to include the etcd Go submodule as a separate step.
+
+### Open Questions
+
+None
+
+### Test Plan
+
+**Note:** *Section not required until targeted at a release.*
+
+In addition to standard end-to-end tests, we should also test for the main MicroShift binary dying or being killed, ensuring that etcd also comes down. This could be as simple as starting MicroShift via systemd, getting its PID, killing that PID, checking `systemctl status microshift-etcd`, and ensuring it comes down.
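That check could be automated by polling the unit state; a minimal sketch of the parsing step, assuming the test shells out to `systemctl show microshift-etcd -p ActiveState` (Python for illustration; the unit name comes from this proposal, the helper is hypothetical):

```python
def unit_is_active(systemctl_show_output):
    """Parse `systemctl show <unit> -p ActiveState` output,
    e.g. "ActiveState=active", and report whether the unit is active."""
    for line in systemctl_show_output.splitlines():
        key, _, value = line.partition("=")
        if key == "ActiveState":
            return value.strip() == "active"
    return False

# After killing MicroShift's PID, the test would poll until this is False:
print(unit_is_active("ActiveState=inactive"))
```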
+
+Another test could cover etcd having an ungraceful shutdown: do the same steps as above, but using microshift-etcd's PID. etcd should automatically come back up via the same systemd unit.
+
+### Graduation Criteria
+
+TODO
+
+#### Dev Preview -> Tech Preview
+
+TODO
+
+#### Tech Preview -> GA
+
+TODO
+
+#### Removing a deprecated feature
+
+TODO
+
+### Upgrade / Downgrade Strategy
+
+TODO
+
+### Version Skew Strategy
+
+Currently, there is no need for a version skew strategy: etcd will be built and delivered as a part of the MicroShift RPM, so it will not be possible for the MicroShift and etcd versions to get out of sync.
+
+This may need to be revisited if we build and deliver etcd in its own RPM, but that is out of scope for this enhancement.
+
+### Operational Aspects of API Extensions
+
+N/A
+
+#### Failure Modes
+
+TODO
+
+#### Support Procedures
+
+##### Reading etcd Logs
+With this enhancement, etcd logs will no longer show up in the MicroShift log stream; they will be in their own systemd log stream. The user/support engineer can get these logs with `journalctl -u microshift-etcd`, or a similar command to the one used for MicroShift itself.
+
+##### Backup and Restore
+Backup and restore of etcd may be different since there is no Cluster Etcd Operator (CEO); in OCP, the CEO is what handles the backup and restore of etcd. However, since etcd will not be running in a container, an admin could run the `etcdctl snapshot save` command directly, and etcd should snapshot as it does in OCP.
+
+More investigation should be done into which pieces of the etcd [backup](https://github.com/openshift/cluster-etcd-operator/blob/2272cc785bcba7a5b84c015481705e0dbe64cf8c/bindata/etcd/cluster-backup.sh) and [restore](https://github.com/openshift/cluster-etcd-operator/blob/2272cc785bcba7a5b84c015481705e0dbe64cf8c/bindata/etcd/cluster-restore.sh) scripts are needed.
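As a sketch, the snapshot command an admin might run could be assembled like this (Python for illustration; the endpoint and certificate paths are placeholders, not MicroShift's actual locations):

```python
def snapshot_save_cmd(out_path, endpoint="https://127.0.0.1:2379",
                      cacert="/path/to/ca.crt", cert="/path/to/client.crt",
                      key="/path/to/client.key"):
    """Build an `etcdctl snapshot save` invocation with client TLS flags;
    all paths here are placeholders for this sketch."""
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={cacert}",
        f"--cert={cert}",
        f"--key={key}",
        "snapshot", "save", out_path,
    ]

print(" ".join(snapshot_save_cmd("/var/lib/microshift/backup.db")))
```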
+
+## Implementation History
+
+TODO
+
+## Alternatives
+
+### etcd Logs
+If we decide not to run etcd in a transient systemd unit, we'll need to update the etcd Zap logger to have it write out logs in a format consistent with the other modules in MicroShift - currently, etcd writes logs in JSON format.
+* If this isn't supported with the current etcd logger configuration, we may have to patch our downstream logger to achieve this.
+
+### etcd as Go Plugin
+Compile etcd into a Go plugin and continue to execute it in a goroutine. This would allow for a separate build chain for the binary, but does not change how etcd runs.
+
+This would not solve the dependency issue, as you can still only have one version of each dependency; otherwise, the symbols would conflict. Also, the plugin would need to be built with the same runtime version of Go as MicroShift.
+
+### etcd Container Image with Podman
+This scenario would run the etcd server as a container executed via a `podman` command.
+This was rejected and moved to an alternative because it would incur an additional, large dependency on the system and on the customer.
+
+#### Podman Pros
+* This would be similar to how etcd is built and shipped for OCP and would reuse the existing build machinery for openshift/etcd.
+
+#### Podman Cons
+* The current etcd image is too large (~400MB); either it will need to be shrunk or a new image will need to be created with the bare minimum in it - the etcd binary alone is about 30MB.
+* This scenario places a new runtime dependency on podman; it would need to be installed and ready before MicroShift could be used.
+* MicroShift and its dependencies (incl. microshift-networking) are deliberately delivered as RPMs, so they natively fit into customers' build pipeline and content delivery for the RHEL4Edge rpm-ostrees they're building. Introducing non-RPM dependencies for MicroShift will complicate this.
+* A (small) risk of conflicts when customers use their own management agent for managing Podman workloads.
+
+## Infrastructure Needed [optional]
+
+TODO

From c3038accfb0436a629463aa5e6cff332d2bf0d73 Mon Sep 17 00:00:00 2001
From: Allen Ray
Date: Wed, 3 May 2023 11:11:54 -0400
Subject: [PATCH 2/2] ETCD-425: adding etcd tuning profiles enhancement

---
 enhancements/etcd/etcd-tuning-profiles.md | 184 ++++++++++++++++++++++
 1 file changed, 184 insertions(+)
 create mode 100644 enhancements/etcd/etcd-tuning-profiles.md

diff --git a/enhancements/etcd/etcd-tuning-profiles.md b/enhancements/etcd/etcd-tuning-profiles.md
new file mode 100644
index 0000000000..471a8ba349
--- /dev/null
+++ b/enhancements/etcd/etcd-tuning-profiles.md
@@ -0,0 +1,184 @@
+---
+title: etcd-tuning-profiles
+authors:
+  - "@dusk125"
+reviewers: # Include a comment about what domain expertise a reviewer is expected to bring and what area of the enhancement you expect them to focus on. For example: - "@networkguru, for networking aspects, please look at IP bootstrapping aspect"
+  - "@hasbro17, etcd team"
+  - "@tjungblu, etcd team"
+  - "@williamcaban, OpenShift product manager"
+  - "@deads2k, implemented a similar feature for API server"
+approvers: # A single approver is preferred, the role of the approver is to raise important questions, help ensure the enhancement receives reviews from all applicable areas/SMEs, and determine when consensus is achieved such that the EP can move forward to implementation. Having multiple approvers makes it difficult to determine who is responsible for the actual approval.
+  - "@hasbro17, etcd team"
+api-approvers: # In case of new or modified APIs or API extensions (CRDs, aggregated apiservers, webhooks, finalizers).
If there is no API change, use "None"
+  - None
+creation-date: 2023-05-16
+last-updated: 2023-09-21
+tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
+  - https://issues.redhat.com/browse/ETCD-425
+---
+
+# etcd Tuning Profiles
+
+## Summary
+
+This enhancement would replace the hardcoded values for the etcd parameters HEARTBEAT_INTERVAL and LEADER_ELECTION_TIMEOUT with predefined "profiles".
+Each profile would map to predefined, and pretested, values for the internal etcd parameters.
+This would allow for some user tweaking without giving users access to the full range of values.
+This enhancement only covers the MVP for a Tech Preview release of this new feature; a future enhancement will be necessary.
+
+## Motivation
+
+Customers have asked for the ability to change the etcd heartbeat interval and leader election timeout to aid in the stability of their clusters.
+We want to be able to allow for this while minimizing the risk of them setting values that cause issues.
+This will also remove the hardcoded platform-dependent values; certain platforms could have different default profiles to maintain backwards compatibility with the currently hardcoded values.
+
+### User Stories
+
+* As an administrator, I want to change etcd tuning profiles to help increase the stability of my cluster and understand the performance/latency cost of the profile change.
+* As an OpenShift support engineer, I want to walk a customer through changing the active etcd profile in a minimal number of steps.
+* As an etcd engineer, I want to easily add and test new profiles and internal profile parameters.
+
+### Goals
+
+* Add profiles that map to the existing values.
+* Remove the parameters from the podspec rendering and replace them with the profile.
+* Add an API to allow admins to change the profile.
+
+### Non-Goals
+
+* Adding more profiles beyond those listed in this proposal.
+* Handle consuming profile changes without an etcd rollout.
+* Allow users to set arbitrary values for the profile parameters.
+
+## Proposal
+
+The profiles are a layer of abstraction that allow a customer to tweak etcd to run more reliably on their system, while not being so open as to allow them to easily harm their cluster by (knowingly or not) setting bad values.
+The default profile ("" or unset) will allow for upgrades to this feature, as it tells the system to choose the values based on the platform: this is the current behavior.
+The values for the two proposed profiles for the Tech Preview of this feature are the current default values (applied to all platforms except Azure and IBMCloud VPC) and the values currently applied for Azure and IBMCloud VPC.
+The latter values have been used successfully in the field for some time, so the risk to future clusters is minimal.
+Changing to a "slower" profile will likely incur a performance/latency penalty, but that is likely an acceptable trade for cluster stability.
+
+In this iteration, for the Tech Preview, we will make it clear that changing the profile will require an etcd redeployment.
+In a future enhancement, we can discuss a more seamless transition between profiles.
+We will not allow the user to set arbitrary values for the parameters; they must conform to the profiles' values (by way of the profile).
+
+The active profile will be set via the API server; an etcd rollout will then be triggered automatically by the Cluster Etcd Operator env var controller to consume the new profile.
+
+The entry for the profile will be added to the operator/v1 etcd operator config CRD in the API server, named ControlPlaneHardwareSpeed, to allow other, non-etcd, components to map their own configuration based on the set profile in the future.
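The lookup that the env var controller would perform could be sketched roughly as follows (Python for illustration; the profile names and values are the ones proposed in this enhancement, while the function and variable names are hypothetical):

```python
# Hypothetical mapping from profile name to etcd parameters; the values
# mirror the Standard/Slower profiles proposed in this enhancement.
PROFILES = {
    "Standard": {"HEARTBEAT_INTERVAL": "100ms", "LEADER_ELECTION_TIMEOUT": "1000ms"},
    "Slower": {"HEARTBEAT_INTERVAL": "500ms", "LEADER_ELECTION_TIMEOUT": "2500ms"},
}

def resolve_profile(profile, platform):
    """Map a profile name to etcd parameters; the empty/Default profile
    defers to the platform-dependent choice (the current behavior)."""
    if profile == "":
        profile = "Slower" if platform in ("Azure", "IBMCloud VPC") else "Standard"
    if profile not in PROFILES:
        raise ValueError("unknown profile: %r" % profile)  # the API would reject this
    return PROFILES[profile]
```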
+
+The profiles that will be added are:
+* Default (""):
+  - HEARTBEAT_INTERVAL: Platform dependent
+  - LEADER_ELECTION_TIMEOUT: Platform dependent
+* Standard:
+  - HEARTBEAT_INTERVAL: 100ms
+  - LEADER_ELECTION_TIMEOUT: 1000ms
+* "Slower" (or other name):
+  - HEARTBEAT_INTERVAL: 500ms
+  - LEADER_ELECTION_TIMEOUT: 2500ms
+
+These profiles are based on the current platform-dependent, hard-coded values. All platforms are on 'Standard' except for Azure and IBMCloud VPC, which are on 'Slower'; these profiles will be selected when the Default profile is set.
+
+### Workflow Description
+
+1. The cluster administrator decides to change the etcd profile from default to slower, or slower to default.
+2. They set the new profile in the API server.
+3. They force an etcd redeployment, which restarts the etcd pods so that they consume the new profile value.
+
+If the profile value is not valid, the API should fail to accept the value and return an error.
+
+#### Variation [optional]
+
+None
+
+### API Extensions
+
+None
+
+### Implementation Details/Notes/Constraints [optional]
+
+There will still be hardcoded values for each of the parameters and their mappings.
+We can reuse the functions in the Cluster Etcd Operator that retrieve the values; it would just be a matter of reading the environment variable for the profile, looking up the mapping, and returning the value.
+
+### Risks and Mitigations
+
+We will need to test the latency of the API server on platforms that currently use the default parameters to see if there are any issues with running slower values.
+If there is an issue, we could use slightly different values than proposed above; the Azure and IBMCloud VPC values were originally meant to be temporary, to compensate for lacking IOPS.
+
+### Drawbacks
+
+* There will be a required etcd rollout when changing profiles; this is to avoid the edge case where the etcd pods have different timeouts/heartbeats while we roll out the profile change.
+* Because this is a compromise between configurability and testability/supportability, the customer won't get as much control as they may want, but it will greatly reduce the testing/support burden on the OpenShift team.
+
+## Design Details
+
+### Open Questions [optional]
+
+* Should we consider an additional profile that is between the proposed 'Default' and 'Slower' profiles?
+
+### Test Plan
+
+**Note:** *Section not required until targeted at a release.*
+
+Given that the profiles map to existing values, it should be possible to update existing tests to run each profile to ensure compatibility and stability.
+
+### Graduation Criteria
+
+**Note:** *Section not required until targeted at a release.*
+
+None
+
+#### Dev Preview -> Tech Preview
+
+N/A
+
+#### Tech Preview -> GA
+
+- Customer feedback
+- API extension
+- Profile change with minimal disruption
+- Profile testing on all platforms
+- Performance testing for each profile
+- End user documentation
+  - Description of different profiles and their effects
+  - Steps to change profiles
+
+#### Removing a deprecated feature
+
+N/A
+
+### Upgrade / Downgrade Strategy
+
+N/A
+
+### Version Skew Strategy
+
+N/A
+
+### Operational Aspects of API Extensions
+
+N/A
+
+#### Failure Modes
+
+N/A
+
+#### Support Procedures
+
+- If the user attempts to set the profile to an invalid value (not one of the predefined profile names), the API will not accept the value and will return an error.
+
+## Implementation History
+
+None
+
+## Alternatives
+
+### Arbitrary parameter values
+Instead of restricting the customer to predefined profiles, this would allow them direct access to the parameters and let them set arbitrary values (within some bounds).
+
+This would allow for more flexibility, but would also likely increase the number of support cases, as it's very likely customers will set values that are too slow, too fast, or exercise an edge case that causes more disruptions.
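To illustrate why even a bounded version of this alternative needs non-trivial validation, here is a hypothetical sketch (Python for illustration; the bounds and the cross-parameter factor are invented for this sketch, though etcd's general guidance is that the election timeout should be several times the heartbeat interval):

```python
# Hypothetical bounds in milliseconds; these are NOT proposed values.
BOUNDS_MS = {
    "HEARTBEAT_INTERVAL": (100, 1000),
    "LEADER_ELECTION_TIMEOUT": (1000, 5000),
}

def validate_params(heartbeat_ms, election_ms):
    """Return a list of validation errors for a candidate pair (empty = valid)."""
    errors = []
    for name, value in (("HEARTBEAT_INTERVAL", heartbeat_ms),
                        ("LEADER_ELECTION_TIMEOUT", election_ms)):
        lo, hi = BOUNDS_MS[name]
        if not lo <= value <= hi:
            errors.append("%s=%dms outside [%d, %d]ms" % (name, value, lo, hi))
    # Cross-parameter rule: the election timeout should be a healthy multiple
    # of the heartbeat interval; the factor of 5 here is illustrative only.
    if election_ms < 5 * heartbeat_ms:
        errors.append("LEADER_ELECTION_TIMEOUT should be >= 5x HEARTBEAT_INTERVAL")
    return errors
```

Per-parameter bounds alone are not enough - the parameters interact - which is part of why a small set of pretested profiles is easier to support.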
+Another downside is that with the profiles, there is a small, discrete number of permutations to test, giving more confidence in the exact effects a given profile has on performance/latency; allowing arbitrary values, by definition, greatly increases the testing permutations, making it very difficult to catch bad values before they're allowed in production.
+
+## Infrastructure Needed [optional]
+
+None