Provisional proposal to add API options to control where build pods are scheduled. These will rely on core Kubernetes features related to pod scheduling. This proposal was partially motivated by the multi-arch feature discussions, where it was revealed that we currently have no means of controlling where build pods are scheduled. While these features may support a future proof of concept for multi-arch builds, orchestrating multi-arch builds end to end is out of scope.
1 parent cd27374, commit 483b3aa
Showing 1 changed file with 214 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,214 @@ | ||
<!-- | ||
Copyright The Shipwright Contributors | ||
SPDX-License-Identifier: Apache-2.0 | ||
--> | ||
|
||
--- | ||
title: build-scheduler-options | ||
authors: | ||
- "@adambkaplan" | ||
reviewers: | ||
- "@apoorvajagtap" | ||
- "@HeavyWombat" | ||
approvers: | ||
- "@qu1queee" | ||
- "@SaschaSchwarze0" | ||
creation-date: 2024-05-15 | ||
last-updated: 2024-05-15 | ||
status: provisional | ||
see-also: [] | ||
replaces: [] | ||
superseded-by: [] | ||
--- | ||
|
||
# Build Scheduler Options | ||
|
||
<!--> | ||
|
||
This is the title of the enhancement. Keep it simple and descriptive. A good title can help | ||
communicate what the enhancement is and should be considered as part of any review. | ||
|
||
The YAML `title` should be lowercased and spaces/punctuation should be replaced with `-`. | ||
|
||
To get started with this template: | ||
|
||
1. **Make a copy of this template.** Copy this template into the main | ||
`proposals` directory, with a filename like `NNNN-neat-enhancement-idea.md` | ||
where `NNNN` is an incrementing number associated with this SHIP. | ||
2. **Fill out the "overview" sections.** This includes the Summary and Motivation sections. These | ||
should be easy and explain why the community should desire this enhancement. | ||
3. **Create a PR.** Assign it to folks with expertise in that domain to help | ||
sponsor the process. The PR title should be like "SHIP-NNNN: Neat | ||
Enhancement Idea", where "NNNN" is the number associated with this SHIP. | ||
4. **Merge at each milestone.** Merge when the design is able to transition to a new status | ||
(provisional, implementable, implemented, etc.). View anything marked as `provisional` as an idea | ||
worth exploring in the future, but not accepted as ready to execute. Anything marked as | ||
`implementable` should clearly communicate how an enhancement is coded up and delivered. Aim for | ||
single topic PRs to keep discussions focused. If you disagree with what is already in a document, | ||
open a new PR with suggested changes. | ||
|
||
The `Metadata` section above is intended to support the creation of tooling around the enhancement | ||
process. | ||
|
||
<--> | ||
|
||
## Release Signoff Checklist | ||
|
||
- [ ] Enhancement is `implementable` | ||
- [ ] Design details are appropriately documented from clear requirements | ||
- [ ] Test plan is defined | ||
- [ ] Graduation criteria for dev preview, tech preview, GA | ||
- [ ] User-facing documentation is created in [docs](/docs/) | ||
|
||
## Open Questions [optional] | ||
|
||
TBD | ||
|
||
## Summary | ||
|
||
Add API options that influence where `BuildRun` pods are scheduled on Kubernetes. This can be | ||
accomplished through the following mechanisms: | ||
|
||
- [Node Selectors](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector) | ||
- [Affinity/anti-affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity) | ||
- [Taints and Tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) | ||
|
||
## Motivation | ||
|
||
Today, `BuildRun` pods run on arbitrary nodes; developers, platform engineers, and admins | ||
cannot control where a specific build pod is scheduled. Teams may have | ||
several motivations for controlling where a build pod is scheduled: | ||
|
||
- Builds can be CPU/memory/storage intensive. Scheduling on larger worker nodes with additional | ||
memory or compute can help ensure builds succeed. | ||
- Clusters may have multiple worker node architectures and even operating systems (for example, Windows nodes). Container images | ||
are by their nature specific to the OS and CPU architecture, and default to the host operating | ||
system and architecture. Builds may need to specify OS and architecture through node selectors. | ||
- Left unchecked, builds may congregate on a set of nodes, impacting overall cluster utilization | ||
and stability. | ||
|
||
### Goals | ||
|
||
- Allow build pods to run on specific nodes using node selectors. | ||
- Allow build pods to set node affinity/anti-affinity rules. | ||
- Allow build pods to tolerate node taints. | ||
- Allow node selection, pod affinity, and taint toleration to be set at the cluster level. | ||
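
The last goal implies that cluster-level settings and per-build settings have to be combined somehow. The following is a minimal sketch of one possible merge, not the proposed design: the `nodeSelector`, `affinity`, and `tolerations` names mirror the Kubernetes `PodSpec` fields, while `CLUSTER_DEFAULTS`, `effective_scheduling`, the `example.com/builds` taint key, and the precedence rule itself are assumptions made for illustration.

```python
from copy import deepcopy

# Hypothetical cluster-wide defaults. The field names mirror Kubernetes
# PodSpec fields; the values are invented for this example.
CLUSTER_DEFAULTS = {
    "nodeSelector": {"kubernetes.io/os": "linux"},
    "tolerations": [
        {"key": "example.com/builds", "operator": "Exists", "effect": "NoSchedule"},
    ],
}


def effective_scheduling(cluster_defaults, build_options):
    """Combine cluster defaults with per-build scheduling options.

    Assumed precedence (the proposal does not define one): build-level
    node selector entries override defaults key by key, while a build-level
    affinity or tolerations value replaces the default wholesale.
    """
    merged = deepcopy(cluster_defaults)

    # Merge node selectors key by key, build-level values winning.
    node_selector = dict(merged.get("nodeSelector", {}))
    node_selector.update(build_options.get("nodeSelector", {}))
    if node_selector:
        merged["nodeSelector"] = node_selector

    # Replace affinity and tolerations only when the build sets them.
    for field in ("affinity", "tolerations"):
        if field in build_options:
            merged[field] = deepcopy(build_options[field])
    return merged


build = {"nodeSelector": {"kubernetes.io/arch": "arm64"}}
result = effective_scheduling(CLUSTER_DEFAULTS, build)
```

Here `result` keeps the default toleration and carries both node selector entries. Other precedence rules, such as appending tolerations instead of replacing them, are equally plausible and would need to be settled in the design details.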
|
||
### Non-Goals | ||
|
||
- Primary feature support for multi-arch builds. | ||
|
||
## Proposal | ||
|
||
This is where we get down to the nitty gritty of what the proposal actually is. | ||
|
||
### User Stories [optional] | ||
|
||
#### Node Selection - platform engineer | ||
|
||
As a platform engineer, I want builds to use node selectors to ensure they are scheduled on nodes | ||
optimized for builds so that builds are more likely to succeed. | ||
|
||
#### Node Selection - arch-specific images | ||
|
||
As a developer, I want to select the OS and architecture of my build's node so that I can run | ||
builds on worker nodes with multiple architectures. | ||
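
To illustrate this story, the sketch below models how a node selector narrows the set of eligible nodes: a node qualifies only if it carries every key/value pair in the selector. The `kubernetes.io/os` and `kubernetes.io/arch` keys are well-known Kubernetes node labels; the node names and the `node_matches_selector` helper are hypothetical, and the real matching is done by the Kubernetes scheduler, not by Shipwright.

```python
def node_matches_selector(node_labels, node_selector):
    """Return True when every selector key/value pair is present on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical cluster with mixed operating systems and architectures.
nodes = {
    "worker-amd64": {"kubernetes.io/os": "linux", "kubernetes.io/arch": "amd64"},
    "worker-arm64": {"kubernetes.io/os": "linux", "kubernetes.io/arch": "arm64"},
    "worker-win": {"kubernetes.io/os": "windows", "kubernetes.io/arch": "amd64"},
}

# Node selector a developer might set to build an arm64 Linux image.
selector = {"kubernetes.io/os": "linux", "kubernetes.io/arch": "arm64"}

eligible = [name for name, labels in nodes.items() if node_matches_selector(labels, selector)]
```

With this selector only `worker-arm64` remains eligible. An empty selector matches every node, which corresponds to today's behavior.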
|
||
#### Pod affinity - platform engineer/admin | ||
|
||
As a platform engineer/cluster admin, I want to set anti-affinity rules for build pods so that | ||
running builds are not scheduled/clustered on the same node. | ||
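
As a rough illustration of what such a rule could look like, the sketch below assembles an `affinity` fragment using the Kubernetes `podAntiAffinity` structure with a preferred (soft) rule keyed on the `kubernetes.io/hostname` topology label. The helper `build_anti_affinity` and the `example.com/workload: build` pod label are assumptions; the proposal does not say how build pods would be labeled or whether rules should be preferred or required.

```python
def build_anti_affinity(label_key, label_value, weight=100):
    """Return an affinity fragment that prefers spreading matching pods across nodes.

    The rule is soft: the scheduler tries to avoid nodes that already run a
    pod with the given label, but may still place the pod there.
    """
    return {
        "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,  # Kubernetes accepts weights from 1 to 100
                    "podAffinityTerm": {
                        "labelSelector": {"matchLabels": {label_key: label_value}},
                        # One topology domain per node, so pods spread by host.
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }
            ]
        }
    }


# Hypothetical label that every build pod would carry.
affinity = build_anti_affinity("example.com/workload", "build")
```

A soft rule keeps builds schedulable when the cluster has fewer nodes than concurrent builds; a `requiredDuringSchedulingIgnoredDuringExecution` rule would instead leave the extra build pods pending.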
|
||
#### Taint toleration - cluster admin | ||
|
||
As a cluster admin, I want builds to be able to tolerate provided node taints so that they can | ||
be scheduled on nodes that are not suitable/designated for application workloads. | ||
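
The sketch below is a simplified model of how Kubernetes decides whether a toleration covers a taint: the keys must match, an empty toleration effect matches any effect, and the operator is either `Exists` (value ignored) or `Equal` (values must match, the default). The `example.com/builds` taint key and the helper names are hypothetical, and details such as `tolerationSeconds` are left out.

```python
def tolerates(toleration, taint):
    """Return True when the toleration covers the taint (simplified rules)."""
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")

    if key == "":
        # An empty key is only meaningful with Exists, where it matches any taint key.
        if operator != "Exists":
            return False
    elif key != taint["key"]:
        return False

    # An empty effect on the toleration matches every taint effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False

    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def untolerated(taints, tolerations):
    """Return the taints that none of the tolerations cover."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


# Hypothetical taint an admin might place on nodes reserved for builds.
node_taints = [{"key": "example.com/builds", "value": "true", "effect": "NoSchedule"}]
build_tolerations = [{"key": "example.com/builds", "operator": "Exists", "effect": "NoSchedule"}]

blocking = untolerated(node_taints, build_tolerations)
```

With the toleration in place `blocking` is empty, so the tainted node stays a scheduling candidate for the build pod while ordinary application pods without the toleration are kept off it.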
|
||
### Implementation Notes | ||
|
||
TBD | ||
|
||
<!--> | ||
**Note:** *Section not required until feature is ready to be marked 'implementable'.* | ||
|
||
Describe in detail what you propose to change. Be specific as to how you intend to implement this | ||
feature. If you plan to introduce a new API field, provide examples of how the new API will fit in | ||
the broader context and how end users could invoke the new behavior. | ||
<--> | ||
|
||
### Test Plan | ||
|
||
TBD | ||
|
||
<!--> | ||
**Note:** *Section not required until targeted at a release.* | ||
|
||
Consider the following in developing a test plan for this enhancement: | ||
|
||
- Will there be e2e and integration tests, in addition to unit tests? | ||
- How will it be tested in isolation vs with other components? | ||
|
||
No need to outline all of the test cases, just the general strategy. Anything that would count as | ||
tricky in the implementation and anything particularly challenging to test should be called out. | ||
|
||
All code is expected to have adequate tests (eventually with coverage expectations). | ||
<--> | ||
|
||
### Release Criteria | ||
|
||
TBD | ||
|
||
**Note:** *Section not required until targeted at a release.* | ||
|
||
#### Removing a deprecated feature [if necessary] | ||
|
||
Not applicable. | ||
|
||
#### Upgrade Strategy [if necessary] | ||
|
||
<!--> | ||
|
||
If applicable, how will the component be upgraded? Make sure this is in the test | ||
plan. | ||
|
||
Consider the following in developing an upgrade strategy for this enhancement: | ||
|
||
- What changes (in invocations, configurations, API use, etc.) is an existing cluster required to | ||
make on upgrade in order to keep previous behavior? | ||
- What changes (in invocations, configurations, API use, etc.) is an existing cluster required to | ||
make on upgrade in order to make use of the enhancement? | ||
<--> | ||
|
||
### Risks and Mitigations | ||
|
||
TBD | ||
|
||
<!--> | ||
What are the risks of this proposal and how do we mitigate? Think broadly. For example, consider | ||
both security and how this will impact the larger Shipwright ecosystem. | ||
|
||
How will security be reviewed and by whom? How will UX be reviewed and by whom? | ||
<--> | ||
|
||
## Drawbacks | ||
|
||
TBD - The idea is to find the best form of an argument why this enhancement should _not_ be implemented. | ||
|
||
## Alternatives | ||
|
||
TBD | ||
|
||
Similar to the `Drawbacks` section the `Alternatives` section is used to highlight and record other | ||
possible approaches to delivering the value proposed by an enhancement. | ||
|
||
## Infrastructure Needed [optional] | ||
|
||
No additional infrastructure anticipated. | ||
Test KinD clusters may need to deploy with additional nodes where these features can be verified. | ||
|
||
## Implementation History | ||
|
||
Major milestones in the life cycle of a proposal should be tracked in `Implementation History`. | ||
|
||
|