Adding in Tolerations to the Galasa TestPodScheduler #45

Open: 6 commits to merge into base: main


Tom-Slattery
Contributor

The Galasa test pod scheduler can already handle node affinities. For our usage in CICS, we also need it to handle node tolerations, so that we can dedicate a node exclusively to running our Galasa test pods.

I have added a property to the Settings.java file for Node Tolerations based on how the Node Affinity property is handled. This should allow you to specify multiple node tolerations in your Ecosystem Config as a list:

  galasa_node_tolerations: "node-label=Operator:Condition,node-label2=Operator2:Condition2..."

for example:

  galasa_node_tolerations: "galasa-engines=Exists:NoSchedule"

I have added code to the TestPodScheduler that parses this text as a list and adds each entry as a toleration item in the pod spec before creating the pod.
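As a rough illustration of the parsing step described above, here is a minimal, self-contained sketch. It is not the code from this PR: the class name `TolerationParser` and the `Toleration` holder are hypothetical, and it assumes the three parts of each entry map to a toleration's key, operator and effect. The real change would populate the Kubernetes client's toleration objects on the pod spec instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parses "key=Operator:Effect,key2=Operator2:Effect2".
public class TolerationParser {

    // Hypothetical holder type, standing in for the Kubernetes toleration object.
    public static final class Toleration {
        public final String key;
        public final String operator;
        public final String effect;

        Toleration(String key, String operator, String effect) {
            this.key = key;
            this.operator = operator;
            this.effect = effect;
        }
    }

    public static List<Toleration> parse(String setting) {
        List<Toleration> result = new ArrayList<>();
        if (setting == null || setting.trim().isEmpty()) {
            return result; // nothing configured, so no tolerations are added
        }
        for (String entry : setting.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore stray commas
            }
            int eq = trimmed.indexOf('=');
            int colon = trimmed.indexOf(':', eq + 1);
            // Reject entries with a missing key, operator or effect.
            if (eq <= 0 || colon <= eq + 1 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException(
                        "Expected key=Operator:Effect but got: " + trimmed);
            }
            result.add(new Toleration(
                    trimmed.substring(0, eq).trim(),
                    trimmed.substring(eq + 1, colon).trim(),
                    trimmed.substring(colon + 1).trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Toleration t : parse("galasa-engines=Exists:NoSchedule")) {
            System.out.println(t.key + " / " + t.operator + " / " + t.effect);
        }
    }
}
```

Failing fast on a malformed entry is one possible design choice; skipping bad entries with a logged warning would be another.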

@Tom-Slattery Tom-Slattery force-pushed the nodetolerations branch 3 times, most recently from d056189 to f0ff0a6 Compare November 20, 2024 17:11
@Tom-Slattery
Contributor Author

I have installed a new Galasa Ecosystem at the latest "main" version and edited the Galasa boot image to include my changes to the k8s controller JAR. I can confirm that, with this tolerations change, the ecosystem consistently schedules the test runs on the correct node, which would not be possible without the node toleration.

@@ -261,6 +262,40 @@ V1Pod createTestPod(String runName, String engineName, boolean isTraceEnabled) {
}
}

String nodeTolerations = this.settings.getNodeTolerations();
Contributor

can this pls be put into a separate method, as it's getting pretty big...

Contributor

Also, any unit tests for that method would be good ?

Contributor Author

Hi, I've moved the node toleration creation into its own method and added a unit test for it. I spotted that the node affinity creation didn't have a unit test either, so I've added one for that too.

@techcobweb
Contributor

Also, some documentation on how an ecosystem admin is supposed to use this new feature would be needed. Where ? Not sure. Even if it's in a separate readme.md somewhere in the helm chart perhaps ?

Signed-off-by: Tom Slattery <[email protected]>
@Tom-Slattery
Contributor Author

Tom-Slattery commented Nov 21, 2024

> Also, some documentation on how an ecosystem admin is supposed to use this new feature would be needed. Where ? Not sure. Even if it's in a separate readme.md somewhere in the helm chart perhaps ?

Here is some documentation about the expected usage of this feature for an ecosystem admin:

As an ecosystem admin, you may need to control which hardware nodes your Galasa test pods run on. For example, you may wish to dedicate certain k8s nodes exclusively to running Galasa tests, to avoid resource conflicts with other workloads on the cluster. This provides hardware isolation and gives a better guarantee of the performance of your Galasa Ecosystem.

Two components are required to achieve this:
- A k8s node affinity allows you to designate a preferred node for your Galasa tests to be run on
- A k8s node taint prevents other workloads from being scheduled on that node. To allow your Galasa test pods to be scheduled on the tainted node, you must define a matching node toleration

These two settings can be configured via the Galasa Ecosystem configmap, for example:

galasa_node_preferred_affinity: galasa-engines

galasa_node_tolerations: galasa-engines=Exists:NoSchedule

The node affinity defines a label that can be used to identify nodes that your Galasa Ecosystem should schedule pods on.

The Galasa Node Tolerations param defines a comma-separated list of toleration conditions of the format "nodelabel=Operator:Condition". In the above example, nodes with the label "galasa-engines" that have a "NoSchedule" taint should be tolerated, allowing the Galasa test run pods to be scheduled on that node.
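To help an admin catch typos in this setting, the following is a minimal sketch of a value check. It assumes the "Operator" and "Condition" parts are passed through unchanged as the Kubernetes toleration operator and taint effect; the class name `TolerationValueCheck` is hypothetical and not part of this PR. The accepted values listed are those defined by Kubernetes itself: operators `Exists` and `Equal`, and effects `NoSchedule`, `PreferNoSchedule` and `NoExecute`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: checks the operator and effect parts of one entry.
public class TolerationValueCheck {

    // Toleration operators defined by Kubernetes.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("Exists", "Equal"));

    // Taint effects defined by Kubernetes.
    private static final Set<String> EFFECTS =
            new HashSet<>(Arrays.asList("NoSchedule", "PreferNoSchedule", "NoExecute"));

    public static boolean isKnownOperator(String operator) {
        return OPERATORS.contains(operator);
    }

    public static boolean isKnownEffect(String effect) {
        return EFFECTS.contains(effect);
    }

    public static void main(String[] args) {
        // Checks the parts of the example entry "galasa-engines=Exists:NoSchedule".
        System.out.println(isKnownOperator("Exists") && isKnownEffect("NoSchedule"));
    }
}
```

The comparison is case-sensitive, so a value such as "noschedule" would be reported as unknown.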

Where do we want this to be included? Is there somewhere that documents the usage of the other parameters of the Galasa Ecosystem configmap?
