New KW `Wait For Deployment Replica To Be Ready` for oc_install.robot #1867
Conversation
I'm not sure whether having two subsequent conditions (first hardcoded pod number, then check for the value in replicas ready) here won't make this a bit unnecessarily complex 🤔
Personally, I would go with only one approach - either keeping this hardcoded or checking that the expected replicas are ready (I prefer the latter, though I'm not sure whether it has some drawbacks). I think it doesn't make much sense to have both checks here, since we pass if either of them passes anyway.
I agree with @jstourac. Having two conditions adds complexity. I would go with the new approach of checking the number of pods in the replica-set and remove the hardcoded part.
I did not want to break backward compatibility, since the KW name is `Wait For Pods Numbers`.
Yeah, I saw that log. Still, I think this may be confusing, so if we are changing this, we should do it the proper way 🙂 We can have separate keywords for this kind of check and use whichever one is preferable in different places. Anyway, this is mergeable; I just don't like it as a final solution. So, we can merge it if there is a follow-up PR to narrow this down. Or we can wait till this is updated. Or, if I'm the only one, I will simply be sad and this becomes the final solution; that is also an option 😀
@jstourac there are 13 calls to `Wait For Pods Numbers`. For example, instead of calling `Wait For Pods Numbers` with a hardcoded pod count, we could call `Wait For Deployment Replica To Be Ready`. However, we will then lose the warning that the replica-set number is not as expected (a warning that is seen with my original PR).
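For illustration, a minimal sketch of the substitution being discussed. The keyword names come from this PR, but the namespace, label selector, pod count, and the argument order of `Wait For Pods Numbers` are hypothetical placeholders:

    # Current pattern: the expected pod count is hardcoded in the test code
    # (argument order and all values are assumed for illustration):
    Wait For Pods Numbers    1
    ...    namespace=${APPLICATIONS_NAMESPACE}
    ...    label_selector=app=odh-model-controller

    # Proposed pattern: the desired count is read from the Deployment's own replica-set:
    Wait For Deployment Replica To Be Ready    namespace=${APPLICATIONS_NAMESPACE}
    ...    label_selector=app=odh-model-controller    timeout=600s

As noted above, the second form loses the explicit warning that fires when the deployment's desired replica count differs from the value ODS-CI used to expect.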
ods_ci/tests/Resources/OCP.resource (outdated)
    [Arguments]    ${namespace}    ${label_selector}    ${timeout}=600s
    Log To Console    Waiting for Namespace ${namespace} Deployment with label "${label_selector}" to have desired Replica-Set
    ${output} =    Wait Until Keyword Succeeds    ${timeout}    3s    Run And Verify Command
    ...    oc get deployment -l ${label_selector} -n ${namespace} -o json | jq -e '.status | .replicas \=\= .readyReplicas'
Code scanning / Robocop warning: Line is too long.
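For context on the retry mechanics, a hedged sketch of how this line behaves, assuming the selector matches a single Deployment object (the status values in the comments are illustrative): `jq -e` sets its exit code from the truthiness of the last output, so `Run And Verify Command` fails while the rollout is incomplete and `Wait Until Keyword Succeeds` retries it every 3 seconds until the timeout:

    # While rolling out, the Deployment might report, e.g.:
    #     "status": { "replicas": 3, "readyReplicas": 1 }
    # '.status | .replicas == .readyReplicas' is then false, jq -e exits 1,
    # Run And Verify Command fails, and the surrounding wait retries.
    # Once readyReplicas catches up, the expression is true, jq -e exits 0, and the wait ends.
    # (The \=\= escapes = so Robot Framework does not parse the argument as a named argument.)
    ${output} =    Wait Until Keyword Succeeds    ${timeout}    3s    Run And Verify Command
    ...    oc get deployment -l ${label_selector} -n ${namespace} -o json | jq -e '.status | .replicas \=\= .readyReplicas'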
Changed the title: "Wait For Pods Numbers to first wait for all pods readiness" → "Wait For Deployment Replica To Be Ready for oc_install.robot"
Thanks @manosnoam! LGTM
Thanks, Noam!
Force-pushed from b77e8a2 to 78b3d0d
Also verify if the number of running pods is as desired by the replica-set, instead of directly failing if the hard-coded pod number is not up to date. For example, 3 pods are currently expected for odh-model-controller, but in the latest ODH build the replica-set desired number is 1. Signed-off-by: manosnoam <[email protected]>
This Keyword waits for all pods with a specified label in a specified deployment to have the Replica-Set ready (desired pods running). Signed-off-by: manosnoam <[email protected]>
Force-pushed from 78b3d0d to dca338b
Quality Gate passed
Wait For Deployment Replica To Be Ready
    [Documentation]    Wait for Deployment of ${label_selector} in ${namespace} to have the Replica-Set Ready
    [Arguments]    ${namespace}    ${label_selector}    ${timeout}=600s
    Log To Console    Waiting for Deployment with label "${label_selector}" in Namespace "${namespace}", to have desired Replica-Set
Code scanning / Robocop warning: Line is too long.
    ${output} =    Wait Until Keyword Succeeds    ${timeout}    3s    Run And Verify Command
Code scanning / Robocop warning: The assignment sign is not consistent within the file.
Code scanning / Robocop note: Variable '${output}' is assigned but not used.
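For reference, a hypothetical call site for the new keyword; the namespace and label values below are placeholders, not taken from this PR:

    # Wait until the odh-model-controller Deployment reports all desired replicas ready
    # (namespace and label selector are illustrative):
    Wait For Deployment Replica To Be Ready    namespace=${APPLICATIONS_NAMESPACE}
    ...    label_selector=app=odh-model-controller    timeout=300s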
Are we moving away from validating the number of replicas for each component for a specific reason? The original code was designed to ensure that the default number of pods for each component is present. If the goal of this PR is simply to ensure that component resources are ready, I don't think that's a good approach. This change seems to bypass the critical testing of the actual default pod validation. Without a test confirming that the default number of pods exists, I believe this PR should be put on hold. So, in simple terms: if you want this PR to be merged, first create the TC to actually validate the default number of pods.
Yes and no - now, instead of depending on the hardcoded expected value, we are checking that the number of replicas matches what is expected for each deployment. So at least we check that what is set in the deployment is matched.
Yes, if we want to check that the default number of replicas for each deployment is at some particular number, we should have a true test for this, and not in this code, which should handle just and only the installation of the operator and not block further steps unless something very serious happens.
The issue isn't whether we should check; IMO we should definitely check that. The real question is that we previously had validation of the number of pods created by default, but now, with the recent changes, that validation is absent. It feels like we're operating without a necessary test until we have an actual one in place. So, you're asking if we should proceed without it. We could, since we know there are no significant changes in version 2.14, but that wouldn't be an ideal approach.
Exactly, and to do so we will need an additional test that gets the defined pod numbers from ODH configurations/docs, not from the ODS-CI test code itself. Until we implement such a test case, we should not block ODH deployment, but only verify the replica-set.
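To make that concrete, a heavily hedged sketch of what such a dedicated test keyword could look like. Every name below is a hypothetical placeholder, `${expected_replicas}` would have to be sourced from ODH configuration/docs rather than hardcoded in ODS-CI, and the OperatingSystem library (or an equivalent run keyword) is assumed to be available:

    Verify Component Default Replica Count
        [Documentation]    Compare a Deployment's desired replicas against the documented default
        ...    (sketch only; names and the source of ${expected_replicas} are assumptions)
        [Arguments]    ${namespace}    ${label_selector}    ${expected_replicas}
        ${rc}    ${replicas} =    Run And Return Rc And Output
        ...    oc get deployment -l ${label_selector} -n ${namespace} -o jsonpath='{.items[0].spec.replicas}'
        Should Be Equal As Integers    ${rc}    0    msg=Failed to read Deployment spec
        Should Be Equal As Integers    ${replicas}    ${expected_replicas}
        ...    msg=Default replica count drifted from the documented value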
Thanks Tarun for your suggestion! FYI, we currently check the number of pods in dedicated tests, so the deployment task might not need to verify pod numbers.
Yes, but a change has been made for the ODH nightly which reduced the pod number for the model controller to 1. We should have been informed, so that we could change our repo accordingly; I think that was missing here, and because of that the model-controller pod check was failing. So for ODH it is 1, while for RHOAI it is 3.
…red-hat-data-services#1867)
* New KW `Wait For Deployment Replica To Be Ready` for oc_install.robot
  This Keyword waits for all Deployment pods with a specified label to have the Replica-Set ready (desired pods running).
  Signed-off-by: manosnoam <[email protected]>
* Enhance keyword "Run And Verify Command" returning stdout
  Signed-off-by: Jorge Garcia Oncins <[email protected]>
* Disable cache in version-test and take-nap pipeline samples
  Signed-off-by: Jorge Garcia Oncins <[email protected]>
* Use kw "Run And Verify Command" in DataSciencePipelinesBackend
  Signed-off-by: Jorge Garcia Oncins <[email protected]>
* Fix PIP_TRUSTED_HOST in GPU testing sample
  Signed-off-by: Jorge Garcia Oncins <[email protected]>
* Revert "Disable cache in version-test and take-nap pipeline samples"
  This reverts commit 336790bc5482dc708e9d0824f879e3de14a680ae.
* Add initial DSP upgrade testing tests
  Signed-off-by: Jorge Garcia Oncins <[email protected]>
* Fix linter errors
  Signed-off-by: Jorge Garcia Oncins <[email protected]>
* Rename keyword and fix typo
  Signed-off-by: Jorge Garcia Oncins <[email protected]>
* New KW `Wait For Deployment Replica To Be Ready` for oc_install.robot (#1867)
  * New KW `Wait For Deployment Replica To Be Ready` for oc_install.robot
    This Keyword waits for all Deployment pods with a specified label to have the Replica-Set ready (desired pods running).
    Signed-off-by: manosnoam <[email protected]>
---------
Signed-off-by: manosnoam <[email protected]>
Co-authored-by: Noam Manos <[email protected]>
This is to verify whether the number of running pods is as desired by the replica-set, instead of directly failing if the pod number is not as defined in ODS-CI (hard-coded).
For example, 3 pods are currently expected for odh-model-controller in ODS-CI, but in the latest ODH build the replica-set desired number is 1, not 3. This PR should fix that false failure.
Here is the tracking for the change of the number of replicas for the odh-model-controller: