Fix check for nvidia-device-plugin-daemonset when deploying NVIDIA operator stack #1871
Merged: bdattoma merged 8 commits into red-hat-data-services:master from bdattoma:fix_gpupod_check on Oct 2, 2024
bdattoma added the needs testing, enhancements, do not merge and verified labels and removed the needs testing and do not merge labels on Sep 30, 2024
bdattoma force-pushed the fix_gpupod_check branch from 198fb4d to dcf8575 (October 1, 2024 11:42)
kobihk reviewed on Oct 1, 2024
Co-authored-by: Kobi Hakimi <[email protected]>
kobihk approved these changes on Oct 1, 2024
Quality Gate passed
tarukumar approved these changes on Oct 2, 2024
Labels: enhancements (Bugfixes, enhancements, refactoring, ... in tests or libraries; PR will be listed in release-notes), verified (This PR has been tested with Jenkins)
As of now, when the NVIDIA GPU operator stack is deployed, the nvidia-device-plugin-daemonset pod gets restarted after being in "init" status. By then our script has already started waiting for that pod to reach Ready status, so it keeps waiting for a pod that will never be up and running. In addition, this wait has a 20-minute timeout, which makes CI spend more time than needed.
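To illustrate the failure mode, here is a minimal sketch in Python. It is not the code from this PR (the actual fix is not described here); it assumes, hypothetically, a wait loop that re-lists the candidate pods on every poll instead of pinning the name of the first pod it saw, so a daemonset pod that is deleted and recreated under a new name is still picked up before the timeout. `wait_for_ready_pod`, the `list_pods` callback and the `{"name", "ready"}` pod record are all illustrative assumptions; in practice `list_pods` might wrap a label-selector query against the cluster.

```python
import time
from typing import Callable, Dict, List

# Hypothetical pod record used for illustration: {"name": str, "ready": bool}
Pod = Dict[str, object]


def wait_for_ready_pod(
    list_pods: Callable[[], List[Pod]],
    timeout_s: float,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until any pod returned by list_pods() is Ready; return its name.

    The pod list is re-fetched on every iteration, so a pod that restarts
    under a new name (e.g. after the "init" phase) is not missed because
    the loop was waiting on a stale pod name.
    """
    deadline = clock() + timeout_s
    while True:
        # Re-resolve the current set of pods instead of reusing an old name.
        for pod in list_pods():
            if pod["ready"]:
                return str(pod["name"])
        if clock() >= deadline:
            raise TimeoutError(f"no Ready pod within {timeout_s} s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated restart: the first pod never becomes Ready, is deleted,
    # and a recreated pod with a different (hypothetical) name becomes Ready.
    snapshots = iter([
        [{"name": "nvidia-device-plugin-daemonset-aaaaa", "ready": False}],
        [],
        [{"name": "nvidia-device-plugin-daemonset-bbbbb", "ready": True}],
    ])
    print(wait_for_ready_pod(lambda: next(snapshots), timeout_s=60, sleep=lambda _: None))
```

Injecting `clock` and `sleep` keeps the sketch testable without a cluster; a loop that instead held on to the first pod's name would keep polling `...-aaaaa` until the full timeout expired.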
Solution:
PR validation:
- rhods-ci-pr-test/3368: PASS - it took 15 minutes from provisioning to stack deployment; the job failure at the end is not related to this PR
- rhods-ci-pr-test/3371: approx. 15 minutes for the e2e flow (provisioning + operator installation)
- rhods-ci-pr-test/3372: PASS - it took 28 minutes for the e2e flow (provisioning + operator installation); the job failure at the end is not related to this PR