ci: increase timeout for k8s intg tests #9929

Merged 1 commit into main on Sep 13, 2024

Contributor

@rb-determined-ai rb-determined-ai commented Sep 12, 2024

The latest versions of k8s have a new Job condition called
JobFailureTarget which is a signal to the jobs controller to kill off
the pods of the job.

Our jobUpdatedCallback() function waits for the JobFailed condition,
which is now set only after the pod has fully terminated, and that
takes a lot longer.

Probably we need to make sure our k8s logic is still valid, and deal
with the additional time it takes a job to reach JobFailed, if it
affects anything other than this test.

Until then, let's unblock CI for the whole team by just increasing the
test time for TestExternalPodDelete and TestNodeWorkflows.
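
To illustrate the ordering described above, here is a minimal sketch of how a callback could tell "failure has been decided" apart from "failure is final". It is not the actual jobUpdatedCallback() code: JobCondition, hasCondition, and jobPhase are hypothetical stand-ins, and the condition type strings "FailureTarget" and "Failed" are assumed to be the values behind JobFailureTarget and JobFailed.

```go
package main

import "fmt"

// JobCondition is a hypothetical, trimmed-down stand-in for one entry in a
// Kubernetes Job's status conditions (only the type and status fields).
type JobCondition struct {
	Type   string // assumed values: "FailureTarget", "Failed", "Complete"
	Status string // "True", "False", or "Unknown"
}

// hasCondition reports whether a condition of the given type is set to "True".
func hasCondition(conds []JobCondition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType && c.Status == "True" {
			return true
		}
	}
	return false
}

// jobPhase is a hypothetical helper that classifies a job update.
// "failing" means the controller has decided the job will fail and is
// terminating pods; "failed" means the terminal condition has been set.
func jobPhase(conds []JobCondition) string {
	switch {
	case hasCondition(conds, "Failed"):
		return "failed"
	case hasCondition(conds, "FailureTarget"):
		return "failing"
	default:
		return "running"
	}
}

func main() {
	// Simulated sequence of job updates, in the order described above.
	updates := [][]JobCondition{
		{},
		{{Type: "FailureTarget", Status: "True"}},
		{{Type: "FailureTarget", Status: "True"}, {Type: "Failed", Status: "True"}},
	}
	for i, conds := range updates {
		fmt.Printf("update %d: %s\n", i, jobPhase(conds))
	}
}
```

A callback that only reacts to the "failed" phase sees nothing for the whole time the pods are being terminated, which would explain why the affected tests need a longer timeout. Whether reacting to the earlier "failing" phase is safe for our k8s logic is exactly the open question.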


netlify bot commented Sep 12, 2024

Deploy Preview for determined-ui ready!

Name Link
🔨 Latest commit 282d891
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/66e4ab16f48f9d0008ba989b
😎 Deploy Preview https://deploy-preview-9929--determined-ui.netlify.app


codecov bot commented Sep 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.52%. Comparing base (867eb31) to head (282d891).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9929      +/-   ##
==========================================
- Coverage   59.18%   54.52%   -4.67%     
==========================================
  Files         751     1252     +501     
  Lines      104462   156550   +52088     
  Branches     3598     3599       +1     
==========================================
+ Hits        61824    85354   +23530     
- Misses      42506    71064   +28558     
  Partials      132      132              
Flag Coverage Δ
backend 45.12% <ø> (+1.32%) ⬆️
harness 72.75% <ø> (ø)
web 54.33% <ø> (ø)


see 501 files with indirect coverage changes

Contributor

@amandavialva01 amandavialva01 left a comment


LGTM!
We should probably ticket this though, so we can devise a more permanent solution to the k8s API changes.

@rb-determined-ai rb-determined-ai changed the title ci: increase timeout for TestExternalPodDelete ci: increase timeout for k8s intg tests Sep 12, 2024
Contributor

@carolinaecalderon carolinaecalderon left a comment


LGTM

@rb-determined-ai rb-determined-ai merged commit 13b7b3f into main Sep 13, 2024
82 of 94 checks passed
@rb-determined-ai rb-determined-ai deleted the rb/new-k8s branch September 13, 2024 22:07
salonig23 pushed a commit that referenced this pull request Sep 17, 2024

4 participants