CI test linux://python/ray/dag:test_torch_tensor_dag_gpu is flaky #45920

Closed
can-anyscale opened this issue Jun 13, 2024 · 69 comments · Fixed by #48204
Labels
bug Something that is supposed to be working; but isn't ci-test core Issues that should be addressed in Ray Core stability triage Needs triage (eg: priority, bug/not-bug, and owning component)

Comments

@can-anyscale
Collaborator

CI test linux://python/ray/dag:test_torch_tensor_dag_gpu is consistently_failing. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/4886#01901018-8a42-41d5-9187-e980a86df6a9
- https://buildkite.com/ray-project/postmerge/builds/4886#01901003-09d9-4c74-bfa0-32c5f0e33dea

DataCaseName-linux://python/ray/dag:test_torch_tensor_dag_gpu-END
Managed by OSS Test Policy

@can-anyscale can-anyscale added bug Something that is supposed to be working; but isn't ci-test core Issues that should be addressed in Ray Core flaky-tracker Issue created via Flaky Test Tracker https://flaky-tests.ray.io/ ray-test-bot Issues managed by OSS test policy stability triage Needs triage (eg: priority, bug/not-bug, and owning component) weekly-release-blocker Issues that will be blocking Ray weekly releases labels Jun 13, 2024
@can-anyscale
Collaborator Author

Blamed commit: d577652 found by bisect job https://buildkite.com/ray-project/release-tests-bisect/builds/1228

@can-anyscale
Collaborator Author

blame reverted

@can-anyscale
Collaborator Author

This test is now considered flaky because it has been failing on postmerge for too long. Flaky tests do not run on premerge.

@can-anyscale
Collaborator Author

can-anyscale commented Jun 15, 2024

Still failing, but likely not release-blocking. Let's confirm on Monday.

@kevin85421
Member

The error seems to be related to dependency conflicts.

[Screenshot, 2024-06-15 10:17:06 AM]

@can-anyscale
Collaborator Author

@kevin85421 Given that DAG is still in an experimental state, I'm assuming this is not a release blocker; can you or @stephanie-wang confirm? Thanks.

@kevin85421
Member

@can-anyscale Where is the error message? I am not sure whether the issue is actually related to the ADAG, as the CI error indicates a dependency conflict.

> can you or @stephanie-wang confirm?

Wait for Stephanie to confirm.

@kevin85421
Member

The master branch seems to have several ongoing revert PRs. Let's check whether it still happens after #46066 is merged. cc @ruisearch42

@ruisearch42
Contributor

ruisearch42 commented Jun 15, 2024 via email

@can-anyscale
Collaborator Author

This test is actually passing (https://buildkite.com/ray-project/postmerge/builds/4965#01901d46-444f-431c-a29c-225dc3291f40), but I'll wait for automation to close it.

@can-anyscale
Collaborator Author

CI test linux://python/ray/dag:test_torch_tensor_dag_gpu is consistently_failing. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/6172#0191cb01-301c-4586-9acd-72aedb17b55d
- https://buildkite.com/ray-project/postmerge/builds/6172#0191cae0-e5e6-4705-abf8-08e9a7995437
- https://buildkite.com/ray-project/postmerge/builds/6171#0191c9d0-2e51-4144-99c9-9e0fd2cc4e10

DataCaseName-linux://python/ray/dag:test_torch_tensor_dag_gpu-END
Managed by OSS Test Policy

@can-anyscale can-anyscale reopened this Oct 25, 2024
@rkooo567 rkooo567 removed flaky-tracker Issue created via Flaky Test Tracker https://flaky-tests.ray.io/ weekly-release-blocker Issues that will be blocking Ray weekly releases labels Oct 31, 2024
@rkooo567
Contributor

@can-anyscale I don't know why it keeps saying this, but I removed some suspicious labels!

@rkooo567 rkooo567 removed the ray-test-bot Issues managed by OSS test policy label Nov 22, 2024
6 participants