chore(langchain): disable flaky tests #11511
Benchmarks

Benchmark execution time: 2024-11-27 22:08:43

Comparing candidate commit 9e13565 in PR branch

Found 0 performance improvements and 0 performance regressions! Performance is the same for 388 metrics, 2 unstable metrics.
There appear to be stability issues with using snapshots and/or LangChain in general.

There are failures in the mocked tests that look like:

```
builtins.AssertionError: assert 0 == 1
 +  where 0 = <MagicMock name='LLMObsSpanWriter().enqueue' id='127482146130048'>.call_count
 +    where <MagicMock name='LLMObsSpanWriter().enqueue' id='127482146130048'> = <MagicMock name='LLMObsSpanWriter()' id='127482147073440'>.enqueue
```

as well as failures with snapshot-based tests:

```
builtins.Failed: At request <Request GET /test/session/snapshot >:
   At snapshot (token='tests.contrib.langchain.test_langchain_community.test_lcel_chain_simple'):
    - Directory: /go/src/github.com/DataDog/apm-reliability/dd-trace-py/tests/snapshots
    - CI mode: 1
    - Trace File: /go/src/github.com/DataDog/apm-reliability/dd-trace-py/tests/snapshots/tests.contrib.langchain.test_langchain_community.test_lcel_chain_simple.json
    - Stats File: /go/src/github.com/DataDog/apm-reliability/dd-trace-py/tests/snapshots/tests.contrib.langchain.test_langchain_community.test_lcel_chain_simple_tracestats.json
   At compare of 1 expected trace(s) to 0 received trace(s):
    Did not receive expected traces: 'langchain.request'
```

While we investigate a more stable method of testing, it makes sense to disable the tests to avoid noise to our neighbours in the library :).

DOWN WITH FLAKY TESTS
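For readers unfamiliar with the first failure shape, here is a minimal standard-library sketch, not the actual dd-trace-py test code. It shows how a mocked writer whose `enqueue` is never reached ends up with a `call_count` of 0, and one possible way to disable such a test while it is investigated. `submit_span`, its `deliver` flag, and `WriterCallCountTest` are hypothetical names made up for illustration. The real suite may disable tests differently.

```python
import unittest
from unittest import mock


def submit_span(writer, deliver=True):
    """Hypothetical stand-in for code that hands a span event to a writer."""
    if deliver:
        writer.enqueue({"name": "langchain.request"})


class WriterCallCountTest(unittest.TestCase):
    def test_enqueue_called_once(self):
        # The mock records every call, so call_count is 1 after one enqueue.
        writer = mock.MagicMock(name="LLMObsSpanWriter()")
        submit_span(writer)
        self.assertEqual(writer.enqueue.call_count, 1)

    @unittest.skip("flaky: disabled while a more stable test method is investigated")
    def test_enqueue_not_reached(self):
        # If the event never reaches the writer, call_count stays 0 and the
        # assertion below would fail with "0 != 1". The skip keeps it from running.
        writer = mock.MagicMock(name="LLMObsSpanWriter()")
        submit_span(writer, deliver=False)
        self.assertEqual(writer.enqueue.call_count, 1)


def run_suite():
    """Run the two tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WriterCallCountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` reports the skipped test with its reason instead of a failure, which keeps the skip visible in CI output until the test is reintroduced.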
fbee49d to c07e703
We can always reintroduce the tests; commenting them out is just messy with imports and such.
c07e703 to 92ee3b7
Datadog Report

Branch report: ✅ 0 Failed, 55 Passed, 1413 Skipped, 1m 18.71s Total duration (34m 44.55s time saved)