
Test Discovery fails on large codebases #21922

Closed
juandiegopalomino opened this issue Sep 5, 2023 · 10 comments
Assignees
Labels
area-testing info-needed Issue requires more information from poster triage-needed Needs assignment to the proper sub-team

Comments

@juandiegopalomino

juandiegopalomino commented Sep 5, 2023

Type: Bug

Behaviour

NOTE: not sure if it's a bug report or feature request to solve a bug
Hi! I am JD from the developer experience team at Rippling. We are exploring moving the company's official editor from PyCharm to VS Code, but we are running into some critical issues when trying to use the test runner/debugger feature.

For at least the pytest runs, test discovery currently works by running pytest with the --collect-only flag to gather data about all current tests. For reasonably sized codebases this works quite well, is reasonably quick, and is a clever way of not re-inventing the wheel for test discovery.

For large codebases with ~60K tests and ~300 dependencies, this becomes a problem. In our CI, on larger machines, pytest --collect-only takes a little over 3 minutes to run, and ~10 minutes on the developer VMs we tested this on. During this time, the editor's child process also consumes upwards of 16GB of RAM. Moreover, afterwards the test line placements were wrong and the editor began to show a myriad of UI errors, to the point where we just restarted it out of frustration.
[screenshot: editor UI errors after test discovery]

It is not a big leap of imagination to consider that the issue comes from the fact that we're trying to collect an insane number of tests in one go. We understand why this default strategy was chosen, but are hard-pressed to find alternatives.

One solution which we've PoC'd following this guide was to not rely on pytest for test discovery, but rather do a primitive syntax scan of the currently open Python files and dynamically identify the tests using pytest's simple built-in naming rules:

  1. Does the file name match the pattern "test_*.py"?
  2. Does the class name start with "Test"?
  3. Does the function name start with "test"?

We found that this discovery strategy is quite snappy, takes up pretty much no resources, and works for almost all test cases (unless you're doing something really advanced with pytest/conftest).

Our ask is:

  1. Is there such a feature already available that we missed?
  2. If there isn't, is there something on the short-term road map for this?
  3. If there isn't, can you help us out in making a PR to add this as an optional discovery strategy? (i.e. point us to what part of the code needs to be updated for this to work -- I think it is pythonFiles/tests/testing_tools/adapter/pytest/test_discovery.py or pythonFiles/tests/pytestadapter/test_discovery.py or src/client/testing/testController/pytest/pytestDiscoveryAdapter.ts, but I am not sure)

Expected vs. Actual

Expected: for the currently open test files, pytest discovery

  1. Works quickly (10 seconds)
  2. Works correctly
  3. Does not consume ~16GB of ram

Actual: the editor freezes for ~10 minutes and then enters a broken state.

Steps to reproduce:

  1. Get a codebase with ~60K tests and ~300 dependencies.
  2. Set up the venv and everything else as usual.
  3. Do the regular flow to configure test runner with pytest.
  4. Watch as your editor stalls for 10 minutes and then enters a broken state.

Diagnostic data

  • Python version (& distribution if applicable, e.g. Anaconda): 3.10.11
  • Type of virtual environment used (e.g. conda, venv, virtualenv, etc.): Poetry
  • Value of the python.languageServer setting: Default
Output for Python in the Output panel (View > Output, change the drop-down in the upper-right of the Output panel to Python)

XXX

User Settings


languageServer: "Pylance"

testing
• pytestArgs: "<placeholder>"
• pytestEnabled: true

Extension version: 2023.14.0
VS Code version: Code 1.81.1 (Universal) (6c3e3dba23e8fadc360aed75ce363ba185c49794, 2023-08-09T22:20:33.924Z)
OS version: Darwin arm64 22.6.0
Modes:

System Info
Item Value
CPUs Apple M1 Pro (10 x 24)
GPU Status 2d_canvas: enabled
canvas_oop_rasterization: disabled_off
direct_rendering_display_compositor: disabled_off_ok
gpu_compositing: enabled
metal: disabled_off
multiple_raster_threads: enabled_on
opengl: enabled_on
rasterization: enabled
raw_draw: disabled_off_ok
video_decode: enabled
video_encode: enabled
vulkan: disabled_off
webgl: enabled
webgl2: enabled
webgpu: enabled
Load (avg) 5, 5, 5
Memory (System) 32.00GB (0.32GB free)
Process Argv . --crash-reporter-id 93d076a8-2779-4634-a060-d784b251ef33
Screen Reader no
VM 0%
A/B Experiments
vsliv368cf:30146710
vsreu685:30147344
python383:30185418
vspor879:30202332
vspor708:30202333
vspor363:30204092
vswsl492cf:30256860
vslsvsres303:30308271
vserr242:30382549
pythontb:30283811
vsjup518:30340749
pythonptprofiler:30281270
vsdfh931:30280409
vshan820:30294714
vstes263cf:30335440
vscoreces:30445986
vscod805cf:30301675
binariesv615:30325510
bridge0708:30335490
bridge0723:30353136
vsaa593:30376534
pythonvs932:30410667
vsclangdf:30486550
c4g48928:30535728
dsvsc012:30540252
pynewext54:30695312
azure-dev_surveyone:30548225
vscccc:30803845
2e4cg342:30602488
f6dab269:30613381
a9j8j154:30646983
showlangstatbar:30737416
03d35959:30757346
pythonfmttext:30731395
fixshowwlkth:30771522
showindicator:30805244
pythongtdpath:30769146
i26e3531:30792625
pythonnosmt12:30797651
pythonidxpt:30805730
pythonnoceb:30805159
dsvsc013:30795093
dsvsc014:30804076
diffeditorv2:30821572

@github-actions github-actions bot added the triage-needed Needs assignment to the proper sub-team label Sep 5, 2023
@eleanorjboyd
Member

Hi! Thank you for your detailed report. We are excited to hear your company might move to VS Code and would love to make that happen!

One first step which would be helpful is if you could check whether you are on the new testing rewrite. You can check by looking to see if Experiment 'pythonTestAdapter' is listed as active in your Python logs. If not, can you please turn it on and let me know if the behavior changes? You can do so by adding this setting to your user settings.json: "python.experiments.optInto": ["pythonTestAdapter"]. If you are unable to see the log for the experiment, change your log level to info or trace and it should show up.

I am curious whether this helps with efficiency, as we have switched to invoking pytest directly. The code for this rewrite can be found in pytestDiscoveryAdapter.ts if you want to take a look as well.

Thanks!

@github-actions github-actions bot added the info-needed Issue requires more information from poster label Sep 7, 2023
@luabud luabud added the ghc-osd label Sep 19, 2023
@juandiegopalomino
Author

juandiegopalomino commented Sep 28, 2023

@eleanorjboyd
Sorry for the delay, I was pulled into other projects. I have now done as follows:

I have attempted to do so in my local user settings here
Screenshot 2023-09-27 at 4 33 56 PM

As well as in my project settings here (which warned me to use the user settings):
Screenshot 2023-09-27 at 11 55 24 PM

Neither of these made the test debugging any better:

Screenshot 2023-09-27 at 4 33 18 PM

I am still facing the same errors as before. You are still running pytest collection in your rewrite, so I am not surprised that things are not better.

I would like to be clear: this is a blocker for us. If we can't get test discovery and running to work, then we (the devx team) can never recommend VS Code to the engineers and support it as a first-class citizen. For me there are two possible ways to proceed:

  1. My team can work on developing an update to the pytestDiscovery adapter to optionally support syntax-based test discovery on open files.
  2. My team can work on a plugin that does test discovery and then is able to pass that straight into the normal test runner.

Please let me know which one is ideal, or if there is a third way I have not thought of.

EDIT: To highlight the severity: my remote development VM just crashed because I was using too much memory when I ran test discovery.

@github-actions github-actions bot removed the info-needed Issue requires more information from poster label Sep 28, 2023
@brettcannon
Member

If we can't get test discovery and running to work, then we (the devx team) can never recommend VS code to the engineers and support it as a first class citizen

We're sorry to hear that, but we would rather not circumvent pytest for discovery which the vast majority of our users rely on and expect us to use directly (due to pytest plug-ins, it being the way pytest operates itself when they do things from the terminal, etc.). We purposefully try to integrate with tools to provide a VS Code UI on top of them and not circumvent them.

If you can create a pytest plug-in that does what you want then that will work in the future as we are working toward allowing users to provide distinct settings for discovery which would allow you to specify your plug-in only in the discover phase.

Otherwise I'm afraid you're looking at doing your own Python test extension.

@github-actions github-actions bot added the info-needed Issue requires more information from poster label Sep 28, 2023
@juandiegopalomino
Author

If you can create a pytest plug-in that does what you want then that will work in the future as we are working toward allowing users to provide distinct settings for discovery which would allow you to specify your plug-in only in the discover phase.

So if I am understanding correctly, in the future I can make a plugin that just handles test discovery -- i.e. it passes some kind of list of objects specifying the tests that were found and should be displayed as runnable, and then the official Python extension takes it from there (handles executing and displaying). If that is the case, then that is awesome, and would totally work. What is the timeline for this?

@github-actions github-actions bot removed the info-needed Issue requires more information from poster label Sep 28, 2023
@juandiegopalomino

This comment was marked as off-topic.

@juandiegopalomino

This comment was marked as off-topic.

@eleanorjboyd
Member

@karthiknadig, could you confirm this from a technical perspective?

@github-actions github-actions bot added the info-needed Issue requires more information from poster label Oct 3, 2023
@brettcannon
Member

brettcannon commented Oct 3, 2023

So if I am understanding correctly, in the future I can make a plugin that just handles test discovery -- i.e. it passes some kind of list of objects specifying the tests that were found and should be displayed as runnable, and then the official Python extension takes it from there (handles executing and displaying). If that is the case, then that is awesome, and would totally work.

You make a pytest plug-in, and if you can get pytest to do what you want then that will be doable, as we are working on letting you specify the command we pass to pytest for discovery separately from execution. But I don't know if pytest specifically gives you that level of flexibility to short-circuit discovery and take it over completely. Our own code to find out what tests you have comes as a pytest plug-in itself, where we just inspect the tests that happen to be found.
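As an aside, one way a collection-limiting pytest plug-in could look is a conftest.py that prunes directories via the pytest_ignore_collect hook. This is only an illustrative sketch: the PYTEST_FOCUS_DIRS environment variable and FOCUS_DIRS list are invented for this example, not a real pytest or VS Code setting.

```python
# conftest.py -- hypothetical sketch: prune whole directories during
# collection so `--collect-only` does not walk the entire tree.
# PYTEST_FOCUS_DIRS is an invented env var, not a real pytest setting.
import os

FOCUS_DIRS = [d for d in os.environ.get("PYTEST_FOCUS_DIRS", "").split(os.pathsep) if d]

def pytest_ignore_collect(collection_path, config):
    """Return True to prune a path from collection entirely."""
    if not FOCUS_DIRS:
        return None  # no focus configured: leave collection untouched
    path = str(collection_path)
    return not any(path.startswith(d) for d in FOCUS_DIRS)
```

Because the hook runs before pytest descends into a directory, pruning here avoids both the import cost and the memory cost of collecting tests outside the focused subtree.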

What is the timeline for this?

We are actively working on it, but otherwise we don't give out ETAs.

I will also ask that you please be patient and give us a week to respond to your questions. We had a two-day team offsite and it was a holiday here in Canada on Monday.

@juandiegopalomino
Author

Super, no worries! I hope you had a good offsite!

@eleanorjboyd
Member

Hi @juandiegopalomino, our rewrite is now out via the experiment to 100% of stable users. We do not have a finalized timeline for when the rewrite will no longer be behind an experiment, but to give you an idea it may be February of next year. This issue will track the progress of that rollout: #20086. I am going to close this issue, as that one will let you know when the rollout ships. Thanks!

@eleanorjboyd eleanorjboyd closed this as not planned Dec 7, 2023
@github-actions github-actions bot added the info-needed Issue requires more information from poster label Dec 7, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 7, 2024