Revise testing infrastructure to decrease spurious failures #759

cboettig · 2024-01-31T05:01:25Z

The testing infrastructure currently cannot test any cuda images because of disk space limitations. This PR separates the testing of cuda images from the other script tests.

Tests involving rstudio-daily are also failing continuously due to server issues with the downloads.

Both of these create conditions that guarantee test failure, making it impossible for PRs to satisfy all checks. This either prevents PR contributions or means that PR contributions are merged with some failing tests, neither of which is a good solution.

cboettig · 2024-01-31T05:26:48Z

@eitsupi I think all the tests in https://github.com/rocker-org/rocker-versioned2/actions/runs/7721419467/workflow?pr=759 run on the same runner? Public repo runners for Linux have 150 GB. It looks like almost any small change triggers the full matrix of tests, and at the moment that full matrix just can't run on the runner because there isn't enough space? E.g., this initial test PR shows 6 of the 38 tests failing. They fail for various reasons, though none are related to the change in the PR: they are either network issues or lack of space.

I think your plans in #755 to better leverage the cache in the redesigned build infrastructure will greatly improve this situation. But meanwhile I think it's difficult to make a meaningful PR against the repo that won't hit failed tests for unrelated issues, especially the disk space error.

I will try out some options here for a potentially slimmer test matrix as a work-around...

eitsupi · 2024-01-31T10:45:14Z

As I commented elsewhere, the capacity issue should be resolved by removing unnecessary software.
For example:

rocker-versioned2/.github/workflows/reports.yml

Lines 49 to 51 in 26c50e5

    
          - name: Clean up
            run: |
              docker image prune --all --force

I think this is affected by a change made a while ago that increased the size of the rocker/cuda image by several gigabytes.

I think reducing tests and merging incorrect changes is just as bad an idea as merging while ignoring tests that randomly fail.

cboettig · 2024-01-31T15:23:41Z

@eitsupi Thanks for your help here. To be clear, we are entirely on the same page about not reducing testing and not merging incorrect changes. Having unrelated tests fail for unrelated reasons does not reduce incorrect changes. The current testing does not cover most cases anyway, and I'd like to actually add more tests to get better coverage, not fewer. As I said at the top of this, I'm not seeking to remove tests overall; I'm testing the removal of tests here to try to get a handle on disk use so I can add tests. We share the same objective here.

I cannot currently fix things that are broken and have been broken for a long time and have never been covered by our tests while PRs are throwing errors that are entirely unrelated to those changes and are also not reflective of problems either in the existing stack or the proposed changes.

I don't understand the solution you are proposing -- that we shrink the size of the images themselves or that we free space from other software on the host runner? You suggested removing arrow libraries I think? As I noted in #756, adding support for tensorflow and torch libraries adds 4 - 5 GB each. If you are aware of unnecessary software that could free enough extra space to test ML libraries then a PR would be awesome. The test images should have 150 GB of disk. We should be able to run tests on the 13 GB base cuda image. While the matrix setup is nice, perhaps it would avoid these issues to have tests handled on different runners?

The current test design is not compatible with testing the large images involved in the machine learning stack. I suggest we move that testing to a separate runner. We may want it on a separate runner anyway so we have the option of self-hosted runner setup to run these images on GPU.

eitsupi · 2024-01-31T15:28:18Z

Sorry I didn't explain better, but my point was that we can free up space by deleting unnecessary stuff on the runner, which is what Apache Arrow's main repository does thoroughly, and I think the script below does just that.
https://github.com/apache/arrow/blob/787afa1594586d2d556d21471647f9cd2c55b18f/ci/scripts/util_free_space.sh

If I remember correctly, all testing is done on a separate runner, so the reason why the test for the ML image fails is simply because that image is too huge.
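
A minimal sketch of the kind of runner cleanup described above, not a copy of the Arrow script. The directories listed are assumptions about what is preinstalled on the GitHub-hosted Ubuntu image and may differ between image versions; the workflow and step names are hypothetical.

```yaml
name: free-space-sketch
on:
  workflow_dispatch:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Show disk usage before cleanup
        run: df -h /
      - name: Remove preinstalled software the tests do not need
        run: |
          # Assumed locations of large toolchains on the hosted image
          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
          # Same cleanup already used in reports.yml
          docker image prune --all --force
      - name: Show disk usage after cleanup
        run: df -h /
```

Printing `df -h /` before and after makes it possible to check how much space the cleanup actually frees on a given runner, rather than relying on a documented figure.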


cboettig · 2024-02-01T01:42:18Z

In the above edits, I have moved cuda images out of the tests/rocker_scripts/matrix.json, since that runner simply does not have enough space to test anything involving the now 13 GB cuda base image, let alone the potentially larger derivative images.

I've moved cuda into a separate workflow. I initially structured this on the same design as the rocker_scripts test, but even with a single test it doesn't have enough space. I don't really understand why -- maybe buildkit is using additional space? I rewrote the action as a concise, simple docker build test, which runs just fine. Anyway, I think this is the correct direction to go in -- after all, as I mentioned, I'd like to consider actually testing cuda images on GPU machines with self-hosted runners.

@eitsupi It would be great if you'd like to add those image-size-reducing changes from arrow. I'm reluctant to paste them in here, as doing so adds complexity to the build system and I don't really understand what the script is doing. (For instance, I don't see how it can really free 20 GB from every ubuntu 22.04 runner when, according to the GitHub docs, runners on private repos don't even have 20 GB SSDs to begin with...)

Anyway, I'd like to move forward on this so we can get testing unstuck for #756 and so we can start providing the main ML frameworks on our ML-tagged images.

eitsupi

Thanks for looking into this.

Please update the PR title and description.

eitsupi · 2024-02-01T04:01:08Z

.github/workflows/cuda-test.yml

Is there any reason why we should keep this in a separate yml file?
Is splitting another job not enough?

Sorry, not entirely sure I follow. I think having a separate script is necessary to run on another runner? I find this architectural design easier to understand and maintain -- it's fewer pieces and distinct parts are isolated in different files. By splitting another job you mean doing this through the matrix.json file? I did look at the design of a matrix.json for this (in a separate yaml file, though perhaps could be merged): https://github.com/rocker-org/rocker-versioned2/pull/759/files#diff-454bb856f3821991ff2015ec5ba81c69df2ae3b5a476e30104d697c420b35093, and it hits the same issue with space.

My suggestion is simply that it is possible to move the job to scripts-test.yml.

I don't know if they're sharing capacity on a per-job basis or on a per-workflow basis, but I think it's unlikely that they're sharing it across workflows.
So you probably don't need to split the workflow for a single job here.
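
A sketch of the single-workflow alternative being suggested, with the cuda build as a second job. On GitHub-hosted runners each job runs on its own fresh virtual machine, so the two jobs would not share disk space. The job names, the placeholder step, and the build context `.` are assumptions for illustration only.

```yaml
name: scripts-test-sketch
on:
  pull_request:
    branches:
      - master
jobs:
  scripts-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder for the existing matrix-based script tests
      - run: echo "matrix-based script tests would run here"
  cuda-test:
    # Independent job: scheduled on a separate runner from scripts-test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build the cuda test image
        run: docker build -f tests/ml-test.Dockerfile .
```

One trade-off of this layout is that both jobs share the same `on:` trigger and path filters, whereas a separate file lets the cuda test be triggered independently.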

oh sorry to be dense, I think I understand. I've always tended to write actions with basically one job per yaml, unless there were two dependent jobs (e.g. generate_matrix and build in scripts-test.yml). I think I do the same thing with, e.g., writing R functions in many different files in R/. I see the cuda tests as a bit of a work in progress, while for the moment scripts-test.yml has been pretty stable, so conceptually, to me anyway, this provides a bit more separation between the component I'm still intending to fiddle with and the component that works. Why have all the jobs in the same yaml file?


eitsupi · 2024-02-01T04:01:32Z

.github/workflows/cuda-test.yml

+on:
+  workflow_dispatch: null
+  push:
+    paths:
+      - tests/ml-test.Dockerfile
+      - .github/workflows/cuda-test.yml


Please reconsider.

walk me through your thinking? if we change the workflow or the test, we should run the workflow, right? or is the former implicit anyway?

Like this:

rocker-versioned2/.github/workflows/scripts-test.yml

Lines 3 to 14 in 26c50e5

    on:
      pull_request:
        branches:
          - master
        paths:
          - tests/rocker_scripts/Dockerfile
          - tests/rocker_scripts/matrix.json
          - tests/rocker_scripts/test.sh
          - scripts/*.sh
          - "!scripts/install_R_*.sh"
          - "!scripts/setup_R.sh"
      workflow_dispatch:

I don't think push should be used without specifying a tag or branch.

The actual target we want to test is the PRs, not pushes.

oh thanks for explaining, got it! done now.
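
The revised trigger is not quoted in this thread. A sketch consistent with the suggestion, mirroring the scripts-test.yml trigger and keeping the two paths from the original diff, might look like this; the `master` branch filter is an assumption carried over from scripts-test.yml.

```yaml
on:
  # Run on pull requests targeting master, not on every push
  pull_request:
    branches:
      - master
    paths:
      - tests/ml-test.Dockerfile
      - .github/workflows/cuda-test.yml
  # Keep the manual trigger from the original diff
  workflow_dispatch:
```

With `pull_request` the workflow runs against the proposed change before merge, while an unfiltered `push` would also fire on every branch push that touches those paths.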

eitsupi · 2024-02-01T04:01:48Z

.github/workflows/cuda-test.yml

+jobs:
+  build:
+    runs-on: ubuntu-latest
+    permissions: write-all


Is write-all needed here?

good question, let's try without that.

Sorry. What I was trying to say was that read permission would be sufficient.

But I don't think this test is ever using the GITHUB_TOKEN here anyway? I think I had left write-all in by mistake from a previous action that was also pushing to github (and even then it should plausibly have a more scoped write permission). I think the test is happy without this.

Since this repository is old, I assume write-all privileges are granted by default, but if you follow best practices, you should grant only minimal privileges.
Since we are only reading files here, I expect read permission to be sufficient.

oh good to know! I didn't realize they did legacy privileges but of course that makes sense or actions would break. I've gone with read-all now.
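
A minimal sketch of the permission change discussed in this thread, assuming the job only checks out the repository and builds the test image. The workflow name, step contents, and build context `.` are illustrative, not the PR's actual workflow.

```yaml
name: cuda-test-permissions-sketch
on:
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    # read-all grants read access to every scope; `contents: read` alone
    # would be narrower if checkout is the only use of the token.
    permissions: read-all
    steps:
      - uses: actions/checkout@v4
      - name: Build the test image
        run: docker build -f tests/ml-test.Dockerfile .
```

Declaring `permissions` explicitly at the job level overrides whatever default the repository grants to GITHUB_TOKEN, so the job behaves the same regardless of legacy repository settings.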

tests/ml-test.Dockerfile

eitsupi · 2024-02-01T04:02:25Z

tests/rocker_scripts/matrix.json

@@ -33,7 +32,7 @@
      "base_image": "rocker/r-ver",
      "tag": "devel",
      "script_name": "install_rstudio.sh",
-      "script_arg": "daily"
+      "script_arg": "latest"


This seems like an unrelated change.

As you know, daily has been failing for a week due to the RStudio server throwing a 500 error on that download. Because we try to build that image daily in the cron tasks anyway, I think it's less important to also test it here, especially when it means that the current situation ends up blocking the ability to fix anything. Maybe you can provide your preferred solution?

Oh, I'm sorry. I didn't know that the daily build was failing. Indeed, as you say, this is built every day, so we don't need to bother testing it here.

I'm sure that "latest" was tested elsewhere (because latest is the default), so I think it's okay to delete the test itself.

ok thanks. yeah sorry I get the emails every day when the daily fails (https://github.com/rocker-org/rocker-versioned2/actions/workflows/devel.yml) which happens all the time. It looked like this test was building rstudio on r-ver:devel while other tests were building it on the normal r-ver so I didn't delete it, but if you're fine dropping this test then so am I!


cboettig added 3 commits January 30, 2024 21:00

test tests

2f15536

test on slimmer subset of images

09a2dad

daily has been unable to pull and thus failing to build for a while

e7491a6

cboettig added 2 commits January 30, 2024 21:41

🧹

5422f0d

??

a1afa20

original tests

b9bd40e

cboettig added 7 commits January 31, 2024 07:28

Let's test cuda seperately

57854b9

test latest on devel instead of daily.

586739a

daily is already being built daily, and it has been failing for a week anyway. It should not be tested on PRs that only touch unconnected parts of the stack

can we test cuda successfully seperately?

6b8ac78

?

af17389

name

c7a4f85

simple test

9174ee4

linting

26fb6ae

cboettig marked this pull request as ready for review February 1, 2024 01:42

eitsupi requested changes Feb 1, 2024


cboettig changed the title ~~testing the tests~~ Revise testing infrastructure to decrease spurious failures Feb 1, 2024

cboettig added 5 commits January 31, 2024 20:17

rename and drop write-all

1e5c56d

remove comments

f19aebe

lint

1c82708

read permissions

af118f2

lint

48d7500
