Don’t download model files for every test run #238

JostMigenda · 2023-05-16T22:20:49Z

Right now, tests on GitHub Actions need to download all model files for every test run. This has multiple disadvantages:

It breaks tests on PRs that add new model files, see e.g. Missing directions/masses in Walk and Tamborra models #231 and in particular this comment
It means tests running on GitHub may behave subtly different than on a developer’s local computer (where the model files are likely already downloaded and available in snewpy.model_path)
Needing to download 100s of MB of model files slows down tests and wastes energy and bandwidth.

At least for model files that are in the GitHub repository anyway, the files are already available on the test runner; it would be sufficient to change snewpy.model_path to point to the models/ directory in the repo. We could then also eliminate a lot of duplicate testing code in tests/test_0{1,2}*.py.
(Instead, we would probably need one download test running as part of the release workflow—after tagging, but before uploading the release to PyPI.)

Open questions: How to deal with models whose files are not in the GitHub repo? (E.g. on Zenodo?)

The text was updated successfully, but these errors were encountered:

Sheshuk · 2023-05-17T09:34:47Z

Hi Jost, this looks like a good solution for the time being.
But I thought the idea in the long run was to eventually move the data files to Zenodo or elsewhere, no?
Currently having the data strictly tied to the current release of SNEWPY on github leads to problems like this - when we change the data, but have no possibility to work with it properly, since it is not in the release yet.

I think moving all the data to an outside repository/Zenodo entry would solve all of the mentioned problems.
Also developer won't have to download the data that is not needed, and cloning SNEWPY will be fast. And when the data files are updated, we can just make a new release of the data repository, decoupled from the development of the software

JostMigenda · 2023-05-17T13:29:23Z

Moving the data files elsewhere would solve the first point; but not the second and third one. (In particular, having the files outside of GitHub would potentially make the third point much worse—downloading files over the internet from e.g. Zenodo is much slower than downloading them from another GitHub server that’s potentially a few meters away in the same data centre.)

For adding new models, yes, being able to use them directly from Zenodo (where modeling groups have hopefully made them available themselves) would be good. For existing model files, I’m not sure whether it’s worth the effort to move them elsewhere. But as you say, that’s a discussion for the long run; if we agree that the proposal above is a good solution in the meantime & since it doesn’t impede other long-term solutions, I think that’s good enough for now.

Sheshuk · 2023-05-17T14:06:53Z

For the second problem: having all the models decoupled from the main SNEWPY repository means that the developer will be in exactly the same position as the testing script in the GitHub job. Download the models, or reuse the ones that are already in cache.

In particular, having the files outside of GitHub would potentially make the third point much worse—downloading files over the internet from e.g. Zenodo is much slower than downloading them from another GitHub server that’s potentially a few meters away in the same data centre.

I agree, this might be the case (although it's worth checking if this problem is real).
But then what we can do is to put all the models we have into a separate GitHub repository, and version it any way we like.

JostMigenda · 2023-05-17T16:46:32Z

For Zenodo, we’ve previously observed issues where downloads would fail completely or tests would take ~40 min to run, largely due to slow downloads. It’s possible that was some temporary hiccup for a few days; but even if it’s reliable and faster now, it would be good to avoid even a few min extra wait on every test run.

Sheshuk · 2023-05-17T20:07:08Z

I see, thanks for pointing to the issue.
But what do you think about having models just in a separate repository (or several)?
This can be done real quick.

One more thought about running tests: it's possible to cache the directories between the runs, I think this is done especially for such use case

JostMigenda self-assigned this May 16, 2023

This was referenced May 20, 2023

Adding default option of downloading files from given URL, instead of fixed github one #247

Merged

Put model files in a separate repository #250

Closed

Sheshuk mentioned this issue Jun 3, 2023

Test suites for the PR labels? #256

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don’t download model files for every test run #238

Don’t download model files for every test run #238

JostMigenda commented May 16, 2023

Sheshuk commented May 17, 2023

JostMigenda commented May 17, 2023

Sheshuk commented May 17, 2023

JostMigenda commented May 17, 2023

Sheshuk commented May 17, 2023

Don’t download model files for every test run #238

Don’t download model files for every test run #238

Comments

JostMigenda commented May 16, 2023

Sheshuk commented May 17, 2023

JostMigenda commented May 17, 2023

Sheshuk commented May 17, 2023

JostMigenda commented May 17, 2023

Sheshuk commented May 17, 2023