-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fv3fit unit tests are slow #2057
Comments
I'm surprised to see some tests from |
Many of these test are marked with
This runs in about 80s on my VM. Though coverage drops significantly for certain parts of the code base. Here are the slowest 'not slow' tests:
|
When I first made the slow markers I marked every test over 1s long, should definitely add the cyclegan test and possibly the others to that mark. The somewhat ludicrous test time for autoencoder and cyclegan is something we could talk about @oliverwm1. These run much faster on my machine, but apparently only because I have M1 acceleration. Solutions include:
|
One issue to keep in mind is that the coverage drops a lot when excluding the slow tests (adding a marker for the cyclegan test. Here is the coverage of subdirs:
|
Thanks @mcgibbon @nbren12. Agreed running the 'not slow' tests locally is a good idea for local dev. But as Noah points out we are relying on the slow tests for a lot of our coverage. Can we be more judicious about these tests (i.e. maintain similar coverage with fewer slow tests) or speed them up? I do think the long test time on CI is increasing friction for reviewing/merging PRs but I would also be a bit nervous about only doing slow tests infrequently. |
I find almost all of the time a test that only runs the training code but doesn't check how well it trained will catch bugs introduced into the training codes (and have identical coverage, technically). The "skill" tests only need to be run when those training functions are edited, so we could treat them as a trigger and main test similar to the argo integration tests. I did try hard to decrease the run time of the pytorch training tests, but pytorch is generally very slow on cpu because of how it runs python code every training epoch, and there's only so much we can do about it. The architectures implemented also just need more data/epochs to train. |
Would it be worth writing more unit tests @mcgibbon of e.g. the model objects etc? top-level training tests can be hard to make fast, though I agree that they do tend to find bugs easily. |
There's a tradeoff with that solution - It's hard to come up with these tests, and if you do rely extensively on unit tests of components of the model building / training functions, it means you will frequently need to refactor or update tests when you're only changing internal implementation details of the model, including style refactors. I don't know if it even can be done in this case, the best I could think of for sub-testing CycleGAN was to include the autoencoder which is already expensive to train. The best answer may just be "training global models is expensive and tests of this will be expensive". |
Seems like the generator and discriminator could be unit tested relatively easily. E.g. returns the right shape, o(1) outputs, or gradient of output with respect to input is nonzero...etc. Presumably these are being saved as artifacts so one would want to maintain some kind of backwards compatibility. These type of tests are fast and usually give a nice amount of coverage even if they aren't fully comprehensive. The glue/config/factory/training function code can be bit tricker to test effectively than the basic architectures/models, but often the typechecker can find those kind of integration bugs. |
fv3fit unit tests are slow. They take about 10min to run on my Mac and more like 15-20min on CI. Development would be easier if the tests ran faster.
In #2055 I added output for the duration of the slowest tests. For fv3fit tests run on CI (see here) they are:
Some individual tests are very slow (
test_autoencoder
!) and others (e.g.test_cli
) are probably run for too many different parameters.Related: #2040
The text was updated successfully, but these errors were encountered: