Selective Execution based on input file and code changes (#4091)
Fixes #4024 and fixes #3534

This PR adds support for selective execution based on build input changes and build code changes. This allows us to consider the changes to build inputs and build code and do a downstream graph traversal to select the potentially affected tasks, without needing to go through the entire evaluation/cache-loading/etc. process to individually skip cached tasks.

This is a major feature needed to support large monorepos, where you want to run only the portion of your build/tests relevant to your code change, but you want the selection of that portion to be done automatically by your build tool.

## Motivation

Selective execution differs from the normal "cache things and skip cached tasks" workflow in a few ways:

1. We never need to actually run tasks in order to skip them
    * e.g. a CI worker can run `selective.prepare` on `main`, `selective.run` on the PR branch, and skip the unaffected tasks without ever running them in the first place.
    * Or if the tasks were run earlier on `main` on a different machine, we do not need to do the plumbing necessary to move the caches onto the right machine to be used
2. We can skip tasks entirely without going through the cache load-and-validate process.
    * In my experience with other build tools (e.g. Bazel), this can result in significantly better performance for skipped tasks
3. We can skip the running of `Task.Command`s as well, which normally never get cached

## Workflows

There are two workflows that this PR supports.

### Manual Task Selection

1. `./mill selective.prepare <selector>` saves an `out/mill-selective-execution.json` file storing the current `taskCodeSignatures` and `inputHashes` for all tasks and inputs upstream of `<selector>`
2. `./mill selective.run <selector>` loads `out/mill-selective-execution.json`, compares the previous `taskCodeSignatures` and `inputHashes` with their current values, and then only executes tasks that are (a) upstream of `<selector>` and (b) downstream of any tasks or inputs that changed since `selective.prepare` was run

This workflow is ideal for things like CI machines, where step (1) is run on the target branch, step (2) is run on the PR branch, and only the tasks/tests/commands that can potentially be affected by the PR code change will be run.

The naming of `selective.prepare` and `selective.run` is a bit awkward and can probably be improved.

### Watch And Re-Run Selection

1. When you use `./mill -w <selector>`, we only re-run tasks in `<selector>` if they are downstream of inputs whose values changed or tasks whose code signature changed

This is a common use case when watching a large number of commands such as `runBackground`, where you only want to re-run the commands related to the build inputs or build code you changed.

-----

It's possible that there are other scenarios where selective execution would be useful, but these were the ones I came up with for now.

## Implementation

Implementation-wise, we re-use a lot of existing logic from `EvaluatorCore`/`GroupEvaluator`/`MainModule`, extracted into `CodeSigUtils.scala` and `SelectiveExecution.scala`. The _"store `inputHashes`/`methodCodeHashSignatures` (grouped together as `SelectiveExecution.Metadata`) from before, compare to value after, traverse graph to find affected tasks"_ logic is relatively straightforward, though the actual plumbing of the `SelectiveExecution.Metadata` data to the code where it is used can be tricky.
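To make the "store before, compare after, traverse graph" idea concrete, here is a minimal sketch of how such a selection could work. It is not the code from `SelectiveExecution.scala`: the `SelectiveSketch` object, its `Metadata` shape, and the helpers `changedKeys`, `reachable`, `invert`, and `selectTasks` are all hypothetical. It also assumes that tasks are identified by plain strings and that code signatures are keyed by task name, which simplifies whatever the real implementation does.

```scala
// Hypothetical, simplified model for illustration only; not Mill's actual API.
object SelectiveSketch {
  // Snapshot of the two kinds of hashes described above (keys are assumed to be task names).
  final case class Metadata(
      inputHashes: Map[String, Int],
      methodCodeHashSignatures: Map[String, Int]
  )

  // Keys whose hash was added, removed, or changed between two snapshots.
  def changedKeys(prev: Map[String, Int], curr: Map[String, Int]): Set[String] =
    (prev.keySet ++ curr.keySet).filter(k => prev.get(k) != curr.get(k))

  // All nodes reachable from `roots` by following `edges`, including the roots themselves.
  def reachable(roots: Set[String], edges: Map[String, Set[String]]): Set[String] = {
    @annotation.tailrec
    def loop(frontier: Set[String], seen: Set[String]): Set[String] =
      if (frontier.isEmpty) seen
      else {
        val next = frontier.flatMap(n => edges.getOrElse(n, Set.empty[String])) -- seen
        loop(next, seen ++ next)
      }
    loop(roots, roots)
  }

  // Turn "task -> its upstream dependencies" into "task -> its downstream dependents".
  def invert(deps: Map[String, Set[String]]): Map[String, Set[String]] =
    deps.toSeq
      .flatMap { case (task, ups) => ups.toSeq.map(up => up -> task) }
      .groupBy(_._1)
      .map { case (up, pairs) => up -> pairs.map(_._2).toSet }

  // Tasks that are both upstream of the selector and downstream of something that changed.
  def selectTasks(
      prev: Metadata,
      curr: Metadata,
      selector: Set[String],
      deps: Map[String, Set[String]]
  ): Set[String] = {
    val changed =
      changedKeys(prev.inputHashes, curr.inputHashes) ++
        changedKeys(prev.methodCodeHashSignatures, curr.methodCodeHashSignatures)
    reachable(selector, deps) intersect reachable(changed, invert(deps))
  }
}
```

In this model, a change to one module's sources selects that module's own tasks plus any tasks in modules that depend on it, and only if they lie upstream of the requested selector. If nothing changed, the result is empty.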
The `SelectiveExecution.Metadata` needs to be stored on disk in order to support `mill --no-server`, so we serialize it to `out/mill-selective-execution.json`.

The core logic is shared between the two workflows above, as is most of the plumbing, although they hook into very different parts of the Mill codebase.

Covered by some basic integration tests for the various code paths above, and one example test using 3 `JavaModule`s.

## Limitations

Currently doesn't work with `-w show`, because `show` has two nested evaluations and it is hard to keep track of which one should be selective and which one shouldn't. This is part of the general problem discussed in #502 and I think we can punt on a solution for now.

Cannot yet be used on the com-lihaoyi/mill repo, due to the build graph unnecessarily plumbing `millVersion` everywhere, which invalidates everything when the git dirty sha changes. Will clean that up in a follow-up.