Selective Execution based on input file and code changes #4091

lihaoyi · 2024-12-09T03:29:45Z

Fixes #4024 and fixes #3534

This PR adds support for selective execution based on build input changes and build code changes. This allows us to consider the changes to build inputs and build code and do a downstream graph traversal to select the potentially affected tasks, without needing to go through the entire evaluation/cache-loading/etc. process to individually skip cached tasks.

This is a major feature needed to support large monorepos, where you want to run only the portion of your build/tests relevant to your code change, and you want your build tool to select that portion automatically.

Motivation

Selective execution differs from the normal "cache things and skip cached tasks" workflow in a few ways:

- We never need to actually run tasks in order to skip them
  - e.g. a CI worker can run selective.prepare on main, selective.run on the PR branch, and skip the un-affected tasks without ever running them in the first place.
  - Or if the tasks were run earlier on main on a different machine, we do not need to do the plumbing necessary to move the caches onto the right machine to be used
- We can skip tasks entirely without going through the cache load-and-validate process.
  - In my experience with other build tools (e.g. Bazel), this can result in significantly better performance for skipped tasks
- We can skip the running of Task.Commands as well, which normally never get cached

Workflows

There are two workflows that this PR supports.

Manual Task Selection

1. ./mill selective.prepare <selector> saves an out/mill-selective-execution.json file storing the current taskCodeSignatures and inputHashes for all tasks and inputs upstream of <selector>
2. ./mill selective.run <selector> loads out/mill-selective-execution.json, compares the previous taskCodeSignatures and inputHashes with their current values, and then only executes tasks that are both (a) upstream of <selector> and (b) downstream of any tasks or inputs that changed since selective.prepare was run

This workflow is ideal for things like CI machines, where step (1) is run on the target branch, (2) is run on the PR branch, and only the tasks/tests/commands that can potentially be affected by the PR code change will be run
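The selection rule described above can be sketched in a few lines. This is only an illustration of the idea, not Mill's actual implementation: the function names, the `edges` adjacency map (task name to its direct downstream tasks), and the shape of the metadata dictionaries are all assumptions made for the example.

```python
from collections import deque


def downstream_of(start, edges):
    """Return `start` plus every node reachable by following `edges` (BFS)."""
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def upstream_of(selected, edges):
    """Return `selected` plus all transitive dependencies, by walking reversed edges."""
    reverse = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    return downstream_of(selected, reverse)


def changed_keys(old, new):
    """Keys whose value differs between the snapshots, or that exist on only one side."""
    return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


def select_tasks(selected, edges, old_meta, new_meta):
    """Tasks that are (a) upstream of the selector and (b) downstream of a change."""
    changed = changed_keys(old_meta["inputHashes"], new_meta["inputHashes"])
    changed |= changed_keys(old_meta["taskCodeSignatures"], new_meta["taskCodeSignatures"])
    return upstream_of(selected, edges) & downstream_of(changed, edges)


# Hypothetical two-module build: only bar's sources changed between the snapshots.
edges = {
    "foo.sources": ["foo.compile"],
    "foo.compile": ["foo.test"],
    "bar.sources": ["bar.compile"],
    "bar.compile": ["bar.test"],
}
old_meta = {"inputHashes": {"foo.sources": 1, "bar.sources": 2}, "taskCodeSignatures": {}}
new_meta = {"inputHashes": {"foo.sources": 1, "bar.sources": 3}, "taskCodeSignatures": {}}
affected = select_tasks({"foo.test", "bar.test"}, edges, old_meta, new_meta)
```

In this toy build, `affected` contains only the `bar.*` nodes, so nothing under `foo` would be scheduled. Note that neither snapshot requires any task to have actually been run.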

The naming of selective.prepare and selective.run is a bit awkward and can probably be improved.

Watch and Re-Run Selection

When you use ./mill -w <selector>, we only re-run tasks in <selector> if they are downstream of inputs whose values changed or tasks whose code signature changed

This is a common use case when watching a large number of commands such as runBackground, where you only want to re-run the commands related to the build inputs or build code you changed.

It's possible that there are other scenarios where selective execution would be useful, but these were the ones I came up with for now

Implementation

Implementation-wise, we re-use a lot of existing logic from EvaluatorCore/GroupEvaluator/MainModule, extracted into CodeSigUtils.scala and SelectiveExecution.scala.

The "store inputHashes/methodCodeHashSignatures (grouped together as SelectiveExecution.Metadata) from before, compare to value after, traverse graph to find affected tasks" logic is relatively straightforward, though the actual plumbing of the SelectiveExecution.Metadata data to the code where it is used can be tricky. We need to store it on disk in order to support mill --no-server, so we serialize it to out/mill-selective-execution.json
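As a rough sketch of why the on-disk file matters: a later process with no shared memory can only compare against the earlier state if that state was serialized. The snippet below illustrates such a round trip. The `Metadata` class, its field names, and the JSON layout are assumptions made for the example; the real format of out/mill-selective-execution.json is not described here.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class Metadata:
    # Hypothetical shape: name -> hash for inputs, name -> signature for task code.
    input_hashes: dict
    method_code_hash_signatures: dict


def save_metadata(meta, path):
    """Write the snapshot as JSON, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(meta), indent=2, sort_keys=True))


def load_metadata(path):
    """Read a snapshot previously written by save_metadata."""
    return Metadata(**json.loads(Path(path).read_text()))
```

A separate process can then call `load_metadata` on the same path and diff the result against freshly computed values.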

The core logic is shared between the two workflows above, as is most of the plumbing, although they hook into very different parts of the Mill codebase

Covered by some basic integration tests for the various code paths above, and one example test using 3 JavaModules

Limitations

Currently doesn't work with -w show, because show runs two nested evaluations and it is hard to keep track of which one should be selective and which one shouldn't. This is part of the general problem discussed in #502, and I think we can punt on a solution for now.

Cannot yet be used on the com-lihaoyi/mill repo, because the build graph unnecessarily plumbs millVersion everywhere, which invalidates everything when the git dirty sha changes. Will clean that up in a follow-up.

lihaoyi · 2024-12-09T12:54:20Z

This isn't ready to merge yet but I think it's worth starting the review/discussion, @jodersky @lefou @lolgab @alexarchambault. The code works and the tests pass (on my machine), but it needs to be implemented/tested better before being ready to go in

lihaoyi · 2024-12-10T08:15:43Z

Most tests are green now. We needed to special-case Command(exclusive = true) to make things like --watch show <selector> work, in the spirit of #502. However, a bit more thought is probably necessary in order to make --watch show support selective execution in a reasonable manner.

Exposing `calcVcsState` publicly as a helper method (lefou/mill-vcs-version#165) is necessary so that it can be used conditionally (e.g. only enabled based on an environment variable), so Mill's own code can take advantage of com-lihaoyi/mill#4091. As a `Task.Input`, it is always present in the task graph even when the value is ignored due to an `if` conditional, meaning it always triggers downstream tasks and tests during selective execution. As a helper method, by contrast, I can guard its usage with an `if` statement (e.g. checking a `sys.env` variable), and if it isn't used it never ends up in the build graph at all.

jodersky · 2024-12-11T10:44:34Z

example/depth/large/9-selective-execution/build.mill

+// Selective execution is very useful for larger codebases, where you are usually changing
+// only small parts of it, and thus only want to run the tests related to your changes.
+// This keeps CI times fast and prevents unrelated breakages from affecting your CI runs.


Not sure if this is the right place, but it might be useful to give a blueprint on how this could be implemented in CI, e.g. with github actions.

If I understand correctly, you'd want to save mill-selective-execution.json in the main branch and fetch it for every pull request CI run. Is that correct?

lihaoyi · 2024-12-11

The docs are meant to explain how to make use of it. Not sure if I did a good job tho

jodersky · 2024-12-11

I think they're pretty clear! I guess what my comment boils down to is that there's no mention of mill-selective-execution.json in the docs, which, if I understand correctly, is the file that needs to be created on CI-main runs and used in CI-PR runs.

But it's a detail and we can ignore it, maybe adding examples later on.

nafg · 2024-12-15T04:42:48Z

Shouldn't the docs be on https://mill-build.org/mill/cli/builtin-commands.html?

lihaoyi added 3 commits December 9, 2024 11:24

basic input change selection works

b88b299

basic input change selection works

ac2c65a

.

fd1561b

lihaoyi marked this pull request as draft December 9, 2024 03:29

lihaoyi added 7 commits December 9, 2024 11:31

selective-changed-code test case

b470d81

.

4d2f5eb

.

88c9102

.

87bcfee

.

eeccc66

.

96406f0

.

7cf975a

lihaoyi mentioned this pull request Dec 9, 2024

Clean up RunScript helpers and Evaluator code #4094

Open

lihaoyi requested review from lefou, jodersky and lolgab December 9, 2024 12:53

lihaoyi added 12 commits December 10, 2024 10:09

.

081594b

cleanup

038cca2

basic failure test

3836297

.

837925a

.

8869615

.

36e56ae

.

1b93107

.

6e138ba

.

eb3b4e8

.

cecc6e8

.

95584f9

.

2800405

add failing watch.show-changed-input test

12621a8

lihaoyi added 2 commits December 10, 2024 17:39

.

295b4f2

.

43fda15

lihaoyi marked this pull request as ready for review December 10, 2024 12:23

lihaoyi changed the title ~~[WIP] Selective Execution~~ Selective Execution Dec 10, 2024

lihaoyi added 2 commits December 11, 2024 09:59

.

4064c81

.

ab914ef

lihaoyi changed the title ~~Selective Execution~~ Selective Execution based on input file and code changes Dec 11, 2024

lihaoyi added 8 commits December 11, 2024 10:20

.

f2925ce

.

650b0fe

.

5fa8110

.

d589f51

.

d6f89f1

.

2ec514d

.

fbb67f1

.

7e13f75

lihaoyi mentioned this pull request Dec 11, 2024

Expose calcVcsState publicly for external usage lefou/mill-vcs-version#165

Merged

lihaoyi added 3 commits December 11, 2024 15:02

retries

eba08cd

retries

bf8ab42

cleanup

32a232c

cleanup

3fb4e72

jodersky reviewed Dec 11, 2024

lihaoyi merged commit 24ba0d3 into com-lihaoyi:main Dec 11, 2024
27 checks passed

lihaoyi mentioned this pull request Dec 11, 2024

Fine-grained selective testing at a class-level granularity #4109

Open

lefou added this to the 0.12.4 milestone Dec 11, 2024
