Selective Execution based on input file and code changes #4091
This isn't ready to merge yet, but I think it's worth starting the review/discussion, @jodersky @lefou @lolgab @alexarchambault. The code works and the tests pass (on my machine), but it needs to be implemented and tested more thoroughly before it's ready to go in.
Most tests are green now; we needed to special-case …
This is necessary so that it can be used conditionally (e.g. only enabled based on an environment variable), letting Mill's own code take advantage of com-lihaoyi/mill#4091. As a `Task.Input`, it is always present in the task graph, even when its value is ignored due to an `if` conditional, so changes to it always trigger downstream tasks and tests during selective execution. As a helper method, by contrast, I can guard its usage with an `if` statement (e.g. checking a `sys.env` variable), and if it isn't used it never ends up in the build graph at all. Pull request: #165
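A toy model may make the difference described in this comment concrete. It is plain Scala, not Mill's API: the build graph is a hypothetical `Map` from task name to its direct upstream dependencies, and the names `vcsVersion`, `publishVersion` and `test` are made up for illustration. The `Task.Input` style always has the edge, while the helper-method style only adds it when a guard (standing in for a `sys.env` check) is true at graph-construction time.

```scala
object ConditionalInputSketch {
  // Hypothetical graph model: task name -> names of its direct upstream dependencies.
  type Graph = Map[String, Seq[String]]

  // `Task.Input` style: the version input is always upstream of `publishVersion`,
  // even if the task body later ignores its value behind an `if`.
  val inputStyleGraph: Graph = Map(
    "vcsVersion" -> Nil,
    "publishVersion" -> Seq("vcsVersion"),
    "test" -> Seq("publishVersion")
  )

  // Helper-method style: the dependency exists only if `enabled` is true when the
  // graph is built (standing in for a check on a `sys.env` variable).
  def helperStyleGraph(enabled: Boolean): Graph = Map(
    "vcsVersion" -> Nil,
    "publishVersion" -> (if (enabled) Seq("vcsVersion") else Nil),
    "test" -> Seq("publishVersion")
  )

  // Every node transitively downstream of `changed`, including `changed` itself.
  def downstream(graph: Graph, changed: Set[String]): Set[String] = {
    // Invert the edges: dependency -> tasks that depend on it.
    val dependents: Map[String, Seq[String]] = graph.toSeq
      .flatMap { case (task, deps) => deps.map(dep => dep -> task) }
      .groupBy(_._1)
      .map { case (dep, pairs) => dep -> pairs.map(_._2) }

    @annotation.tailrec
    def loop(frontier: List[String], seen: Set[String]): Set[String] = frontier match {
      case Nil => seen
      case head :: tail =>
        val next = dependents.getOrElse(head, Nil).filterNot(seen)
        loop(next.toList ++ tail, seen ++ next)
    }
    loop(changed.toList, changed)
  }
}
```

In this model, a change to `vcsVersion` reaches `test` in the `Task.Input` graph, but in the guarded graph with the flag off it affects nothing else: selective execution can only follow edges that were actually declared.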
// Selective execution is very useful for larger codebases, where you are usually changing
// only small parts of it, and thus only want to run the tests related to your changes.
// This keeps CI times fast and prevents unrelated breakages from affecting your CI runs.
Not sure if this is the right place, but it might be useful to give a blueprint of how this could be set up in CI, e.g. with GitHub Actions.
If I understand correctly, you'd want to save `mill-selective-execution.json` on the main branch and fetch it for every pull request CI run. Is that correct?
The docs are meant to explain how to make use of it. Not sure if I did a good job, though.
I think they're pretty clear! I guess what my comment boils down to is that there's no mention of `mill-selective-execution.json` in the docs, which, if I understand correctly, is the file that needs to be created on CI runs for `main` and used in CI runs for PRs.
But it's a detail and we can ignore it, maybe adding examples later on.
Shouldn't the docs be on https://mill-build.org/mill/cli/builtin-commands.html?
Fixes #4024 and fixes #3534
This PR adds support for selective execution based on changes to build inputs and build code. This lets us take the changes to build inputs and build code and do a downstream graph traversal to select the potentially affected tasks, without needing to go through the entire evaluation/cache-loading/etc. process to individually skip cached tasks.
This is a major feature needed to support large monorepos, where you want to run only the portion of your build/tests relevant to your code change, but you want that portion to be selected automatically by your build tool.
Motivation
Selective execution differs from the normal "cache things and skip cached tasks" workflow in a few ways:
- We never need to actually run tasks in order to skip them: we can run `selective.prepare` on `main` and `selective.run` on the PR branch, and skip the unaffected tasks without ever running them in the first place.
- Even if tasks were previously run on `main` on a different machine, we do not need to do the plumbing necessary to move the caches onto the right machine to be used.
- We can skip tasks entirely without going through the cache load-and-validate process.
- We can skip the running of `Task.Command`s as well, which normally never get cached.
Workflows
There are two workflows that this PR supports.
Manual Task Selection
1. `./mill selective.prepare <selector>` saves an `out/mill-selective-execution.json` file storing the current `taskCodeSignatures` and `inputHashes` for all tasks and inputs upstream of `<selector>`.
2. `./mill selective.run <selector>` loads `out/mill-selective-execution.json`, compares the previous `taskCodeSignatures` and `inputHashes` with their current values, and then only executes tasks that are both (a) upstream of `<selector>` and (b) downstream of any tasks or inputs that changed since `selective.prepare` was run.
This workflow is ideal for things like CI machines, where step (1) is run on the target branch and step (2) on the PR branch, so that only the tasks/tests/commands that can potentially be affected by the PR's code change are run.
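The selection rule in step (2) can be sketched as the intersection of two graph traversals. This is a minimal model, not Mill's implementation: the graph is a hypothetical `Map` from task name to direct upstream dependencies, a single `Map[String, Int]` of hashes stands in for both `inputHashes` and `taskCodeSignatures`, and all task names are made up.

```scala
object SelectiveRunSketch {
  // Hypothetical graph model: task name -> direct upstream dependencies.
  type Graph = Map[String, Seq[String]]

  // Transitive closure following `edges` from `start`, including `start` itself.
  def reachable(edges: String => Seq[String], start: Set[String]): Set[String] = {
    @annotation.tailrec
    def loop(frontier: List[String], seen: Set[String]): Set[String] = frontier match {
      case Nil => seen
      case head :: tail =>
        val next = edges(head).filterNot(seen)
        loop(next.toList ++ tail, seen ++ next)
    }
    loop(start.toList, start)
  }

  // Names whose hash was added, removed or changed between the saved and current snapshots.
  def changed(before: Map[String, Int], after: Map[String, Int]): Set[String] =
    (before.keySet ++ after.keySet).filter(k => before.get(k) != after.get(k))

  // Tasks that are both (a) upstream of the selector and (b) downstream of a change.
  def selectTasks(graph: Graph, selector: Set[String], changedNodes: Set[String]): Set[String] = {
    val upstream = reachable(t => graph.getOrElse(t, Nil), selector)
    // Invert the edges so we can walk from a changed node to its dependents.
    val dependents: Map[String, Seq[String]] = graph.toSeq
      .flatMap { case (task, deps) => deps.map(dep => dep -> task) }
      .groupBy(_._1)
      .map { case (dep, pairs) => dep -> pairs.map(_._2) }
    val downstream = reachable(t => dependents.getOrElse(t, Nil), changedNodes)
    upstream.intersect(downstream)
  }
}
```

In this model, with `foo.test` and `bar.test` selected and only `bar`'s sources changed, only the `bar` chain is selected; intersecting the two sets skips both tasks the selector never needed and needed tasks whose inputs and code are unchanged.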
The naming of `selective.prepare` and `selective.run` is a bit awkward and can probably be improved.
Watch And Re-Run Selection
With `./mill -w <selector>`, we only re-run tasks in `<selector>` if they are downstream of inputs whose values changed or tasks whose code signature changed.
This is a common use case when watching a large number of commands such as `runBackground`, where you only want to re-run the commands related to the build inputs or build code you changed.
It's possible that there are other scenarios where selective execution would be useful, but these are the ones I've come up with for now.
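For the watch workflow, the comparison can happen in memory between successive iterations rather than against a file. A minimal sketch, assuming (hypothetically) that each watched command's transitive upstream inputs and code signatures have already been collected into a set; the command and input names, and the `Int` hashes, are made up for illustration.

```scala
object WatchRerunSketch {
  // Hypothetical snapshot of input hashes and task code signatures, keyed by name.
  type Snapshot = Map[String, Int]

  // Names whose hash was added, removed or changed between two snapshots.
  def changed(before: Snapshot, after: Snapshot): Set[String] =
    (before.keySet ++ after.keySet).filter(k => before.get(k) != after.get(k))

  // `upstreamOf` maps each watched command (e.g. a `runBackground`) to the inputs and
  // code signatures it transitively depends on; it is assumed to be precomputed.
  // Returns, for each watch iteration after the first, the commands to re-run.
  def rerunsPerIteration(
      upstreamOf: Map[String, Set[String]],
      snapshots: Seq[Snapshot]
  ): Seq[Set[String]] =
    snapshots.zip(snapshots.drop(1)).map { case (prev, next) =>
      val diff = changed(prev, next)
      upstreamOf.collect { case (cmd, deps) if deps.exists(diff) => cmd }.toSet
    }
}
```

In this model, editing only `bar`'s sources re-runs only `bar`'s command, and a later change to a code signature that only `foo` depends on re-runs only `foo`'s.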
Implementation
Implementation-wise, we reuse a lot of existing logic from `EvaluatorCore`/`GroupEvaluator`/`MainModule`, extracted into `CodeSigUtils.scala` and `SelectiveExecution.scala`.
The "store `inputHashes`/`methodCodeHashSignatures` (grouped together as `SelectiveExecution.Metadata`) from before, compare to the values after, traverse the graph to find affected tasks" logic is relatively straightforward, though plumbing the `SelectiveExecution.Metadata` data to the code where it is used can be tricky. We need to store it on disk in order to support `mill --no-server`, so we serialize it to `out/mill-selective-execution.json`.
The core logic is shared between the two workflows above, as is most of the plumbing, although they hook into very different parts of the Mill codebase.
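The store-and-compare round trip can be sketched as follows. This is an illustration under stated assumptions, not Mill's code: the case class only mirrors the two maps named above (the real `SelectiveExecution.Metadata` may differ), hashes are `Int`s, and a tab-separated text file stands in for the real `out/mill-selective-execution.json` so that the sketch needs nothing beyond the JDK.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object MetadataSketch {
  // Hypothetical stand-in for `SelectiveExecution.Metadata`: the two hash maps grouped together.
  final case class Metadata(inputHashes: Map[String, Int], methodCodeHashSignatures: Map[String, Int]) {
    // Names of inputs or code signatures whose hash differs from `other` (added/removed count too).
    def changedSince(other: Metadata): Set[String] = {
      def diff(a: Map[String, Int], b: Map[String, Int]): Set[String] =
        (a.keySet ++ b.keySet).filter(k => a.get(k) != b.get(k))
      diff(other.inputHashes, inputHashes) ++ diff(other.methodCodeHashSignatures, methodCodeHashSignatures)
    }
  }

  // Stand-in on-disk format, one "<kind>\t<name>\t<hash>" line per entry.
  // Assumes names contain no tab characters.
  def save(path: Path, m: Metadata): Unit = {
    val lines =
      m.inputHashes.toSeq.map { case (k, v) => s"input\t$k\t$v" } ++
        m.methodCodeHashSignatures.toSeq.map { case (k, v) => s"code\t$k\t$v" }
    Files.write(path, lines.asJava, StandardCharsets.UTF_8)
  }

  def load(path: Path): Metadata = {
    val rows = Files.readAllLines(path, StandardCharsets.UTF_8).asScala.toSeq.map(_.split("\t", 3))
    // Keep only rows of the requested kind and rebuild the name -> hash map.
    def pick(kind: String): Map[String, Int] =
      rows.collect { case Array(`kind`, name, hash) => name -> hash.toInt }.toMap
    Metadata(pick("input"), pick("code"))
  }
}
```

Because the snapshot lives in a file, the process that calls `save` and the one that later calls `load` and `changedSince` can be entirely separate, which is the property the `--no-server` case needs.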
Covered by some basic integration tests for the various code paths above, and one example test using 3 `JavaModule`s.
Limitations
- Currently doesn't work with `-w show`, because `show` has two nested evaluations and it is hard to keep track of which one should be selective and which one shouldn't. This is part of the general problem discussed in #502, and I think we can punt on a solution for now.
- Cannot yet be used on the com-lihaoyi/mill repo, because the build graph unnecessarily plumbs `millVersion` everywhere, which invalidates everything whenever the git dirty SHA changes. Will clean that up in a follow-up.