
Optimize cached-build codepaths #4026

Open
lihaoyi opened this issue Nov 25, 2024 · 3 comments


lihaoyi commented Nov 25, 2024

Running `time ./mill __.compile` on the Mill codebase, after everything is already compiled, takes about ~2.5s to run and do nothing. `./mill plan __.compile` shows about 16847 tasks, working out to roughly 0.14ms to evaluate each cached task.

Mill's codebase is unusual in having a large number of modules for its size, as it uses modules to define its various integration test cases. But this large module count is perhaps representative of large projects in general, and we should definitely be able to do better than this to support them.


lihaoyi commented Nov 26, 2024

Some rough profiles showing the hotspots in JProfiler. It seems a lot of time is spent in `revalidateIfNeededOrThrow` and in `resolveDirectChildren` (which uses `Class#getMethods` and `NameTransformer#decode`).

(Screenshot: JProfiler hotspot view, 2024-11-26 9:36 AM)

Untitled.jps.zip
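Since a class's public methods cannot change at runtime, the repeated `Class#getMethods` cost in `resolveDirectChildren` is a natural candidate for memoization. A minimal sketch of that idea — hypothetical illustration, not Mill's actual code — assuming a process-wide cache keyed by `Class`:

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: memoize reflective method lookups per Class, so the
// expensive Class#getMethods walk happens at most once per module class.
public final class MethodCache {
    private static final Map<Class<?>, Method[]> cache = new ConcurrentHashMap<>();

    public static Method[] methodsOf(Class<?> cls) {
        // computeIfAbsent pays the reflective cost only on the first call
        // for a given class; later calls are a single map lookup.
        return cache.computeIfAbsent(cls, Class::getMethods);
    }
}
```

Repeated calls for the same class then return the cached result instead of re-running reflection, which is the same shape of fix as the `ResolveCore#Cache` memoization landed later in this issue.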


lihaoyi commented Nov 26, 2024

It seems the time is split relatively evenly between `resolve`, `instantiateModule`, and `evaluate`.

(Screenshot: JProfiler profile, 2024-11-26 9:43 AM)

Optimizing `resolve` would likely involve moving the `getMethods`/`decode` calls into the `Discover` data structure, which would break binary compatibility and so needs to wait for 0.13.0.
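The idea of moving this work into `Discover` can be sketched as follows — a hypothetical, simplified illustration (not the actual `Discover` macro, which does this at compile time in Scala): build a name-to-method table once per module class, so resolution becomes a plain map lookup instead of per-query reflection plus name decoding.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: precompute a decoded-name -> Method table once per
// module class. The real Discover macro would emit this table statically at
// compile time; here we build it eagerly at construction for illustration.
public final class DiscoverTable {
    private final Map<String, Method> tasksByName = new HashMap<>();

    public DiscoverTable(Class<?> moduleClass) {
        for (Method m : moduleClass.getMethods()) {
            // Overloads collapse to one entry; fine for this sketch.
            tasksByName.put(decode(m.getName()), m);
        }
    }

    // Stand-in for scala.reflect.NameTransformer.decode, which maps encoded
    // JVM names like "$plus" back to Scala identifiers like "+".
    public static String decode(String encoded) {
        return encoded.replace("$plus", "+").replace("$minus", "-");
    }

    public Method lookup(String taskName) {
        return tasksByName.get(taskName); // O(1), no reflection per query
    }
}
```

Paying the reflection and decoding cost once per class (or, with a macro, at compile time) rather than once per resolution is what makes repeated cached-build invocations cheap.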

lihaoyi added this to the 0.13.0 milestone Nov 27, 2024
lihaoyi added a commit that referenced this issue Dec 14, 2024
Fixes #4129 and
#4026. We never really looked
at the performance of this code before, but in order to support
`selective.prepare __` (the default of `selective.prepare`) we need to
improve these bottlenecks

* Introduces the `ResolveCore#Cache` class that holds mutable
dictionaries to memoize various expensive parts of the task resolution
logic

* Lots of micro-optimizations in various places: cache
`Segments#hashCode`, avoid calling `Segments#render` for sorting or
`Segments#part` for taking the last part, etc.

Brings down `time ./mill plan __` from ~60s to ~3s
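The `Segments#hashCode` caching mentioned in the commit notes can be sketched like this — a hypothetical `Segments`-like class for illustration, not Mill's actual implementation: an immutable path value computes its hash once at construction, so repeated use as a map key during resolution costs nothing extra.

```java
import java.util.Arrays;

// Hypothetical sketch of hashCode caching for an immutable, Segments-like
// task path. The hash is computed once up front instead of on every map
// lookup during resolution.
public final class CachedSegments {
    private final String[] parts;
    private final int cachedHash; // computed once at construction

    public CachedSegments(String... parts) {
        this.parts = parts.clone();
        this.cachedHash = Arrays.hashCode(this.parts);
    }

    @Override public int hashCode() { return cachedHash; }

    @Override public boolean equals(Object o) {
        return o instanceof CachedSegments
            && Arrays.equals(parts, ((CachedSegments) o).parts);
    }

    // Take the last part directly, without rendering the whole path to a
    // string first -- mirroring the avoid-Segments#render micro-optimization.
    public String last() { return parts[parts.length - 1]; }
}
```

Equal paths still hash and compare equal, but a hot resolution loop that hashes the same path thousands of times now reads a precomputed int.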

lihaoyi commented Dec 15, 2024

Leaving this open because, while #4132 helps, we still need to move the name-decoding logic to compile time in the `Discover` macro to speed up these workflows further.
