Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Reproducible portable out folder contents #4065

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Dec 3, 2024

No description provided.

@sequencer
Copy link
Contributor

We also notice some non-reproducible from the couriser downloading. Do you think it's reasonable to cache those artifacts and put into out directory?

@lihaoyi
Copy link
Member Author

lihaoyi commented Dec 3, 2024

@sequencer should be possible

@lihaoyi
Copy link
Member Author

lihaoyi commented Dec 3, 2024

One big blocker here is use of os.Path#toString. upickle.default.Writer[os.Path] and Writer[PathRef] can be made to be workspace-root-aware, which is what I have done in this PR so far, but toString is very common e.g. in passing -Xplugin:${jarPath.toString} to scalacOptions. os.Paths being absolute means different people with different /User/me /User/you folders will have different strings in these config values which will point at non-existent files if naively re-used on each others computers.

Furthermore, we cannot naively convert /User/me/project/out/blah into /$WORKSPACE/out/blah, since the output of os.Path#toString is usually passed to third-party applications that are outside of our control and will not understand these Mill-specific magic prefixes. In fact the whole approach of having separate /$WORKSPACE/ and /$HOME/ prefixes as implemented in the first draft of this PR doesn't really work for this reason

To solve this we probably need to instrument OS-Lib to give us a bit more control over this, e.g. making os.Path#toString return relative path string for any path within mill.api.WorkspaceRoot.workspaceRoot. These relative paths should be understandable by external tools like the Scala compiler in scalacOptions. Because these relative paths can only be relative to the workspaceRoot, we will need to make sure all such "special" paths are referenced through the workspace root. This can be done via symlinks, e.g. out/mill-home -> /User/lihaoyiwould allow us to reference stuff in the~/folder viaout/mill-home/...`

Alternatively, we could move off os.Path onto our own mill.api.Path data type, but that would be a lot more invasive a migration than tweaking the toString and constructor for os.Path via hooks exposed upstream in OS-Lib

@lolgab
Copy link
Member

lolgab commented Dec 3, 2024

I don't know if that's already possible or not, but an already great starting point would be to be able to recover from a cache where absolute paths haven't changed.
For example, on GitHub Actions, the home directory doesn't change.
It would be nice if there was a standard way to cache everything Mill produces so, after recovering from cache you can run something like mill __.testCached and no test is being executed if no code changed.

@gamlerhart
Copy link
Contributor

For my understanding. The Path.toString is a problem if its serialized as plain strings and user defined case classes/JSON?
Because then Mill doesn't notice that this is a path?

Or is there other context where the Path.toString becomes an issue?

@lolgab
Copy link
Member

lolgab commented Dec 3, 2024

@gamlerhart For example if scalacOptions requires an absolute path, then changing the workdir base path effectively invalidates all compilation cache since compile depends on scalacOptions

@lihaoyi
Copy link
Member Author

lihaoyi commented Dec 3, 2024

@lolgab if you do not change the workspaceRoot path then Mill out folders should already be reusable today. Mill's own CI builds the out folder once then shares it between all the downstream CI jobs running tests

@jodersky
Copy link
Member

jodersky commented Dec 4, 2024

Does this supersede #3765 ?

@lihaoyi
Copy link
Member Author

lihaoyi commented Dec 4, 2024

@jodersky it probably does

@gamlerhart
Copy link
Contributor

gamlerhart commented Dec 5, 2024

Silly ideas dump: These ideas are not usable, but maybe inspires better ideas.

  • String-Replace/Regex 'paths' from the stored task json: Search trough the final JSON and replace things that look like an absolute path. Do the reverse in parsing. Silly, because:

    • Gateway to horrible bugs / complete mysteries and debug nightmares.
    • May completely break some task because something got wrongly replaced
    • Extra performance overhead for every task.
  • Path.toString returns a 'virtual' path. Basic idea: we replace any such path with a made up Mill controls.
    Example: toString returns /tmp/mill/{magic-sha-or-uuid-representing-this-build}/{magic-sha-represeting-the-file-or-dir}
    These magic paths then need to be 'restored' by Mill. Silly, because:

    • Mill would need to restore / wrangle these paths. When would that happen?
    • The mapping would need to be maintained.
      - That state would need to be stored / managed? (or maybe somehow the build-task can be turned into an idenfier?)
      - The virtual paths need to be created. (eg. OS links, or even fuse). How would that work between OSes? (OK, maybe only share between same OS family?)
    • Path.toString would need interception: toString is now a complex operation. But .toString is also used by debugger,println etc.
    • Such a path might be created for use outside mill: Now what happens to that? Or, it may stay valid until the file is gone (ex using OS link)
    • Complex.
    • How are these paths cleaned up?
  • Chroot everything: Unify the paths by chrooting everything
    Silly because:

    • Does that work on Windows/MacOS?
    • Root permissions required, afaik.
    • Probably hard to understand in debugging / if you need an ad-hoc path for some reason.
  • Fuse everything: All paths created go via some Fuse virtual FS that go through mill
    Silly because:

    • Similar issues as Chroot/Path to string virtualization
    • Fuse involved.

@lihaoyi
Copy link
Member Author

lihaoyi commented Dec 5, 2024

Bazel actually does go with chroot/fuse. It can work, just needs to be done separately per operating system and is a lot of work overall

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants