[feature] Support Remote Execution API for caching #1520

Closed
rhuanbarreto opened this issue Jun 24, 2024 · 18 comments
Labels
enhancement New feature or request

Comments

@rhuanbarreto

Is your feature request related to a problem? Please describe.

Although moonbase has a caching service, for regulatory reasons we cannot store cached artifacts outside our own domain.

Many other monorepo tools like bazel, pants and rush enable the use of your own storage backend for caching artifacts.

On the other hand, caching the .moon/cache folder in GitHub Actions doesn't help much either, since GitHub's size limits are too low.

Describe the solution you'd like

I would like to have a config so I can self host my own cached artifacts in Azure Blob Storage for example. If this includes running a container separately for the service like https://github.com/buchgr/bazel-remote it's fine.

Describe alternatives you've considered

For now, using moonbase is actually hard, as it creates a dependency on a service outside our domain.
So the only alternative is using GitHub / Azure DevOps pipeline caching.

rhuanbarreto added the enhancement (New feature or request) label Jun 24, 2024
@milesj
Collaborator

milesj commented Jun 24, 2024

I've been working on making moonbase self-hostable, but while doing so, I've had thoughts of just reworking it into a generic remote caching server. I keep going back and forth on which approach would be better. Either way, it's a lot for me to maintain at the moment.

@rhuanbarreto
Author

No rush at all! Very important to have but also not the top priority right now.

One suggestion to cut some corners: you can leverage bazel-remote right away and avoid rebuilding an abstraction that has almost become an industry standard. This would also be a big plus for moonrepo on the monorepo.tools website.

The REAPI is a gRPC/Protobuf API. bazel-remote implements it and streams the cached blobs back, which saves a lot of back and forth.

One implementation in rust is done by Pants in this file: https://github.com/pantsbuild/pants/blob/main/src/rust/engine/process_execution/src/cache.rs

Hope you can find a way! It would be very beneficial to all the community.

@milesj
Collaborator

milesj commented Jun 25, 2024

Yeah, agreed. I've also thought about piggybacking off of bazel's APIs. Might as well.

@dudicoco

Maybe the remote cache can be implemented on the client side instead of having to use a server?
That way the client can directly read/write the cache from blob storage.

@rhuanbarreto
Author

By using bazel-remote we do this. But moon must support this as the source for finding the cache hits and hydrating the state.

@milesj
Collaborator

milesj commented Aug 15, 2024

I've briefly looked into this, and I will be moving to bazel's APIs, since they also offer action caching which I'll need in the future. Just need to find the time to integrate it. If anyone else wants to tackle it, let me know.

@dudicoco

"By using bazel-remote we do this. But moon must support this as the source for finding the cache hits and hydrating the state."

Can you elaborate? Doesn't bazel-remote require a server?

@rhuanbarreto
Author

Yes. We run a bazel-remote container backed by Azure Blob Storage. We connect to bazel-remote using an mTLS connection. We use this today with Pants. If moon could support the same, we wouldn't need to have many different places for managing this cache.

@dudicoco

@rhuanbarreto I still don't understand your point.

My suggestion was to have the client make direct API calls to the blob storage (S3 etc.) instead of communicating with a server which has to be deployed and maintained. In addition a server would require another authentication and authorization mechanism for the clients, which you would get out of the box with IAM permissions for a client based solution.

So I still don't see the advantage of having a server based solution which adds extra complexity and overhead.

@milesj
Collaborator

milesj commented Sep 12, 2024

Good news, a new rust crate recently popped up that does a lot of the heavy lifting for the bazel remote APIs. https://github.com/amkartashov/bazel-remote-apis-rust

Will give this a shot for the next release.

@rhuanbarreto
Author

OMG! Great news! If you need an alpha tester, you know where to find me.

One small request: Make sure moon can support mTLS connections. htpasswd is too unsafe.

@larsivi

larsivi commented Oct 9, 2024

The issue linked from nx above is what took me here. I co-own a small startup that mostly runs in GCP. Having used nx for a while in our monorepo, I started to really like the plugins that provided caching via whatever backend you have, which in my case is GCS buckets. The plugin basically uses the GCS API, and stores/fetches directly to/from the configured bucket. This works both from within GCP and from dev machines, given the proper credentials. My big beef with the changes nx is making is that they get paid plugins that do the exact same thing while blocking the open source ones. I don't necessarily mind using nx cloud or similar (moonbase), but I'd rather use infra we already pay for (and/or have a payment relationship with) than buy yet another service. (The new paid plugins don't support GCS either at this point.)

Setting up an additional VM for proxying sounds very unnecessary, unless it can provide some additional functionality.

Anyway, I hope this can come to a useful resolution, as I am now considering the options that are not nx, and moonrepo looks very interesting.

@milesj
Collaborator

milesj commented Nov 16, 2024

An update on this:

I've got a basic implementation working that communicates with https://github.com/buchgr/bazel-remote. PR here: #1651

Uploading to CAS was relatively easy.

However, downloading from CAS is currently blocked. The issue is that I don't know how to reference the cached item in CAS and download the correct blob. The bazel APIs require a digest (hash + size) but we only have the hash. We can't calculate the size without archiving the build before running the task, which is far too much overhead.

The bazel asset API actually solves this, as you can associate metadata with an uploaded blob via tagging, but bazel-remote does not support the asset API... https://github.com/buchgr/bazel-remote/blob/master/server/grpc_asset.go#L218

I don't think this will land in the next release, until I can figure out how to calculate these digests.
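
To make the digest problem above concrete, here is a minimal Python sketch (not moon's code) of what the Remote Execution API means by a digest, and of the ByteStream resource name used to read a blob from CAS. `Digest`, `digest_of` and `read_resource_name` are hypothetical names chosen for illustration, and SHA-256 is assumed as the digest function.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Digest:
    """Mirrors the REAPI Digest message: a content hash plus the byte size."""
    hash: str
    size_bytes: int


def digest_of(blob: bytes) -> Digest:
    # Both fields come from the blob's contents, so the blob must exist first.
    return Digest(hashlib.sha256(blob).hexdigest(), len(blob))


def read_resource_name(instance_name: str, digest: Digest) -> str:
    # ByteStream read path for CAS downloads; the size is part of the name.
    # An empty instance name is assumed to drop the leading segment.
    prefix = f"{instance_name}/" if instance_name else ""
    return f"{prefix}blobs/{digest.hash}/{digest.size_bytes}"
```

Because the size is part of the read path, a hash on its own is not enough to address a blob, which is why the download side was blocked at this point.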

@milesj
Collaborator

milesj commented Nov 16, 2024

I've been thinking about this even more, and I'm still quite confused.

I took a look at pants, which uses these bazel APIs, and it looks like they scan the outputs on the file system, read the bytes and size of each file, and collect all digests for these files, then upload them all to remote cache as individual files.

In moon, we pack all the outputs into a single tarball archive, store that at .moon/cache/outputs, and upload the tarball to moonbase with the associated hash. This pattern doesn't look possible with bazel APIs, as we would need to generate the outputs and create the tarball before the task has run, which simply isn't possible (lol).

But even if we follow the pants/bazel way of doing things, it still doesn't make much sense. For example:

  • After a task is run, we can scan all the outputs, create digests, and upload to remote cache. Super easy.
  • Before a task is run, we need to check for a cache hit. But if there are no outputs that exist locally, we can't create digests and download from the remote cache. How are we supposed to create these digests without knowing the actual size of the outputs? Which isn't possible without running the task?? And at that point it defeats having a cache since we're still doing the work???
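
The first bullet (scanning outputs after a task has run) can be sketched roughly as follows. This is an illustrative Python sketch, not the pants or moon implementation; `digest_outputs` is a hypothetical helper and SHA-256 is assumed as the digest function.

```python
import hashlib
from pathlib import Path


def digest_outputs(output_dir: Path) -> dict[str, tuple[str, int]]:
    """Map each output file (relative path) to its (sha256 hex, size in bytes)."""
    digests: dict[str, tuple[str, int]] = {}
    for path in sorted(output_dir.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            rel = path.relative_to(output_dir).as_posix()
            digests[rel] = (hashlib.sha256(data).hexdigest(), len(data))
    return digests
```

This only works once the files exist on disk, which is exactly the asymmetry the second bullet describes: before the task runs there is nothing to read, so the digests have to come from somewhere else.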

@milesj
Collaborator

milesj commented Nov 16, 2024

Ok, ok, I think I finally figured it all out, thanks to this article: https://bitrise.io/blog/post/bazel-remote-caching-api

I need to use the ActionResult as an intermediary cache, which then maps the outputs to the task being run. https://github.com/bazelbuild/remote-apis/blob/main/build/bazel/remote/execution/v2/remote_execution.proto#L1056
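
A rough sketch of that idea, using an in-memory stand-in for the server rather than real gRPC calls. Everything here is an assumption for illustration: `RemoteCache`, `store` and `restore` are hypothetical names, the `ActionResult` is reduced to its output file list, and the lookup key is a plain task hash string (the real API keys the action cache by the digest of a serialized Action message).

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Digest:
    hash: str
    size_bytes: int


@dataclass
class ActionResult:
    # Simplified: the real message also carries exit code, stdout/stderr, etc.
    output_files: dict[str, Digest] = field(default_factory=dict)


class RemoteCache:
    """In-memory stand-in for a server's action cache and CAS."""

    def __init__(self) -> None:
        self.action_cache: dict[str, ActionResult] = {}
        self.cas: dict[Digest, bytes] = {}

    def store(self, task_hash: str, outputs: dict[str, bytes]) -> None:
        # After the task runs: upload each blob, then record path -> digest.
        result = ActionResult()
        for path, blob in outputs.items():
            digest = Digest(hashlib.sha256(blob).hexdigest(), len(blob))
            self.cas[digest] = blob
            result.output_files[path] = digest
        self.action_cache[task_hash] = result

    def restore(self, task_hash: str) -> Optional[dict[str, bytes]]:
        # Before the task runs: only the task hash is known. The stored
        # ActionResult supplies the full digests needed to read from CAS.
        result = self.action_cache.get(task_hash)
        if result is None:
            return None  # cache miss, the task has to run
        return {path: self.cas[d] for path, d in result.output_files.items()}
```

The point of the indirection is that the client never has to guess an output's size: a cache hit on the task hash returns the digests that were recorded when the outputs were uploaded.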

@milesj
Copy link
Collaborator

milesj commented Nov 26, 2024

This landed in v1.30. It still needs work, so please create a new issue for any problems/feedback.

milesj closed this as completed Nov 26, 2024
@rhuanbarreto
Author

Does it work with certificates? Or only basic auth? Any docs available?

@milesj
Collaborator

milesj commented Nov 26, 2024


4 participants