-
-
Notifications
You must be signed in to change notification settings - Fork 320
CacheDir Improvements
Mats Wichmann edited this page Oct 14, 2024
·
14 revisions
(As of 7 Oct 2024)
Here are some collected ideas on improving the utility of the SCons derived-file cache. It's a grab-bag: nothing should be inferred from the current ordering - neither priority, usefulness, ease of implementation, or any other metric (TODO).
- Cache management support: SCons has no cache size limit or file count limit, and does not prune the cache. Limits and/or pruning can be internal or external. If SCons itself implemented a (settable) size limit, it would have to implement a policy for when to evict old entries (e.g. LRU). A separate cache management utility would leave the SCons code simpler and might be desirable for performance reasons as well. One user with size problems contributed a trick to this wiki that adds some support for size limiting: LimitCacheSizeWithProgress. Limits and pruning become more interesting with a remote cache (see elsewhere).
- Cache validation:in the past, users have complained about cache entries being corrupted, usually by running out of space during a write (so in a sense related to limits/pruning). It was speculated on, but without recorded confirmation, that there may still be concurrent-write problems in shared caches. Validate independent of SCons is tricky - while the content signature of the cached file was known at creation time, it is not recorded, and so not easily available, as it's the build signature that is used as the cache key. A proposal in issue 1908 suggested checking current files in the build tree against their cached equivalents which has a decent chance of success in a single-user setup, but not otherwise. That idea didn't get much traction at the time (aka the proposed patch was never merged) but it wasn't rejected, either.
- Automatic caching: would it be desirable to enable caching by default, with an option to disable entirely or selectively? Various other systems do this (ccache, bazel), others like SCons are opt-in. It might be a surprise that builds cause disk space consumption, but it might produce better performance out of the box.
- Set the cache directory from the command line. See issue 3618
- Produce cache logs / stats in a machine-parsable format. See issue 3696. For illustration,
ccache
has these command-line options:--show-stats
to produce a readable statistics output,--print-stats
to produce a predictable TSV format. - Enhanced/persistent cache statistics: cache requests and hits are collected for the duration of an SCons run, and emitted only if
--cache-debug
is requested. Cache information could be stored in the cache to allow examination of longer-term cache statistics, plus some more general information could be shown like cache disk/file size, ages, etc. - Cache compression: should there be an option to compress cache entries?
- Remote cache storage. See Issue 4394. This is a highly desired topic. As developers mostly work on individual machines, sharing cache across a team needs a network mode. This comes with a number of considerations, and will likely add several command-line options. "Use NFS" is an approach that is known to work, but not everyone has a setup (note: Amazon EFS is an NFS implementation). A few bullet points, see discussion below for more details.
- Cache policy: how a local and remote cache interact needs to be defined. There could be a settable policy (SCons uses a "policy" approach for options like
--duplicate
and--decider
), or the policy can be derived from the combination of flags. This needs to address: is this run L-only, R-only or L+R; if L+R, which to prefer; does L-miss + R-hit cause a write to L? - Remote backends: a proposed implementation PR 3971 is written to a specific back-end style, that used by bazel-remote, which is a WebDAV style and uses two storage buckets (named
/ac
and/cas
), because bazel also caches its idea of Actions. A more general approach would be to allow any key-value store like Redis/Valkey, Memcached, various cloud options, etc. - Remote configuration: there should be a way to set attributes for use of the remote, such as considering it read-only. Should account for local settings and possibly remote. For example, the SCons invocation could decide to treat the remote as read-only, or the remote could decide that it's read-only to us. Writing to the cache is usually an error you ignore (does not affect the actual build), but we don't want to waste resources by constantly trying and ignoring the failures for hundreds of build products, if we could know we can't write to it.
- Network retrieval is "slow" in computer terms, SCons currently does cache retrieval in a blocking fashion, but is not good for a remote cache. How should the asynchronous fetching of (possibly) network-cached files be implemented so as not to consume threads normally used for building.
- Invocation: adding more cache-related command-line options means more typing (or more configuring in wrappers, IDEs, etc.). The suspicion is developers will use the cache the same way (almost) every time, so there should probably be a way to make the settings more permanent. At the very least, they need to be settable via SetOption.
- Cleanup: SCons doesn't currently prune a local cache (that's a top-level bullet on this page). It doesn't make much sense for a local SCons to try to manage a remote storage set up to presumably serve many users.
- Cache policy: how a local and remote cache interact needs to be defined. There could be a settable policy (SCons uses a "policy" approach for options like
Existing saved discussions to be pasted