Cache distribution package downloads with BuildKit cache mounts #224
This is an interesting feature. With this feature turned on, all builds on the same machine will use the cached folders `/var/cache/apt` and `/var/lib/apt`. That is, all versions of Debian and Ubuntu will share the same folders.
In `/var/cache/apt`, there are two database files, `srcpkgcache.bin` and `pkgcache.bin`, as well as the `archives` directory holding the actual `.deb` files. Different distributions/releases use different versions of `apt`, each of which silently deletes the database files and replaces them with its own. Furthermore, `/var/lib/apt` seems to be invalidated on a version change and is also silently wiped. Moving forwards or backwards in `apt` versions seemed to work.
`/var/cache/apt/archives` holds the actual `.deb` files, so after a few runs with different distributions a range of files is present:
/var/cache/apt/archives/perl_5.28.1-6+deb10u1_amd64.deb
/var/cache/apt/archives/perl_5.32.1-4+deb11u4_amd64.deb
/var/cache/apt/archives/perl_5.36.0-7+deb12u1_amd64.deb
/var/cache/apt/archives/perl_5.38.2-3.2build2_amd64.deb
/var/cache/apt/archives/perl_5.38.2-5_amd64.deb
`apt` shows that it is using the cached files, as 0 B are needed from the archives, e.g.:
#12 2.586 1 upgraded, 44 newly installed, 0 to remove and 2 not upgraded.
#12 2.586 Need to get 0 B/59.8 MB of archives.
As you note, the concern in `ocluster` would be that parallel build steps are blocked. In my testing, only the build step that uses the cache is blocked, so this seems less of a concern.
Through the cache hint, the `ocluster` scheduler sends the same jobs to the same machines, so the cache should be hit relatively frequently.
When the worker runs low on disk space, it runs `docker system prune -af`, which will wipe the cache.
The Docker base image builds happen on a 7-day cycle and pull 100 MB (typical) from the archives. There are 64 builds which would use the cache (across different architectures, releases and distributions).
It's odd that there's no finer granularity than the build instance to share a cache…
Thanks for the analysis! I don't fully understand whether this means there will be too many "conflicts", which would cancel the benefits of this shared cache.
Does that mean we'll save up to […]? May I mark this PR ready for review and let you merge the changes if you're convinced it's an improvement? If so, I may come back later and implement the same feature for other package managers too.
I still think that it's a good idea to cache apt packages, but for a base image, it's probably not great to change this setting without reverting it:
I think sharing the package cache could make Docker builds more efficient, but I'm also worried that parallel jobs could compete for the cache, as it is exclusive (`locked`). Alternatively, it could be made `private` (creates a new mount if there are multiple writers); see `RUN --mount=type=cache`.
The code comes from an example in the Docker docs, "Example: cache apt packages".
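For readers without the diff at hand, here is a minimal sketch of the pattern being discussed, modelled on that Docker docs example. The base image (`debian:12`) and the installed package (`perl`) are illustrative assumptions and are not taken from this PR's changes.

```dockerfile
# syntax=docker/dockerfile:1
# Illustrative base image and package; not taken from this PR's diff.
FROM debian:12

# Debian/Ubuntu images ship a docker-clean hook that deletes downloaded .deb
# files; remove it and ask apt to keep packages so the cache mount fills up.
RUN rm -f /etc/apt/apt.conf.d/docker-clean; \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
      > /etc/apt/apt.conf.d/keep-cache

# sharing=locked: a second concurrent writer waits until the first releases the mount.
# sharing=private would instead give each concurrent writer its own new mount.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends perl
```

Switching both mounts to `sharing=private` would avoid blocking parallel jobs, at the likely cost of fewer cache hits when builds overlap, since each concurrent writer starts from its own mount.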