Cache distribution package downloads with BuildKit cache mounts #224

Draft
wants to merge 1 commit into
base: master

Contributor

@MisterDA MisterDA commented Nov 6, 2024

I think sharing the package cache could make Docker builds more efficient, but I'm also worried that parallel jobs could compete for the cache, as it is exclusive (sharing=locked). Alternatively, it could be made private (a new mount is created if there are multiple writers); see RUN --mount=type=cache.

The code comes from the Docker docs example "Example: cache apt packages":

# syntax=docker/dockerfile:1
FROM ubuntu
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
  --mount=type=cache,target=/var/lib/apt,sharing=locked \
  apt update && apt-get --no-install-recommends install -y gcc

Apt needs exclusive access to its data, so the caches use the option sharing=locked, which will make sure multiple parallel builds using the same cache mount will wait for each other and not access the same cache files at the same time. You could also use sharing=private if you prefer to have each build create another cache directory in this case.
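
For comparison, the sharing=private alternative mentioned above could look like the following minimal sketch. It is the same docs example with only the sharing mode changed (and apt-get used throughout). Each concurrent build would then get its own cache directory instead of waiting, at the cost of possibly downloading the same packages more than once.

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
# sharing=private: a concurrent writer gets a fresh cache mount instead of blocking.
RUN --mount=type=cache,target=/var/cache/apt,sharing=private \
    --mount=type=cache,target=/var/lib/apt,sharing=private \
    apt-get update && apt-get install -y --no-install-recommends gcc
```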

@MisterDA MisterDA requested a review from mtelvers November 6, 2024 12:38
Member

@mtelvers mtelvers left a comment

This is an interesting feature. With it turned on, all builds on the same machine will use the cached folders /var/cache/apt and /var/lib/apt. That is, all versions of Debian and Ubuntu will share the same folders.

In /var/cache/apt, there are two database files, srcpkgcache.bin and pkgcache.bin, as well as the archives directory holding the actual .deb files. Different distributions/releases use different versions of apt, each of which silently deletes the existing database files and replaces them with its own. Furthermore, /var/lib/apt seems to be invalidated on a version change and is also silently wiped. Moving forward or backwards in apt versions seemed to work.

/var/cache/apt/archives holds the actual .deb files, so after a few runs with different distributions a range of files is present:

/var/cache/apt/archives/perl_5.28.1-6+deb10u1_amd64.deb
/var/cache/apt/archives/perl_5.32.1-4+deb11u4_amd64.deb
/var/cache/apt/archives/perl_5.36.0-7+deb12u1_amd64.deb
/var/cache/apt/archives/perl_5.38.2-3.2build2_amd64.deb
/var/cache/apt/archives/perl_5.38.2-5_amd64.deb

apt shows that it is using the cached files, as 0 B need to be downloaded from the archives, e.g.:

#12 2.586 1 upgraded, 44 newly installed, 0 to remove and 2 not upgraded.
#12 2.586 Need to get 0 B/59.8 MB of archives.
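
If the index churn between apt versions turned out to be a problem, one possible mitigation would be to give each release its own cache through the id option of RUN --mount=type=cache, which defaults to the target path. A minimal sketch, assuming a Debian 12 base image. The id values apt-cache-debian-12 and apt-lib-debian-12 are hypothetical names chosen for illustration, and this is not what the PR implements:

```dockerfile
# syntax=docker/dockerfile:1
FROM debian:12
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
# A distinct id per distribution/release selects a separate cache directory,
# so different apt versions would not replace each other's index files.
# The id values below are hypothetical.
RUN --mount=type=cache,id=apt-cache-debian-12,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,id=apt-lib-debian-12,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends gcc
```

The trade-off is that .deb files would no longer be shared across releases, although the archive listing above suggests that each release downloads its own package versions anyway.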

As you note, the concern in ocluster would be that parallel build steps are blocked. In my testing, only the build step that uses the cache is blocked, so this seems less of a concern.

Through the cache hint, the ocluster scheduler sends the same jobs to the same machines, so the cache should be hit relatively frequently.

When the worker runs low on disk space, it runs docker system prune -af, which will wipe the cache.

The Docker base image builds happen on a 7-day cycle and typically pull 100 MB from the archives. There are 64 builds that would use the cache (across different architectures, releases and distributions).

@MisterDA
Contributor Author

> This is an interesting feature. With it turned on, all builds on the same machine will use the cached folders /var/cache/apt and /var/lib/apt. That is, all versions of Debian and Ubuntu will share the same folders.

It's odd that there's no finer granularity than the build instance to share a cache…

> In /var/cache/apt, there are two database files, srcpkgcache.bin and pkgcache.bin, as well as the archives directory holding the actual .deb files. Different distributions/releases use different versions of apt, each of which silently deletes the existing database files and replaces them with its own. Furthermore, /var/lib/apt seems to be invalidated on a version change and is also silently wiped. Moving forward or backwards in apt versions seemed to work.

Thanks for the analysis! I'm not sure I understand: does this mean there would be so many "conflicts" that they cancel out the benefits of the shared cache?

> The Docker base image builds happen on a 7-day cycle and typically pull 100 MB from the archives. There are 64 builds that would use the cache (across different architectures, releases and distributions).

Does that mean we'll save up to $100\,\text{MB} \times 64 \approx 6\,\text{GB}$ per week?

May I mark this PR ready for review and let you merge the changes if you're convinced it's an improvement? If so, I may come back later and implement the same feature for other package managers too.

@MisterDA MisterDA marked this pull request as ready for review November 12, 2024 07:51
@MisterDA
Contributor Author

I still think that it's a good idea to cache apt packages, but for a base image, it's probably not great to change this setting without reverting it:

RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
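
One way to address this could be to change the setting and revert it within the same RUN step, so that the resulting image keeps its original apt configuration. The following is a minimal sketch, not the PR's implementation. It assumes the base image ships /etc/apt/apt.conf.d/docker-clean (otherwise the first mv fails), and the temporary path /tmp/docker-clean is an arbitrary choice:

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu
# Move docker-clean aside and enable package retention only for this step,
# then restore the original configuration before the layer is committed.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    mv /etc/apt/apt.conf.d/docker-clean /tmp/docker-clean \
    && echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache \
    && apt-get update \
    && apt-get install -y --no-install-recommends gcc \
    && rm /etc/apt/apt.conf.d/keep-cache \
    && mv /tmp/docker-clean /etc/apt/apt.conf.d/docker-clean
```

If any command in the chain fails, the build step fails and no layer is committed, so a half-reverted configuration should not end up in the image.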

@MisterDA MisterDA changed the title from "Cache Apt package downloads with BuildKit cache mounts" to "Cache distribution package downloads with BuildKit cache mounts" Dec 12, 2024
@MisterDA MisterDA marked this pull request as draft December 12, 2024 09:37