Dockerfile: Image Size, Squash and Multi-Stage

How does Docker build an image?

Docker builds images from a Dockerfile using layers. Each layer is an image in its own right. To build an image, Docker (with the classic builder shown here) creates a container from the previous layer, changes it according to the Dockerfile, commits the changes as a new image, then removes the container. We can see this process when we run docker build:

docker build --tag tests:1 .
Sending build context to Docker daemon  31.74kB
Step 1/3 : FROM ubuntu:latest
latest: Pulling from library/ubuntu
125a6e411906: Pull complete 
Digest: sha256:26c68657ccce2cb0a31b330cb0be2b5e108d467f641c62e13ab40cbec258c68d
Status: Downloaded newer image for ubuntu:latest
 ---> d2e4e1f51132
####################### LOOK HERE
Step 2/3 : RUN touch file1.txt 
 ---> Running in 07a500278997
Removing intermediate container 07a500278997
 ---> 6a3161b8a2e5
#######################
Step 3/3 : RUN touch file2.txt
 ---> Running in 5b7db2da2f3f
Removing intermediate container 5b7db2da2f3f
 ---> 7b5f29ba7408
Successfully built 7b5f29ba7408
Successfully tagged tests:1

The second step runs in the container 07a500278997. When the change is implemented, it is committed as the image 6a3161b8a2e5, and the container 07a500278997 can be removed. The third step then can begin implementing its changes from 6a3161b8a2e5.

Note that only RUN, COPY and ADD add filesystem layers. Other Dockerfile instructions only produce metadata entries that do not increase image size (see the 0B CMD entry in docker history further down).
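
To check how many filesystem layers an image really has, one option is to ask docker image inspect for the length of its RootFS.Layers list. This is a minimal sketch, assuming the image exists locally; the helper name count_layers is made up for illustration and is not a Docker command.

```shell
# count_layers is a hypothetical helper name, not part of Docker.
# It prints the number of filesystem layers recorded for a local image.
count_layers() {
    docker image inspect --format '{{len .RootFS.Layers}}' "$1"
}

# Example usage (requires the image to exist locally):
#   count_layers tests:1
```

Comparing this number before and after adding an ARG or ENV line is a quick way to confirm which instructions contribute layers.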

Why multiple layers are bad for size

Multiple layers make builds faster, since they allow caching. However, they can be bad for size, because downloading the full image means downloading every intermediate layer.

Since each layer stores only its changes, having multiple RUNs isn't necessarily bad: a RUN echo hello layer adds nothing, since it changes no files (also stated here). However, having several layers that affect the file system can leave unnecessary artifacts in the image.

For example, using the following Dockerfile:

FROM ubuntu:latest

RUN touch file.txt

RUN rm file.txt

After building I expect to not see file.txt, which is what we see in the final image:

$ docker run -it germanrodriguez/tests:latest bash
root@fe2a6643eaa1:/# ls
bin  boot  dev  etc  home  lib  lib32  lib64  libx32  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

However, we also keep the intermediate layers. In fact, I can start a container from the intermediate image and see file.txt inside:

$ docker history germanrodriguez/tests:latest 
IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
445b16781a04   28 seconds ago   /bin/sh -c rm file.txt                          0B        
c46dda03e5d3   29 seconds ago   /bin/sh -c touch file.txt                       0B        
d2e4e1f51132   2 weeks ago      /bin/sh -c #(nop)  CMD ["bash"]                 0B        
<missing>      2 weeks ago      /bin/sh -c #(nop) ADD file:37744639836b248c8…   77.8MB    

$ docker run -it c46dda03e5d3 bash
root@1aaed50d9817:/# ls
bin  boot  dev  etc  file.txt  home  lib  lib32  lib64  libx32  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

You can imagine that if file.txt weighed several MB this would be very bad for image size.
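
The effect can be reproduced without Docker using a toy model in which each layer is a plain tar archive. This is only a sketch of the idea, not how Docker stores layers on disk. It assumes the image layer tar convention in which a deletion is recorded as an empty marker file named .wh.<name>. All paths and variable names are made up for illustration.

```shell
# Toy model: two "layers" as tar archives (not Docker's real storage).
work=$(mktemp -d)
mkdir "$work/layer1" "$work/layer2"

# Layer 1: a step that creates a 1 MiB file.
dd if=/dev/zero of="$work/layer1/file.txt" bs=1024 count=1024 2>/dev/null

# Layer 2: "rm file.txt" is recorded as a whiteout marker,
# an empty file named .wh.<name>; the data in layer 1 is untouched.
: > "$work/layer2/.wh.file.txt"

tar -cf "$work/layer1.tar" -C "$work/layer1" .
tar -cf "$work/layer2.tar" -C "$work/layer2" .

# $(( ... )) strips any padding that wc may print.
layer1_bytes=$(( $(wc -c < "$work/layer1.tar") ))
layer2_bytes=$(( $(wc -c < "$work/layer2.tar") ))
total_bytes=$(( layer1_bytes + layer2_bytes ))

echo "layer 1: $layer1_bytes bytes"
echo "layer 2: $layer2_bytes bytes"
echo "total:   $total_bytes bytes (the deleted file is still shipped)"

rm -rf "$work"
```

The second archive does not make the first one smaller: anyone pulling both layers still downloads the full file.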

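(Side note: this does not work with images pulled from Docker Hub, because a pull transfers only the layers and the final image configuration, not the intermediate image IDs, which is why they show up as <missing> in docker history. The layers and their content still exist and were part of the pull.)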

How to optimize image size

We see, then, that to optimize image size the trick is to combine commands that affect the same files in a single layer.
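
Applied to the touch/rm example, that could look like the sketch below, which writes a Dockerfile where the file is created and removed inside the same RUN. The temporary directory and the tag in the usage comment are placeholders.

```shell
# Write a Dockerfile that creates and removes the file in ONE layer.
single_layer_dir=$(mktemp -d)
cat > "$single_layer_dir/Dockerfile" <<'EOF'
FROM ubuntu:latest

# file.txt is created and deleted before this step is committed,
# so it never ends up in any layer of the image.
RUN touch file.txt && rm file.txt
EOF

echo "Dockerfile written to $single_layer_dir"

# Example usage (requires Docker; the tag is a placeholder):
#   docker build --tag tests:single-layer "$single_layer_dir"
```

With this version there is no intermediate image that still contains file.txt.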

To test that RUN commands affecting different files don't increase size, we can compare this Dockerfile:

FROM ubuntu:latest

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y git

RUN git clone https://github.com/robotology/yarp 

RUN git clone https://github.com/robotology/whole-body-controllers/

With this one:

FROM ubuntu:latest

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y git

RUN git clone https://github.com/robotology/yarp && git clone https://github.com/robotology/whole-body-controllers/

We see that, since the RUN commands affect different files, it makes no difference whether we put them together or keep them separate:

$ docker image ls | grep germanrodriguez
germanrodriguez/tests   one-run-different-directories    e4a334d97311   33 seconds ago   365MB
germanrodriguez/tests   two-runs-different-directories   96c52f5ee1ad   4 minutes ago    365MB
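
docker image ls rounds sizes to MB, so two images can look identical while differing by a few bytes. For a finer comparison, one option is to read the Size field from docker image inspect. This is a sketch that assumes the images exist locally; print_image_sizes is a made-up helper name.

```shell
# print_image_sizes is a hypothetical helper name, not part of Docker.
# For each image given, print its name and the size in bytes reported by Docker.
print_image_sizes() {
    for image in "$@"; do
        printf '%s\t%s bytes\n' "$image" \
            "$(docker image inspect --format '{{.Size}}' "$image")"
    done
}

# Example usage:
#   print_image_sizes \
#       germanrodriguez/tests:one-run-different-directories \
#       germanrodriguez/tests:two-runs-different-directories
```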

--squash

Combining all commands that affect the same files in a single layer is easier said than done. For one, we may not always be sure which files are affected by which command. Also, this can lead to huge, hard-to-maintain commands that also make poor use of the Docker cache (and thus make builds slower).

One solution is the --squash option, an experimental feature of Docker. As explained here, --squash creates a new image in which the diffs of all the layers produced by the build are merged into a single layer on top of the base image. Since all the diffs are bundled together, the result is effectively the same as running all the commands in a single RUN instruction.

Thus, the problem of removing a file but still downloading it from the previous layer is not present anymore. There is only one new layer, and the intermediate file is not part of it.

Going back to this Dockerfile:

FROM ubuntu:latest

RUN touch file.txt

RUN rm file.txt

Building it with squash leads to the following:

$ docker build --tag germanrodriguez/tests:latest . --squash
Sending build context to Docker daemon  31.74kB
Step 1/3 : FROM ubuntu:latest
latest: Pulling from library/ubuntu
125a6e411906: Pull complete 
Digest: sha256:26c68657ccce2cb0a31b330cb0be2b5e108d467f641c62e13ab40cbec258c68d
Status: Downloaded newer image for ubuntu:latest
 ---> d2e4e1f51132
Step 2/3 : RUN touch file.txt
 ---> Running in edc9bc11b5b3
Removing intermediate container edc9bc11b5b3
 ---> ae86d98db813
Step 3/3 : RUN rm file.txt
 ---> Running in c5cbfe707dd2
Removing intermediate container c5cbfe707dd2
 ---> 7347650735ef
Successfully built b8d7a436cac7
Successfully tagged germanrodriguez/tests:latest

$ docker history germanrodriguez/tests:latest 
IMAGE          CREATED         CREATED BY                                      SIZE      COMMENT
b8d7a436cac7   7 seconds ago                                                   0B        merge sha256:7347650735ef293acc678a51761e46636dfbbad6432ca3047c33ca318c9487aa to sha256:d2e4e1f511320dfb2d0baff2468fcf0526998b73fe10c8890b4684bb7ef8290f
<missing>      7 seconds ago   /bin/sh -c rm file.txt                          0B        
<missing>      8 seconds ago   /bin/sh -c touch file.txt                       0B        
<missing>      2 weeks ago     /bin/sh -c #(nop)  CMD ["bash"]                 0B        
<missing>      2 weeks ago     /bin/sh -c #(nop) ADD file:37744639836b248c8…   77.8MB 

$ docker image ls 
REPOSITORY              TAG       IMAGE ID       CREATED          SIZE
germanrodriguez/tests   latest    b8d7a436cac7   22 seconds ago   77.8MB
<none>                  <none>    7347650735ef   22 seconds ago   77.8MB
ubuntu                  latest    d2e4e1f51132   2 weeks ago      77.8MB

We see that there is the base image, an image tagged none, and my image. Furthermore, my docker history shows <missing> intermediate images. This is because it used intermediate images to create image 7347650735ef, and then used this to create germanrodriguez/tests:latest (see here). I don't have an intermediate image I could open to access file.txt.

While squashing is good in theory, there are several drawbacks:

It is an experimental feature. That means that it must be enabled in the Docker daemon, it is considered unstable, and it can be removed at any time by Docker without warning. The creators of Docker seem to want to remove the feature, since they prefer multi-stage. So far they have been stopped by pushback from the community.
The computer that builds the image will need more storage since now it needs two copies of each image to build: one before squashing and one already squashed.
It is not necessarily faster to download, since you can't use parallel download if there is only one layer.
It can fail in some circumstances (but none that directly affect our builds, as far as I'm aware).
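
Because of the first drawback, it may be worth checking whether the daemon has experimental features enabled before relying on --squash in a build script. A minimal sketch, assuming the docker CLI can reach the daemon; experimental_enabled is a made-up helper name.

```shell
# experimental_enabled is a hypothetical helper name, not part of Docker.
# Succeeds if the Docker daemon reports experimental features as enabled.
experimental_enabled() {
    [ "$(docker version --format '{{.Server.Experimental}}')" = "true" ]
}

# Example usage:
#   if experimental_enabled; then
#       docker build --squash --tag germanrodriguez/tests:latest .
#   else
#       echo 'Set "experimental": true in /etc/docker/daemon.json and restart the daemon' >&2
#   fi
```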

Multi-Stage builds

Multi-stage builds were added to Docker as a solution for reducing image size. When you use multiple FROM commands in the Dockerfile, each FROM starts a new stage. You can copy artifacts from one stage to another. In the end, Docker will only save the layers of the last stage.

This means that we can build the earlier stages with less concern about optimizing, since in the end only what we copy to the last stage will be preserved. The final result is an easier-to-read Dockerfile that doesn't compromise size with unnecessary bloat in the layers.

If we build the following Dockerfile:

FROM ubuntu:latest AS firststage

RUN touch file1.txt

RUN touch file2.txt

FROM ubuntu:latest

COPY --from=firststage /file1.txt /file1.txt

We get the following result:

$ docker build --tag germanrodriguez/tests:latest .
Sending build context to Docker daemon  43.52kB
Step 1/5 : FROM ubuntu:latest AS firststage
latest: Pulling from library/ubuntu
125a6e411906: Pull complete 
Digest: sha256:26c68657ccce2cb0a31b330cb0be2b5e108d467f641c62e13ab40cbec258c68d
Status: Downloaded newer image for ubuntu:latest
 ---> d2e4e1f51132
Step 2/5 : RUN touch file1.txt
 ---> Running in f857b14d00b0
Removing intermediate container f857b14d00b0
 ---> f4bf39ad7dc0
Step 3/5 : RUN touch file2.txt
 ---> Running in c58cc88589e9
Removing intermediate container c58cc88589e9
 ---> e9b4d96bf806
Step 4/5 : FROM ubuntu:latest
 ---> d2e4e1f51132
Step 5/5 : COPY --from=firststage /file1.txt /file1.txt
 ---> f3f137a8c79e
Successfully built f3f137a8c79e
Successfully tagged germanrodriguez/tests:latest

$ docker history germanrodriguez/tests:latest 
IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
f3f137a8c79e   10 seconds ago   /bin/sh -c #(nop) COPY file:68eda30bce83c134…   0B        
d2e4e1f51132   2 weeks ago      /bin/sh -c #(nop)  CMD ["bash"]                 0B        
<missing>      2 weeks ago      /bin/sh -c #(nop) ADD file:37744639836b248c8…   77.8MB  

$ docker run -it germanrodriguez/tests:latest bash
root@2bedbab5ef3b:/# ls
bin  boot  dev  etc  file1.txt  home  lib  lib32  lib64  libx32  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

We see that docker history shows only the base ubuntu image (d2e4e1f51132, as shown in the first step of the docker build, together with its <missing> parent layer) and f3f137a8c79e, which copies file1.txt. The layers of the first stage, including the one with file2.txt (e9b4d96bf806 in the build output), do not appear at all: they are not part of the final image. There is no file2.txt in the final image.

This means that we can build in the earlier stages however we want and only worry about optimizing the last stage. If we are only copying in the last stage, the main concern is not overlapping too many files when doing the COPY commands.

Docker developers seem to favor multi-stage over squash, since it covers most of the same use cases.

The only problem arises when you have to copy many files into several places, which is exactly our use case. If we copy a directory that already exists in the base image, the repeated files will be stored twice in our image. (In practice this happens with some drivers and not with others, so it cannot be relied on either way.)

Therefore, we need to make sure we copy only what is strictly needed, preferably new files, since those do not generate duplicates. If the COPY commands refer to different directories, we won't need --squash anymore.
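
One way to follow this rule is to have the builder stage put everything it produces under a directory that does not exist in the base image, and copy only that directory. The sketch below writes such a Dockerfile. The path /opt/artifacts is a made-up example, and the yarp clone is reused from the earlier test only as a stand-in for a real build.

```shell
# Write a multi-stage Dockerfile that copies a single new directory.
multistage_dir=$(mktemp -d)
cat > "$multistage_dir/Dockerfile" <<'EOF'
FROM ubuntu:latest AS builder
ARG DEBIAN_FRONTEND=noninteractive
# Build freely here: these layers are not part of the final image.
RUN apt-get update && apt-get install -y git
# /opt/artifacts is a hypothetical path that is absent from the base image.
RUN mkdir -p /opt/artifacts
RUN git clone https://github.com/robotology/yarp /opt/artifacts/yarp

FROM ubuntu:latest
# Copy only the new directory instead of / to avoid duplicating base files.
COPY --from=builder /opt/artifacts /opt/artifacts
EOF

echo "Dockerfile written to $multistage_dir"

# Example usage (requires Docker; the tag is a placeholder):
#   docker build --tag tests:multistage "$multistage_dir"
```

The final stage then has a single COPY layer containing only files that are new relative to the base image.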

There is a proposal to skip repeated files when copying from one stage to the next. It would be wise to keep an eye on when or if this feature is implemented, since it would allow us to build Dockerfiles like so:

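# START_IMG must be declared before the first FROM that uses it
ARG START_IMG
FROM $START_IMG AS builder
RUN ...
# build everything without thinking of optimization
RUN ...
FROM $START_IMG AS final_image
# copy only the new files to a new base image (--skip-base is the proposed flag, not an existing one)
COPY --from=builder --skip-base / /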

We would have only one layer that adds only the new files to the start image.

TL;DR

Only RUN, COPY and ADD commands add layers. Things like ARG and ENV don't affect image size.
Adding more layers is bad only when they refer to the same files, since the file will be present and duplicated in each layer. More layers per se are not necessarily a bad thing.
--squash is useful for addressing this problem; however, it's unstable and prone to be removed without warning by Docker developers.
Multi-stage builds are a useful approach, but we still need to avoid copying too many repeated files to the last stage.
We should keep an eye on whether --skip-base will be implemented in the future.
