Commit
Refactor the Dockerfile + integrate qlever script (#1439)
Update the main Dockerfile, as well as the Dockerfiles based on older versions of Ubuntu in multiple ways:

1. The main Dockerfile is now based on Ubuntu 24.04 and has been refactored, cleaned up, and properly documented. It installs the `qlever` command-line tool, so that it can be used inside the container (where it will use the precompiled binaries, thanks to the environment variable `QLEVER_IS_RUNNING_IN_CONTAINER`). The Dockerfile has its own entrypoint script, which tells the user how to run the Docker container. When run appropriately, the `qlever` user in the container has the same user and group ID as the user outside the container. That way, files written by the container have a proper user and group name both inside and outside the container, and there are no unexpected issues with access rights. The container can also still be run with `-u $(id -u):$(id -g)` as before, and it works with both Docker and Podman. The image size has been reduced significantly and is now below 1 GB on Ubuntu 24.04.

2. The old Dockerfiles for Ubuntu 22.04 and 20.04 are now marked as deprecated. Their sole purpose is to document how to install QLever on these systems. The Dockerfile for Ubuntu 18.04 has been deleted because `cmake` is not supported there anymore.
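
The UID/GID matching described in item 1 can be sketched as follows. This is only an illustration: `qlever_run_cmd` is a hypothetical helper that is not part of the commit, and the default image name `qlever` is assumed from the help message in `docker-entrypoint.sh`. It prints the recommended `docker run` command for the current user and directory instead of executing it.

```shell
# Hypothetical helper (not part of the commit): print the recommended
# `docker run` command for the current user and working directory.
# The image name defaults to `qlever`, which is an assumption taken from
# the help message in docker-entrypoint.sh.
qlever_run_cmd() {
  image="${1:-qlever}"
  # `-e UID=... -e GID=...` lets the entrypoint remap the `qlever` user;
  # `-v ...:/data -w /data` is required, otherwise the entrypoint only
  # prints its help message and exits.
  printf 'docker run -it --rm -e UID=%s -e GID=%s -v %s:/data -w /data %s\n' \
    "$(id -u)" "$(id -g)" "$(pwd)" "$image"
}

qlever_run_cmd
```

Printing the command first makes it easy to check that the IDs and the mounted directory are the intended ones before starting a container.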
hannahbast authored Dec 4, 2024
1 parent 44e2ba8 commit 392b0e3
Showing 6 changed files with 190 additions and 131 deletions.
17 changes: 5 additions & 12 deletions .dockerignore
@@ -1,16 +1,9 @@
*
!.clang-format
!CMakeLists.txt
!CompilationInfo.cmake
!.git
!LICENSE
!README.md
!e2e
!evaluation
!index
!misc
!src
!test
!third_party
!wikidata_settings.json
!e2e
!benchmark
!.git
!CMakeLists.txt
!CompilationInfo.cmake
!docker-entrypoint.sh
101 changes: 62 additions & 39 deletions Dockerfile
@@ -1,57 +1,80 @@
FROM ubuntu:22.04 as base
LABEL maintainer="Johannes Kalmbach <[email protected]>"
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV LC_CTYPE C.UTF-8
FROM ubuntu:24.04 AS base
LABEL maintainer="Hannah Bast <[email protected]>"

# Packages needed for both building and running the binaries.
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV LC_CTYPE=C.UTF-8
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y software-properties-common wget && add-apt-repository -y ppa:mhier/libboost-latest
RUN wget https://apt.kitware.com/kitware-archive.sh && chmod +x kitware-archive.sh &&./kitware-archive.sh

FROM base as builder
# Install the packages needed for building the binaries (this is a separate
# stage to keep the final image small).
FROM base AS builder
ARG TARGETPLATFORM
RUN apt-get update && apt-get install -y build-essential cmake libicu-dev tzdata pkg-config uuid-runtime uuid-dev git libjemalloc-dev ninja-build libzstd-dev libssl-dev libboost1.81-dev libboost-program-options1.81-dev libboost-iostreams1.81-dev libboost-url1.81-dev
COPY . /app/

WORKDIR /app/
ENV DEBIAN_FRONTEND=noninteractive
RUN wget https://apt.kitware.com/kitware-archive.sh && chmod +x kitware-archive.sh && ./kitware-archive.sh
RUN apt-get update && apt-get install -y build-essential cmake libicu-dev tzdata pkg-config uuid-runtime uuid-dev git libjemalloc-dev ninja-build libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev

# Copy everything we need to build the binaries.
#
# NOTE: We are deliberately explicit here, for two reasons. First, so that we
# don't copy more than necessary without having to rely on `.dockerignore`.
# Second, so that we can copy `docker-entrypoint.sh` separately below (we don't
# want to rebuild the whole container when making a small change in there).
COPY src /qlever/src/
COPY test /qlever/test/
COPY e2e /qlever/e2e/
COPY benchmark /qlever/benchmark/
COPY .git /qlever/.git/
COPY CMakeLists.txt /qlever/
COPY CompilationInfo.cmake /qlever/

WORKDIR /app/build/
# Don't build and run tests on ARM64, as it takes too long on GitHub actions.
# TODO: re-enable these tests as soon as we can use a native ARM64 platform to compile the Docker container.
WORKDIR /qlever/build/
RUN cmake -DCMAKE_BUILD_TYPE=Release -DLOGLEVEL=INFO -DUSE_PARALLEL=true -D_NO_TIMING_TESTS=ON -GNinja ..
# When cross-compiling the container for ARM64, then compiling and running all tests runs into a timeout on GitHub actions,
# so we disable tests for this platform.
# TODO(joka921) re-enable these tests as soon as we can use a native ARM64 platform to compile the docker container.
RUN if [ $TARGETPLATFORM = "linux/arm64" ] ; then echo "target is ARM64, don't build tests to avoid timeout"; fi
RUN if [ $TARGETPLATFORM = "linux/arm64" ] ; then cmake --build . --target IndexBuilderMain ServerMain; else cmake --build . ; fi
RUN if [ $TARGETPLATFORM = "linux/arm64" ] ; then echo "Skipping tests for ARM64" ; else ctest --rerun-failed --output-on-failure ; fi

FROM base as runtime
WORKDIR /app
# Install the packages needed for the final image.
FROM base AS runtime
WORKDIR /qlever
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.81-dev libboost-program-options1.81-dev libboost-iostreams1.81-dev libboost-url1.81-dev
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion vim sudo && rm -rf /var/lib/apt/lists/*

ARG UID=1000
RUN groupadd -r qlever && useradd --no-log-init -r -u $UID -g qlever qlever && chown qlever:qlever /app
USER qlever
ENV PATH=/app/:$PATH
# Set up user `qlever` with temporary sudo rights (which will be removed again
# by the `docker-entrypoint.sh` script, see there).
RUN groupadd -r qlever && useradd --no-log-init -d /qlever -r -g qlever qlever
RUN chown qlever:qlever /qlever
RUN echo "qlever ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

COPY --from=builder /app/build/*Main /app/
COPY --from=builder /app/e2e/* /app/e2e/
ENV PATH=/app/:$PATH
# Set up a profile script that is sourced whenever a new login shell is
# started (by any user, hence in `/etc/profile.d/`). For some reason, podman
# wants it in `/qlever/.bashrc`, so we also copy it there.
ENV QLEVER_PROFILE=/etc/profile.d/qlever.sh
RUN echo "eval \"\$(register-python-argcomplete qlever)\"" >> $QLEVER_PROFILE
RUN echo "export QLEVER_ARGCOMPLETE_ENABLED=1 && export QLEVER_IS_RUNNING_IN_CONTAINER=1" >> $QLEVER_PROFILE
RUN echo "PATH=/qlever:/qlever/.local/bin:$PATH && PS1=\"\u@docker:\W\$ \"" >> $QLEVER_PROFILE
RUN echo 'alias ll="ls -l"' >> $QLEVER_PROFILE
RUN echo "if [ -d /data ]; then cd /data; fi" >> $QLEVER_PROFILE
RUN cp $QLEVER_PROFILE /qlever/.bashrc

# Install the `qlever` command line tool. We have to set the two environment
# variables again here because in batch mode, the profile script above is not
# sourced. The `PATH` is set again to avoid a warning from `pipx`.
USER qlever
EXPOSE 7001
VOLUME ["/input", "/index"]

ENV INDEX_PREFIX index
ENV MEMORY_FOR_QUERIES 70
ENV CACHE_MAX_SIZE_GB 30
ENV CACHE_MAX_SIZE_GB_SINGLE_ENTRY 5
ENV CACHE_MAX_NUM_ENTRIES 1000
# Need the shell to get the INDEX_PREFIX environment variable
ENTRYPOINT ["/bin/sh", "-c", "exec ServerMain -i \"/index/${INDEX_PREFIX}\" -j 8 -m ${MEMORY_FOR_QUERIES} -c ${CACHE_MAX_SIZE_GB} -e ${CACHE_MAX_SIZE_GB_SINGLE_ENTRY} -k ${CACHE_MAX_NUM_ENTRIES} -p 7001 \"$@\"", "--"]

# Build image: docker build -t qlever.master .
ENV PATH=/qlever:/qlever/.local/bin:$PATH
RUN pipx install qlever
ENV QLEVER_ARGCOMPLETE_ENABLED=1
ENV QLEVER_IS_RUNNING_IN_CONTAINER=1

# Build index: DB=wikidata; docker run -it --rm -v "$(pwd)":/index --entrypoint bash --name qlever.$DB-index qlever.master -c "IndexBuilderMain -f /index/$DB.nt -i /index/$DB -s /index/$DB.settings.json | tee /index/$DB.index-log.txt"; rm -f $DB/*tmp*
# Copy the binaries and the entrypoint script.
COPY --from=builder /qlever/build/*Main /qlever/
COPY --from=builder /qlever/e2e/* /qlever/e2e/
COPY docker-entrypoint.sh /qlever/
RUN sudo chmod +x /qlever/docker-entrypoint.sh

# Run engine: DB=wikidata; PORT=7001; docker rm -f qlever.$DB; docker run -d --restart=unless-stopped -v "$(pwd)":/index -p $PORT:7001 -e INDEX_PREFIX=$DB -e MEMORY_FOR_QUERIES=30 --name qlever.$DB qlever.master; docker logs -f --tail=100 qlever.$DB
# Our entrypoint script does some clever things; see the comments in there.
ENTRYPOINT ["/qlever/docker-entrypoint.sh"]
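
The platform-dependent build step in the Dockerfile above (build only `IndexBuilderMain` and `ServerMain` on ARM64, everything otherwise) can be sketched as a small shell function. This is a sketch under assumptions, not the commit's code: `qlever_build_targets` is a hypothetical name, and `all` is assumed to be the default target of the generated build system. Unlike the `RUN` lines above, it quotes the platform value, so the test stays well-formed when `TARGETPLATFORM` is empty (for example, when the builder does not set it).

```shell
# Hypothetical helper (not part of the commit): map a target platform to
# the build targets used in the Dockerfile.
qlever_build_targets() {
  # Quoting the argument keeps `[` well-formed even for an empty value.
  if [ "${1:-}" = "linux/arm64" ]; then
    # On ARM64, only the two binaries are built and the tests are skipped.
    echo "IndexBuilderMain ServerMain"
  else
    # Assumption: `all` names the default target of the build system.
    echo "all"
  fi
}

qlever_build_targets "linux/arm64"
qlever_build_targets "linux/amd64"
```

Inside a build stage, the output could be passed on as `cmake --build . --target $(qlever_build_targets "$TARGETPLATFORM")`, where the missing quotes around the command substitution are deliberate so that the two target names are split into separate arguments.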
59 changes: 0 additions & 59 deletions Dockerfiles/Dockerfile.Ubuntu18.04

This file was deleted.

30 changes: 9 additions & 21 deletions Dockerfiles/Dockerfile.Ubuntu20.04
@@ -1,17 +1,22 @@
# This Dockerfile is DEPRECATED, use the latest Dockerfile from the repository.
#
# The only reason it is here is to document how to install QLever on Ubuntu 20.04.

FROM ubuntu:20.04 as base
LABEL maintainer="Johannes Kalmbach <[email protected]>"
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV LC_CTYPE C.UTF-8
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update
RUN apt install -y software-properties-common
RUN apt install -y software-properties-common wget
RUN add-apt-repository -y ppa:mhier/libboost-latest
RUN add-apt-repository -y ppa:ubuntu-toolchain-r/test
RUN wget https://apt.kitware.com/kitware-archive.sh && chmod +x kitware-archive.sh &&./kitware-archive.sh
RUN apt-get update

FROM base as builder
RUN apt-get install -y build-essential cmake libicu-dev tzdata pkg-config uuid-runtime uuid-dev git gcc-11 g++-11 libjemalloc-dev ninja-build libzstd-dev libssl-dev libboost1.81-dev libboost-url1.81-dev
RUN apt-get install -y build-essential cmake libicu-dev tzdata pkg-config uuid-runtime uuid-dev git gcc-11 g++-11 libjemalloc-dev ninja-build libzstd-dev libssl-dev libboost1.81-dev libboost-url1.81-dev libboost-iostreams1.81-dev libboost-program-options1.81

COPY . /app/

@@ -25,7 +30,7 @@ RUN make test
FROM base as runtime
WORKDIR /app
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime lbzip2 libjemalloc-dev libzstd-dev libssl-dev
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.81 libstdc++6

ARG UID=1000
RUN groupadd -r qlever && useradd --no-log-init -r -u $UID -g qlever qlever && chown qlever:qlever /app
@@ -34,22 +39,5 @@ ENV PATH=/app/:$PATH

COPY --from=builder /app/build/*Main /app/
COPY --from=builder /app/e2e/* /app/e2e/
ENV PATH=/app/:$PATH

USER qlever
EXPOSE 7001
VOLUME ["/input", "/index"]

ENV INDEX_PREFIX index
ENV MEMORY_FOR_QUERIES 70
ENV CACHE_MAX_SIZE_GB 30
ENV CACHE_MAX_SIZE_GB_SINGLE_ENTRY 5
ENV CACHE_MAX_NUM_ENTRIES 1000
# Need the shell to get the INDEX_PREFIX environment variable
ENTRYPOINT ["/bin/sh", "-c", "exec ServerMain -i \"/index/${INDEX_PREFIX}\" -j 8 -m ${MEMORY_FOR_QUERIES} -c ${CACHE_MAX_SIZE_GB} -e ${CACHE_MAX_SIZE_GB_SINGLE_ENTRY} -k ${CACHE_MAX_NUM_ENTRIES} -p 7001 \"$@\"", "--"]

# Build image: docker build -t qlever.master .

# Build index: DB=wikidata; docker run -it --rm -v "$(pwd)":/index --entrypoint bash --name qlever.$DB-index qlever.master -c "IndexBuilderMain -f /index/$DB.nt -i /index/$DB -s /index/$DB.settings.json | tee /index/$DB.index-log.txt"; rm -f $DB/*tmp*

# Run engine: DB=wikidata; PORT=7001; docker rm -f qlever.$DB; docker run -d --restart=unless-stopped -v "$(pwd)":/index -p $PORT:7001 -e INDEX_PREFIX=$DB -e MEMORY_FOR_QUERIES=30 --name qlever.$DB qlever.master; docker logs -f --tail=100 qlever.$DB
ENTRYPOINT ["bash"]
39 changes: 39 additions & 0 deletions Dockerfiles/Dockerfile.Ubuntu22.04
@@ -0,0 +1,39 @@
# This Dockerfile is DEPRECATED, use the latest Dockerfile from the repository.
#
# The only reason it is here is to document how to install QLever on Ubuntu 22.04.

FROM ubuntu:22.04 as base
LABEL maintainer="Johannes Kalmbach <[email protected]>"
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV LC_CTYPE C.UTF-8
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y software-properties-common wget && add-apt-repository -y ppa:mhier/libboost-latest
RUN wget https://apt.kitware.com/kitware-archive.sh && chmod +x kitware-archive.sh &&./kitware-archive.sh

FROM base as builder
RUN apt-get update && apt-get install -y build-essential cmake libicu-dev tzdata pkg-config uuid-runtime uuid-dev git libjemalloc-dev ninja-build libzstd-dev libssl-dev libboost1.81-dev libboost-program-options1.81-dev libboost-iostreams1.81-dev libboost-url1.81-dev

COPY . /app/

WORKDIR /app/
ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /app/build/
RUN cmake -DCMAKE_BUILD_TYPE=Release -DLOGLEVEL=INFO -DUSE_PARALLEL=true -D_NO_TIMING_TESTS=ON -GNinja .. && ninja
RUN ctest --rerun-failed --output-on-failure

FROM base as runtime
WORKDIR /app
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.81-dev libboost-program-options1.81-dev libboost-iostreams1.81-dev libboost-url1.81-dev

ARG UID=1000
RUN groupadd -r qlever && useradd --no-log-init -r -u $UID -g qlever qlever && chown qlever:qlever /app
USER qlever
ENV PATH=/app/:$PATH

COPY --from=builder /app/build/*Main /app/
COPY --from=builder /app/e2e/* /app/e2e/

ENTRYPOINT ["bash"]
75 changes: 75 additions & 0 deletions docker-entrypoint.sh
@@ -0,0 +1,75 @@
#!/bin/bash

# Entrypoint script for the QLever docker image. It sets the UID and GID of the
# user `qlever` inside the container to the UID and GID specified in the call of
# `docker run`, typically the UID and GID of the user outside the container.
# That way, we don't need to set special permissions for the mounted volume,
# and everything looks nice inside of the container, too.
#
# NOTE: The container should be started with `-e UID=... -e GID=...` and not
# `-u ...:...` for the following reason. In order to change the UID and GID of
# the internal user `qlever`, we need `sudo` rights, which are granted to that
# user via the configuration in the Dockerfile. However, if we run the container
# with `-u ...:...`, the user changes and no longer has `sudo` rights.

# Help message that is printed if the container is not started as recommended.
HELP_MESSAGE='
The recommended way to run a container with this image is as follows. Run in a fresh directory. Add `-p <outside port>:<inside port>` if you want to expose ports. Inside the container, the `qlever` command-line tool is available, as well as the QLever binaries (which you need not call directly, they are called by the various `qlever` commands).
In batch mode (user `qlever` inside the container, with the same UID and GID as outside):
\x1b[34mdocker run -it --rm -e UID=$(id -u) -e GID=$(id -g) -v $(pwd):/data -w /data qlever -c "qlever setup-config olympics && qlever get-data && qlever index"\x1b[0m
The same, but in interactive mode:
\x1b[34mdocker run -it --rm -e UID=$(id -u) -e GID=$(id -g) -v $(pwd):/data -w /data qlever\x1b[0m
It also works with `-u $(id -u):$(id -g)` (but then the user inside the container has no proper name):
\x1b[34mdocker run -it --rm -u $(id -u):$(id -g) -v $(pwd):/data -w /data qlever\x1b[0m
\x1b[34mdocker run -it --rm -u $(id -u):$(id -g) -v $(pwd):/data -w /data qlever -c "..."\x1b[0m
With podman you should use `-u $(id -u):$(id -g)` together with `--userns=keep-id`:
\x1b[34mpodman run -it --rm -u $(id -u):$(id -g) --userns=keep-id -v $(pwd):/data -w /data qlever\x1b[0m
\x1b[34mpodman run -it --rm -u $(id -u):$(id -g) --userns=keep-id -v $(pwd):/data -w /data qlever -c "..."\x1b[0m
'

# If the container is run without `-v ...:/data -w /data` (in particular, without
# any arguments), show the help message and exit.
if [ "$(pwd)" != "/data" ]; then
echo
echo -e "\x1b[34mWELCOME TO THE QLEVER DOCKER IMAGE\x1b[0m"
echo -e "$HELP_MESSAGE"
exit 1
fi

# If the container is run with arguments, but the first argument is not `-c`,
# prepend `-c` to the arguments (so that the user can omit the `-c`).
if [[ $# -gt 0 && "$1" != "-c" ]]; then
set -- -c "$@"
fi

# If the user is not `qlever`, start a new login shell (to make sure that the
# profile script from the Dockerfile is executed).
if ! whoami > /dev/null || [ "$(whoami)" != "qlever" ]; then
exec bash --login "$@"
fi

# With `-e UID=... -e GID=...`, change the UID and GID of the user `qlever` inside
# the container accordingly.
#
# NOTE: The call `su - qlever ...` has to be inside of the `sudo` call, because
# once the UID and GID of the user `qlever` have been changed, it no longer has
# `sudo` rights. And just remaining in the shell or starting a new shell (with
# `bash`) does not work, because neither of these would have the new UID and GID.
if [ $# -eq 0 ]; then
echo
sudo -E bash -c "usermod -u $UID -s /bin/bash qlever && groupmod -g $GID qlever && chown -R qlever:qlever /qlever && su - qlever --login"
else
if [ "$1" == "-c" ]; then
shift
fi
sudo -E bash -c "usermod -u $UID -s /bin/bash qlever && groupmod -g $GID qlever && chown -R qlever:qlever /qlever && su - qlever -s /bin/bash --login -c \"$*\""
fi
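
The argument handling in the entrypoint script above (prepend `-c` unless the first argument already is `-c`) can be tried in isolation with the following sketch. `normalize_args` is a hypothetical function that is not part of the script; it applies the same `set -- -c "$@"` step and prints the resulting arguments one per line, so that the effect on the argument list is visible.

```shell
# Hypothetical function (not part of docker-entrypoint.sh): apply the
# entrypoint's `-c` normalization and print one argument per line.
normalize_args() {
  # With arguments whose first element is not `-c`, prepend `-c`.
  if [ "$#" -gt 0 ] && [ "$1" != "-c" ]; then
    set -- -c "$@"
  fi
  # Quoting "$@" preserves each argument as a separate word.
  for arg in "$@"; do
    printf '%s\n' "$arg"
  done
}

normalize_args qlever index
normalize_args -c "qlever index"
```

Both calls yield an argument list that starts with `-c`. In the first call the command remains split into two arguments, in the second it stays a single argument, which is the form used in the script's help message.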