Commit
Refactor the Dockerfile + integrate `qlever` script (#1439)
Update the main Dockerfile, as well as the Dockerfiles based on older versions of Ubuntu, in multiple ways:

1. The main Dockerfile is now based on Ubuntu 24.04 and has been refactored, cleaned up, and properly documented. It installs the `qlever` command-line tool, so that it can be used inside the container (where it uses the precompiled binaries, thanks to the environment variable `QLEVER_IS_RUNNING_IN_CONTAINER`). The Dockerfile has its own entrypoint script, which tells the user how to run the Docker container. When the container is run as recommended, the `qlever` user inside the container has the same user and group ID as the user outside the container. That way, files written by the container have the proper user and group names both inside and outside the container, and there are no unexpected issues with access rights. The container can also still be run with `-u $(id -u):$(id -g)` as before, and it works with both Docker and Podman. The image size has been reduced significantly: with Ubuntu 24.04 as the base, it is now below 1 GB.

2. The old Dockerfiles for Ubuntu 22.04 and 20.04 are now marked as deprecated. Their sole purpose is to document how to install QLever on these systems. The Dockerfile for Ubuntu 18.04 has been deleted because `cmake` is no longer supported there.
Commit 392b0e3 (1 parent: 44e2ba8)
Showing 6 changed files with 190 additions and 131 deletions.
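After running the container as recommended, one quick way to confirm the UID/GID mapping described in the commit message is to compare the numeric owner of a file written to the mounted directory with `id -u` and `id -g` on the host. The sketch below is not part of the commit: the helper name `owned_by_current_user` is hypothetical, and it assumes GNU coreutils `stat` (as on Linux).

```shell
#!/bin/bash
# Hypothetical helper (not part of the commit): succeeds if the given file is
# owned by the current user's UID and primary GID. Assumes GNU `stat -c`.
# Returns 2 if the file cannot be inspected (e.g. it does not exist).
owned_by_current_user() {
  local file="$1"
  local owner
  owner="$(stat -c '%u:%g' -- "$file")" || return 2
  [ "$owner" = "$(id -u):$(id -g)" ]
}
```

For example, in the host directory that was mounted as `/data`, `owned_by_current_user some-file && echo ok` should print `ok` for files created by the container if the mapping worked.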
@@ -1,16 +1,9 @@
*
!.clang-format
!CMakeLists.txt
!CompilationInfo.cmake
!.git
!LICENSE
!README.md
!e2e
!evaluation
!index
!misc
!src
!test
!third_party
!wikidata_settings.json
!e2e
!benchmark
!.git
!CMakeLists.txt
!CompilationInfo.cmake
!docker-entrypoint.sh
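The `.dockerignore` above works as an allowlist: `*` first excludes everything, and each `!name` line re-includes one path, with the last matching line deciding. The sketch below is a simplified model of that rule for top-level names only. It assumes the new file consists of `*`, `!src`, `!test` and the entries at the end of the hunk. Docker's real matcher (Go `filepath.Match` syntax plus `**`, path cleaning, and directory handling) covers more cases. The function name `in_build_context` is hypothetical.

```shell
#!/bin/bash
# Assumed contents of the new .dockerignore, in file order.
PATTERNS=('*' '!src' '!test' '!e2e' '!benchmark' '!.git'
          '!CMakeLists.txt' '!CompilationInfo.cmake' '!docker-entrypoint.sh')

# Simplified model (top-level names only): paths are included by default, and
# the last pattern that matches wins. A leading `!` re-includes the path;
# any other matching pattern excludes it.
# Returns 0 if the name would be sent to the Docker build context.
in_build_context() {
  local name="$1" included=1 pat neg
  for pat in "${PATTERNS[@]}"; do
    neg=0
    if [[ "$pat" == '!'* ]]; then
      neg=1
      pat="${pat:1}"
    fi
    # The unquoted right-hand side of == is matched as a glob pattern;
    # in this context `*` also matches names starting with a dot.
    if [[ "$name" == $pat ]]; then
      included=$neg
    fi
  done
  [ "$included" -eq 1 ]
}
```

With this model, `in_build_context src` succeeds, while `in_build_context README.md` fails because only the `*` line matches it.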
@@ -1,57 +1,80 @@
FROM ubuntu:22.04 as base
LABEL maintainer="Johannes Kalmbach <[email protected]>"
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV LC_CTYPE C.UTF-8
FROM ubuntu:24.04 AS base
LABEL maintainer="Hannah Bast <[email protected]>"

# Packages needed for both building and running the binaries.
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV LC_CTYPE=C.UTF-8
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y software-properties-common wget && add-apt-repository -y ppa:mhier/libboost-latest
RUN wget https://apt.kitware.com/kitware-archive.sh && chmod +x kitware-archive.sh &&./kitware-archive.sh

FROM base as builder
# Install the packages needed for building the binaries (this is a separate
# stage to keep the final image small).
FROM base AS builder
ARG TARGETPLATFORM
RUN apt-get update && apt-get install -y build-essential cmake libicu-dev tzdata pkg-config uuid-runtime uuid-dev git libjemalloc-dev ninja-build libzstd-dev libssl-dev libboost1.81-dev libboost-program-options1.81-dev libboost-iostreams1.81-dev libboost-url1.81-dev
COPY . /app/

WORKDIR /app/
ENV DEBIAN_FRONTEND=noninteractive
RUN wget https://apt.kitware.com/kitware-archive.sh && chmod +x kitware-archive.sh && ./kitware-archive.sh
RUN apt-get update && apt-get install -y build-essential cmake libicu-dev tzdata pkg-config uuid-runtime uuid-dev git libjemalloc-dev ninja-build libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev

# Copy everything we need to build the binaries.
#
# NOTE: We are deliberately explicit here, for two reasons. First, so that we
# don't copy more than necessary without having to rely on `.dockerignore`.
# Second, so that we can copy `docker-entrypoint.sh` separately below (we don't
# want to rebuild the whole container when making a small change in there).
COPY src /qlever/src/
COPY test /qlever/test/
COPY e2e /qlever/e2e/
COPY benchmark /qlever/benchmark/
COPY .git /qlever/.git/
COPY CMakeLists.txt /qlever/
COPY CompilationInfo.cmake /qlever/

WORKDIR /app/build/
# Don't build and run tests on ARM64, as it takes too long on GitHub actions.
# TODO: re-enable these tests as soon as we can use a native ARM64 platform to compile the Docker container.
WORKDIR /qlever/build/
RUN cmake -DCMAKE_BUILD_TYPE=Release -DLOGLEVEL=INFO -DUSE_PARALLEL=true -D_NO_TIMING_TESTS=ON -GNinja ..
# When cross-compiling the container for ARM64, then compiling and running all tests runs into a timeout on GitHub actions,
# so we disable tests for this platform.
# TODO(joka921) re-enable these tests as soon as we can use a native ARM64 platform to compile the docker container.
RUN if [ $TARGETPLATFORM = "linux/arm64" ] ; then echo "target is ARM64, don't build tests to avoid timeout"; fi
RUN if [ $TARGETPLATFORM = "linux/arm64" ] ; then cmake --build . --target IndexBuilderMain ServerMain; else cmake --build . ; fi
RUN if [ $TARGETPLATFORM = "linux/arm64" ] ; then echo "Skipping tests for ARM64" ; else ctest --rerun-failed --output-on-failure ; fi

FROM base as runtime
WORKDIR /app
# Install the packages needed for the final image.
FROM base AS runtime
WORKDIR /qlever
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.81-dev libboost-program-options1.81-dev libboost-iostreams1.81-dev libboost-url1.81-dev
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion vim sudo && rm -rf /var/lib/apt/lists/*

ARG UID=1000
RUN groupadd -r qlever && useradd --no-log-init -r -u $UID -g qlever qlever && chown qlever:qlever /app
USER qlever
ENV PATH=/app/:$PATH
# Set up user `qlever` with temporary sudo rights (which will be removed again
# by the `docker-entrypoint.sh` script, see there).
RUN groupadd -r qlever && useradd --no-log-init -d /qlever -r -g qlever qlever
RUN chown qlever:qlever /qlever
RUN echo "qlever ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

COPY --from=builder /app/build/*Main /app/
COPY --from=builder /app/e2e/* /app/e2e/
ENV PATH=/app/:$PATH
# Set up a profile script that is sourced whenever a new login shell is
# started (by any user, hence in `/etc/profile.d/`). For some reason, podman
# wants it in `/qlever/.bashrc`, so we also copy it there.
ENV QLEVER_PROFILE=/etc/profile.d/qlever.sh
RUN echo "eval \"\$(register-python-argcomplete qlever)\"" >> $QLEVER_PROFILE
RUN echo "export QLEVER_ARGCOMPLETE_ENABLED=1 && export QLEVER_IS_RUNNING_IN_CONTAINER=1" >> $QLEVER_PROFILE
RUN echo "PATH=/qlever:/qlever/.local/bin:$PATH && PS1=\"\u@docker:\W\$ \"" >> $QLEVER_PROFILE
RUN echo 'alias ll="ls -l"' >> $QLEVER_PROFILE
RUN echo "if [ -d /data ]; then cd /data; fi" >> $QLEVER_PROFILE
RUN cp $QLEVER_PROFILE /qlever/.bashrc

# Install the `qlever` command line tool. We have to set the two environment
# variables again here because in batch mode, the profile script above is not
# sourced. The `PATH` is set again to avoid a warning from `pipx`.
USER qlever
EXPOSE 7001
VOLUME ["/input", "/index"]

ENV INDEX_PREFIX index
ENV MEMORY_FOR_QUERIES 70
ENV CACHE_MAX_SIZE_GB 30
ENV CACHE_MAX_SIZE_GB_SINGLE_ENTRY 5
ENV CACHE_MAX_NUM_ENTRIES 1000
# Need the shell to get the INDEX_PREFIX environment variable
ENTRYPOINT ["/bin/sh", "-c", "exec ServerMain -i \"/index/${INDEX_PREFIX}\" -j 8 -m ${MEMORY_FOR_QUERIES} -c ${CACHE_MAX_SIZE_GB} -e ${CACHE_MAX_SIZE_GB_SINGLE_ENTRY} -k ${CACHE_MAX_NUM_ENTRIES} -p 7001 \"$@\"", "--"]

# Build image: docker build -t qlever.master .
ENV PATH=/qlever:/qlever/.local/bin:$PATH
RUN pipx install qlever
ENV QLEVER_ARGCOMPLETE_ENABLED=1
ENV QLEVER_IS_RUNNING_IN_CONTAINER=1

# Build index: DB=wikidata; docker run -it --rm -v "$(pwd)":/index --entrypoint bash --name qlever.$DB-index qlever.master -c "IndexBuilderMain -f /index/$DB.nt -i /index/$DB -s /index/$DB.settings.json | tee /index/$DB.index-log.txt"; rm -f $DB/*tmp*
# Copy the binaries and the entrypoint script.
COPY --from=builder /qlever/build/*Main /qlever/
COPY --from=builder /qlever/e2e/* /qlever/e2e/
COPY docker-entrypoint.sh /qlever/
RUN sudo chmod +x /qlever/docker-entrypoint.sh

# Run engine: DB=wikidata; PORT=7001; docker rm -f qlever.$DB; docker run -d --restart=unless-stopped -v "$(pwd)":/index -p $PORT:7001 -e INDEX_PREFIX=$DB -e MEMORY_FOR_QUERIES=30 --name qlever.$DB qlever.master; docker logs -f --tail=100 qlever.$DB
# Our entrypoint script does some clever things; see the comments in there.
ENTRYPOINT ["/qlever/docker-entrypoint.sh"]
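The three `RUN if [ $TARGETPLATFORM = "linux/arm64" ]` lines in the builder stage decide what is built and tested for each target platform. The sketch below pulls that decision into a function so it can be checked outside Docker. It prints the commands instead of running them. It also quotes the variable, which keeps the `[` test well-formed if `TARGETPLATFORM` is empty. BuildKit sets that argument automatically, but other builders may not. The function name `build_commands` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the build and test commands that the Dockerfile's
# RUN lines would execute for a given target platform (nothing is executed).
build_commands() {
  local platform="$1"
  # Quoting "$platform" keeps the test valid even when the value is empty.
  if [ "$platform" = "linux/arm64" ]; then
    # Cross-compiling for ARM64: build only the two main binaries, skip tests.
    echo "cmake --build . --target IndexBuilderMain ServerMain"
  else
    # Any other (or unset) platform: build everything, then run the tests.
    echo "cmake --build ."
    echo "ctest --rerun-failed --output-on-failure"
  fi
}
```

Calling `build_commands linux/arm64` prints only the restricted build command. `build_commands linux/amd64` prints the full build followed by the `ctest` call.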
This file was deleted.
@@ -1,17 +1,22 @@
# This Dockerfile is DEPRECATED, use the latest Dockerfile from the repository.
#
# The only reason it is here is to document how to install QLever on Ubuntu 20.04.

FROM ubuntu:20.04 as base
LABEL maintainer="Johannes Kalmbach <[email protected]>"
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV LC_CTYPE C.UTF-8
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update
RUN apt install -y software-properties-common
RUN apt install -y software-properties-common wget
RUN add-apt-repository -y ppa:mhier/libboost-latest
RUN add-apt-repository -y ppa:ubuntu-toolchain-r/test
RUN wget https://apt.kitware.com/kitware-archive.sh && chmod +x kitware-archive.sh &&./kitware-archive.sh
RUN apt-get update

FROM base as builder
RUN apt-get install -y build-essential cmake libicu-dev tzdata pkg-config uuid-runtime uuid-dev git gcc-11 g++-11 libjemalloc-dev ninja-build libzstd-dev libssl-dev libboost1.81-dev libboost-url1.81-dev
RUN apt-get install -y build-essential cmake libicu-dev tzdata pkg-config uuid-runtime uuid-dev git gcc-11 g++-11 libjemalloc-dev ninja-build libzstd-dev libssl-dev libboost1.81-dev libboost-url1.81-dev libboost-iostreams1.81-dev libboost-program-options1.81

COPY . /app/

@@ -25,7 +30,7 @@ RUN make test
FROM base as runtime
WORKDIR /app
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime lbzip2 libjemalloc-dev libzstd-dev libssl-dev
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.81 libstdc++6

ARG UID=1000
RUN groupadd -r qlever && useradd --no-log-init -r -u $UID -g qlever qlever && chown qlever:qlever /app
@@ -34,22 +39,5 @@ ENV PATH=/app/:$PATH

COPY --from=builder /app/build/*Main /app/
COPY --from=builder /app/e2e/* /app/e2e/
ENV PATH=/app/:$PATH

USER qlever
EXPOSE 7001
VOLUME ["/input", "/index"]

ENV INDEX_PREFIX index
ENV MEMORY_FOR_QUERIES 70
ENV CACHE_MAX_SIZE_GB 30
ENV CACHE_MAX_SIZE_GB_SINGLE_ENTRY 5
ENV CACHE_MAX_NUM_ENTRIES 1000
# Need the shell to get the INDEX_PREFIX environment variable
ENTRYPOINT ["/bin/sh", "-c", "exec ServerMain -i \"/index/${INDEX_PREFIX}\" -j 8 -m ${MEMORY_FOR_QUERIES} -c ${CACHE_MAX_SIZE_GB} -e ${CACHE_MAX_SIZE_GB_SINGLE_ENTRY} -k ${CACHE_MAX_NUM_ENTRIES} -p 7001 \"$@\"", "--"]

# Build image: docker build -t qlever.master .

# Build index: DB=wikidata; docker run -it --rm -v "$(pwd)":/index --entrypoint bash --name qlever.$DB-index qlever.master -c "IndexBuilderMain -f /index/$DB.nt -i /index/$DB -s /index/$DB.settings.json | tee /index/$DB.index-log.txt"; rm -f $DB/*tmp*

# Run engine: DB=wikidata; PORT=7001; docker rm -f qlever.$DB; docker run -d --restart=unless-stopped -v "$(pwd)":/index -p $PORT:7001 -e INDEX_PREFIX=$DB -e MEMORY_FOR_QUERIES=30 --name qlever.$DB qlever.master; docker logs -f --tail=100 qlever.$DB
ENTRYPOINT ["bash"]
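The removed `ENTRYPOINT ["/bin/sh", "-c", "exec ServerMain ... \"$@\"", "--"]` form relies on how `sh -c` assigns positional parameters. The first word after the command string becomes `$0`, and every following word becomes `$1`, `$2`, and so on. Docker appends the `docker run` arguments after the exec-form array, so they land after `--`, and `"$@"` forwards them to `ServerMain` unchanged. The sketch below demonstrates this. Here `printf` stands in for `ServerMain`, and the extra arguments are made up for illustration.

```shell
#!/bin/bash
# Demonstrates the positional-parameter rule behind the old ENTRYPOINT.
# `printf` stands in for `ServerMain`; the extra arguments are hypothetical.
export INDEX_PREFIX=wikidata

# The word right after the command string ("--") becomes $0 ...
dollar_zero="$(/bin/sh -c 'printf "%s\n" "$0"' -- --extra-flag)"

# ... and everything after it becomes "$@", forwarded with word boundaries
# intact. ${INDEX_PREFIX} is expanded by the inner shell from the environment.
forwarded="$(/bin/sh -c 'printf "%s\n" -i "/index/${INDEX_PREFIX}" "$@"' -- --extra-flag "value with spaces")"

echo "\$0 inside sh -c: $dollar_zero"
echo "$forwarded"
```

The forwarded list has four lines: `-i`, `/index/wikidata`, `--extra-flag` and `value with spaces`. The last argument stays a single word, which is why the entrypoint wraps `$@` in escaped double quotes.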
@@ -0,0 +1,39 @@
# This Dockerfile is DEPRECATED, use the latest Dockerfile from the repository.
#
# The only reason it is here is to document how to install QLever on Ubuntu 22.04.

FROM ubuntu:22.04 as base
LABEL maintainer="Johannes Kalmbach <[email protected]>"
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV LC_CTYPE C.UTF-8
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y software-properties-common wget && add-apt-repository -y ppa:mhier/libboost-latest
RUN wget https://apt.kitware.com/kitware-archive.sh && chmod +x kitware-archive.sh &&./kitware-archive.sh

FROM base as builder
RUN apt-get update && apt-get install -y build-essential cmake libicu-dev tzdata pkg-config uuid-runtime uuid-dev git libjemalloc-dev ninja-build libzstd-dev libssl-dev libboost1.81-dev libboost-program-options1.81-dev libboost-iostreams1.81-dev libboost-url1.81-dev

COPY . /app/

WORKDIR /app/
ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /app/build/
RUN cmake -DCMAKE_BUILD_TYPE=Release -DLOGLEVEL=INFO -DUSE_PARALLEL=true -D_NO_TIMING_TESTS=ON -GNinja .. && ninja
RUN ctest --rerun-failed --output-on-failure

FROM base as runtime
WORKDIR /app
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.81-dev libboost-program-options1.81-dev libboost-iostreams1.81-dev libboost-url1.81-dev

ARG UID=1000
RUN groupadd -r qlever && useradd --no-log-init -r -u $UID -g qlever qlever && chown qlever:qlever /app
USER qlever
ENV PATH=/app/:$PATH

COPY --from=builder /app/build/*Main /app/
COPY --from=builder /app/e2e/* /app/e2e/

ENTRYPOINT ["bash"]
@@ -0,0 +1,75 @@
#!/bin/bash

# Entrypoint script for the QLever docker image. It sets the UID and GID of the
# user `qlever` inside the container to the UID and GID specified in the call of
# `docker run`, typically the UID and GID of the user outside the container.
# That way, we don't need to set special permissions for the mounted volume,
# and everything looks nice inside of the container, too.
#
# NOTE: The container should be started with `-e UID=... -e GID=...` and not
# `-u ...:...` for the following reason. In order to change the UID and GID of
# the internal user `qlever`, we need `sudo` rights, which are granted to that
# user via the configuration in the Dockerfile. However, if we run the container
# with `-u ...:...`, the user changes and no longer has `sudo` rights.

# Help message that is printed if the container is not started as recommended.
HELP_MESSAGE='
The recommended way to run a container with this image is as follows. Run in a fresh directory. Add `-p <outside port>:<inside port>` if you want to expose ports. Inside the container, the `qlever` command-line tool is available, as well as the QLever binaries (which you need not call directly, they are called by the various `qlever` commands).
In batch mode (user `qlever` inside the container, with the same UID and GID as outside):
\x1b[34mdocker run -it --rm -e UID=$(id -u) -e GID=$(id -g) -v $(pwd):/data -w /data qlever -c "qlever setup-config olympics && qlever get-data && qlever index"\x1b[0m
The same, but in interactive mode:
\x1b[34mdocker run -it --rm -e UID=$(id -u) -e GID=$(id -g) -v $(pwd):/data -w /data qlever\x1b[0m
It also works with `-u $(id -u):$(id -g)` (but then the user inside the container has no proper name):
\x1b[34mdocker run -it --rm -u $(id -u):$(id -g) -v $(pwd):/data -w /data qlever\x1b[0m
\x1b[34mdocker run -it --rm -u $(id -u):$(id -g) -v $(pwd):/data -w /data qlever -c "..."\x1b[0m
With podman you should use `-u $(id -u):$(id -g)` together with `--userns=keep-id`:
\x1b[34mpodman run -it --rm -u $(id -u):$(id -g) --userns=keep-id -v $(pwd):/data -w /data qlever\x1b[0m
\x1b[34mpodman run -it --rm -u $(id -u):$(id -g) --userns=keep-id -v $(pwd):/data -w /data qlever -c "..."\x1b[0m
'

# If the container is run without `-v ...:/data -w /data` (in particular, without
# any arguments), show the help message and exit.
if [ "$(pwd)" != "/data" ]; then
  echo
  echo -e "\x1b[34mWELCOME TO THE QLEVER DOCKER IMAGE\x1b[0m"
  echo -e "$HELP_MESSAGE"
  exit 1
fi

# If the container is run with arguments, but the first argument is not `-c`,
# prepend `-c` to the arguments (so that the user can omit the `-c`).
if [[ $# -gt 0 && "$1" != "-c" ]]; then
  set -- -c "$@"
fi

# If the user is not `qlever`, start a new login shell (to make sure that the
# profile script from the Dockerfile is executed).
if ! whoami > /dev/null || [ "$(whoami)" != "qlever" ]; then
  exec bash --login "$@"
fi

# With `-e UID=... -e GID=...`, change the UID and GID of the user `qlever` inside
# the container accordingly.
#
# NOTE: The call `su - qlever ...` has to be inside of the `sudo` call, because
# once the UID and GID of the user `qlever` have been changed, it no longer has
# `sudo` rights. And just remaining in the shell or starting a new shell (with
# `bash`) does not work, because neither of these would have the new UID and GID.
if [ $# -eq 0 ]; then
  echo
  sudo -E bash -c "usermod -u $UID -s /bin/bash qlever && groupmod -g $GID qlever && chown -R qlever:qlever /qlever && su - qlever --login"
else
  if [ "$1" == "-c" ]; then
    shift
  fi
  sudo -E bash -c "usermod -u $UID -s /bin/bash qlever && groupmod -g $GID qlever && chown -R qlever:qlever /qlever && su - qlever -s /bin/bash --login -c \"$@\""
fi
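The entrypoint normalizes its arguments in two steps. First, it prepends `-c` when the first argument is something else. Later, it shifts the `-c` off again before handing the command to `su ... -c`. The sketch below isolates that logic in a hypothetical function `normalize_args`. The function prints the command string that `su` would receive, or nothing when an interactive login shell would be started. This lets the behavior be checked without Docker or `sudo`. It assumes the command is passed as one string, as in the help message's `-c "..."` examples, and it joins multiple words with spaces as a simplification.

```shell
#!/bin/bash
# Hypothetical helper mirroring the argument handling in docker-entrypoint.sh.
# Prints the command string that would be passed to `su ... -c`, or nothing
# if the container would start an interactive login shell instead.
normalize_args() {
  # Step 1: let users omit `-c` (same test as in the entrypoint).
  if [[ $# -gt 0 && "$1" != "-c" ]]; then
    set -- -c "$@"
  fi
  # No arguments at all: interactive mode, nothing to print.
  if [ $# -eq 0 ]; then
    return 0
  fi
  # Step 2: drop the leading `-c`; what remains is the command.
  if [ "$1" == "-c" ]; then
    shift
  fi
  printf '%s\n' "$*"
}
```

Both `normalize_args -c "qlever index"` and `normalize_args "qlever index"` print `qlever index`. `normalize_args` with no arguments prints nothing.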