-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor the Dockerfile + integrate qlever
script
#1439
Conversation
The old Dockerfile called `ServerMain` directly using a small selection of environment variables with outdated names. It was also outdated in other respects. The new Dockerfile installs the `qlever` script, so that it can be called from inside the container. Remaining questions / TODOs, feedback welcome: 1. Right now, the script is installed as part of the docker build via `pipx install qlever`. Is this the right way to do it? Alternatives would be to clone the GitHub repo and `pipx install -e .` from there, or include the the GitHub repo as a submodule of this repository. 2. How do we handle the QLever UI. We could just call `qlever ui` from inside the container. But that would pull the Docker image for the Qlever UI and run a Docker container inside of a Docker container. It's possible, but not the right way to do it. If both are needed, the container for the QLever engine and the contaner for the QLever UI should run side by side. 3. The`qlever setup-config` command should have options that overwrite the variables in the produced Qleverfile. In particular, there should be an option for setting `SYSTEM = native`. Otherwise it has to be stated explictly for each command, where that is relevant (in particular: `qlever index`, `qlever start`, `qlever example-queries`).
@ludovicm67 Can you try this and let me know your feedback? It addresses some of the issues that we have discussed. In particular, you can now use the If I understood you correctly, you would also like to run the QLever UI. This would also work here, but running a Docker container (for the QLever UI) inside of a Docker container is generally not a good idea. I don't have a solution yet, but here are some ideas:
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1439 +/- ##
==========================================
- Coverage 89.57% 89.55% -0.02%
==========================================
Files 381 381
Lines 36792 36792
Branches 4170 4170
==========================================
- Hits 32955 32950 -5
- Misses 2525 2529 +4
- Partials 1312 1313 +1 ☔ View full report in Codecov by Sentry. |
@hannahbast Thanks for the updates!
I think it's great like this.
I think it's fine to run the QLever UI in a separate container.
Yes, I'm currently experimenting with environment variables and a simple shell script that generates the Qleverfile from them. |
Yes, I will give it a try soon and let you know my feedback.
Fully agree with you. It is better to have the QLever UI as a separate container.
I would not recommend this approach. It may lead to security issues, and I don't think it is a good practice.
This is a good approach. It's also the easiest way to deploy the full stack (backend and UI) for users who want to use both ; a simple
The issue with this is that you will run multiple processes at the same time (the backend and the UI) in a single container which is not recommended. In the future, the best approach would be to have two separate containers: one for the backend and one for the UI. And two variants: a minimal one, and a complete one, which will results in the following images:
That way, For people that want to have autoconfiguration, work with a Qleverfile, or have a shell to debug things in an easier way, they can use the complete images. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During the build of the image, the following warnings are shown:
#1 WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 1)
#1 WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 9)
#1 WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 21)
I did some suggestions in order to fix the issue mentioned in the warning message.
Example of logs where the warning is visible: https://github.com/ad-freiburg/qlever/actions/runs/10345045337/job/28631489324#step:8:175
@ludovicm67 Thank you for your comments. I change all the
|
Perfect 👍
I'm not sure to see the problem here, as it's what is expected, no?
In both ; one will call the The other one will call As in both containers, the Qleverfile will be generated on the fly using the environment variables (or can be mounted directly). The You can additionally release a |
@ludovicm67 Just for clarification: Item 1 in my previous comment was not a problem (on the contrary), but just the first item on the list leading up to the problem description. Let me think about what you wrote and then come back to you. Having the |
@hannahbast basically what we need to have for the UI container image:
By default, it ships with a basic I suggested using the But I remain completely open for any other solution. |
@ludovicm67 Thanks a lot and the exchange is much appreciated. I will think about it and come back to you. |
Dockerfile
Outdated
ENV CACHE_MAX_NUM_ENTRIES 1000 | ||
# Need the shell to get the INDEX_PREFIX environment variable | ||
ENTRYPOINT ["/bin/sh", "-c", "exec ServerMain -i \"/index/${INDEX_PREFIX}\" -j 8 -m ${MEMORY_FOR_QUERIES} -c ${CACHE_MAX_SIZE_GB} -e ${CACHE_MAX_SIZE_GB_SINGLE_ENTRY} -k ${CACHE_MAX_NUM_ENTRIES} -p 7001 \"$@\"", "--"] | ||
ENTRYPOINT ["bash"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest using the default entrypoint from the Ubuntu Docker image (preferred):
ENTRYPOINT ["bash"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or explicitly set it to an empty one:
ENTRYPOINT ["bash"] | |
ENTRYPOINT [""] |
Dockerfile
Outdated
USER qlever | ||
ENV PATH=/app/:$PATH | ||
RUN PIPX_HOME=/qlever/.local PIPX_BIN_DIR=/qlever/.local/bin PIPX_MAN_DIR=/qlever/.local/share pipx install qlever |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something that might be good to do is to lock a specific version of the qlever
CLI, to avoid unwanted breaking changes if the image gets rebuilt.
First define an ARG
like this at the top of the Dockerfile (so that it's easy to know what to change in case of upgrade):
ARG QLEVER_VERSION="0.5.3"
Then you might need to tell the layer to use the QLEVER_VERSION
arg, by just adding:
ARG QLEVER_VERSION
in the layer.
And then, you can use it that way, if I'm not mistaking on how to specify a package version with pipx (I guess it should behave like pip):
RUN PIPX_HOME=/qlever/.local PIPX_BIN_DIR=/qlever/.local/bin PIPX_MAN_DIR=/qlever/.local/share pipx install qlever | |
RUN PIPX_HOME=/qlever/.local PIPX_BIN_DIR=/qlever/.local/bin PIPX_MAN_DIR=/qlever/.local/share pipx install "qlever==${QLEVER_VERSION}" |
You can take a look here for an example: https://github.com/zazukoians/qlever-tests/blob/76ada0b53174beb79d11ed14662536689a165fff/docker/server.Dockerfile
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The image can be optimized for a smaller size. The 2 suggested changes are the low-hanging fruit and reduce the compressed image size by 40MB/15% (uncompressed: 100MB/12.5%). There is probably potential for further space savings. Further investigation is required to evaluate the cost/benefit of these.
It should be 24.04
in the PR title instead of 22.04.
.
Dockerfile
Outdated
ENV LC_CTYPE C.UTF-8 | ||
ENV LANG=C.UTF-8 | ||
ENV LC_ALL=C.UTF-8 | ||
ENV LC_CTYPE=C.UTF-8 | ||
ENV DEBIAN_FRONTEND=noninteractive | ||
RUN apt-get update && apt-get install -y software-properties-common && add-apt-repository -y ppa:mhier/libboost-latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer required. The used boost packages are from the official package repositories now.
RUN apt-get update && apt-get install -y software-properties-common && add-apt-repository -y ppa:mhier/libboost-latest |
Dockerfile
Outdated
ENV DEBIAN_FRONTEND=noninteractive | ||
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.81-dev libboost-program-options1.81-dev libboost-iostreams1.81-dev libboost-url1.81-dev | ||
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove apt
cache.
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion | |
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion && rm -rf /var/lib/apt/lists/* |
These two environment variables are useful when running the qlever CLI within a Docker container, introduced by ad-freiburg/qlever#1439 The effect of QLEVER_OVERRIDE_SYSTEM_NATIVE is that after `qlever setup-config <config name>`, the Qleverfile will contain the setting `SYSTEM = native` and not `SYSTEM = docker`. The effect of QLEVER_OVERRIDE_DISABLE_UI is that `qlever ui` is disabled, which makes sense when running inside a container. In both cases, meaningful log message are provided for the user.
With ad-freiburg/qlever#1439, the recommended way to run QLever inside a container is to use the https://github.com/ad-freiburg/qlever-control script. Inside of the container, the environment variable `QLEVER_IS_RUNNING_IN_CONTAINER` is set, with the following two effects: 1. After `qlever setup-config <config name>`, the Qleverfile will contain the setting `SYSTEM = native` and not `SYSTEM = docker` and a warning message will be displayed that that is the case 2. When running `qlever ui`, an error message will be displayed that this command is disabled (reason: `qlever ui` runs another container, and we don't want containers inside of containers).
It was surprisingly hard to get it to work well in both settings, with reasonable user rights and all.
TODO: Check whether all parts of the Dockerfile are still needs and move `.bashrc` for user `qlever` to own file `docker-bashrc`.
@Qup42 and @ludovicm67 Can you please have another look? I refactored and cleaned this up a lot now. More than I thought I needed too, but I am quite happy with the result. Please read the comments in the Dockerfile and in the |
qlever
script
They are needed for the `docker publish` action
@Qup42 and @ludovicm67 Please note that this has consequence for how the
It looks much cleaner to me anyway. What do you think? |
@hannahbast Thanks for all your updates. Here are some comments that came in my mind while reviewing this PR. This pull request is doing many things:
I think that in the future we should try to split big changes in smaller PRs (like one for the two first points and another one for the third one), in order to simplify the review process. About the upgrade of the base image and the harmonization of the syntax, everything looks good to me. About the entrypoint, I'm not sure if it is the best way to do it. As @Qup42 pointed in his comment, removing the cache would help to reduce the image size, which is pretty big right now.
For this, I just set I agree that inside the Docker image, the default value should be set to |
@ludovicm67 Thanks for your latest comment, I missed it and read it only now. Here are some comments and questions:
The alternative is to run the container with |
@hannahbast Thank you for your answer!
|
@ludovicm67 Thanks for the quick reply. I still don't understand the root issue. In my understanding, things like rootless Docker refer to whether you can run Docker containers also as an unprivileged user. But even then, you can be root inside of the container. It's just that root inside of the container is mapped to an unprivileged user outside of the container, so that you can't do root things outside of the container. Whereas with standard Docker, if you are root inside of the container and you have mounted volumes outside of the container, you can do root things also outside of the container. |
Maybe this can help you to understand the case with OpenShift: https://developers.redhat.com/blog/2020/10/26/adapting-docker-and-kubernetes-containers-to-run-on-red-hat-openshift-container-platform But your entrypoint script is great when used with the CLI. |
@ludovicm67 @Qup42 Thanks for all the discussions and feedback. I have now made final amendments, fixed the old Dockerfiles, and written a proper description above (which will become the commit message in the end). Please feel free to have another look and/or comment. Once the tests all run through, I intend to finally merge this. |
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much. Feel free to add any additional comments that you have in mind, and then merge it.
Co-authored-by: Ludovic Muller <[email protected]>
Co-authored-by: Ludovic Muller <[email protected]>
Conformance check passed ✅No test result changes. |
Quality Gate passedIssues Measures |
@ludovicm67 @Qup42 @joka921 Just merged this + thanks for your help + for the protocol: I tried to replace the |
Opened #1658 for optimizing the size of the container image. |
Update the main Dockerfile, as well as the Dockerfiles based on older versions of Ubuntu in multiple ways:
The main Dockerfile is now based on Ubuntu 24.04 and has been refactored, cleaned up, and properly documented. It installs the
qlever
command-line tool, so that it can be used inside the container (where it will use the precompiled binaries, thanks to the environment variableQLEVER_IS_RUNNING_IN_CONTAINER
). The Dockerfile has its own entrypoint script, which tells the user how to run the Docker container. When run appropriately, theqlever
user in the container has the same user and group id as the user outside the container. That way, files written by the container have a proper user and group name both inside and outside the container, and there are no unexpected issues with access rights. The container can also still be run with-u $(id -u):$(id -g)
like before and works with Docker and Podman. The image size has been reduced significantly and is now below 1 GB on an Ubuntu 24.04.The old Dockerfiles for Ubuntu 22.04 and 20.04 are now marked as deprecated. Their sole purpose is to document how to install QLever on these systems. The Dockerfile for Ubuntu 18.04 has been deleted because
cmake
is not supported there anymore.