
Refactor the Dockerfile + integrate qlever script #1439

Merged (17 commits) on Dec 4, 2024

Conversation

@hannahbast (Member) commented Aug 12, 2024

Update the main Dockerfile, as well as the Dockerfiles based on older versions of Ubuntu in multiple ways:

  1. The main Dockerfile is now based on Ubuntu 24.04 and has been refactored, cleaned up, and properly documented. It installs the qlever command-line tool, so that it can be used inside the container (where it will use the precompiled binaries, thanks to the environment variable QLEVER_IS_RUNNING_IN_CONTAINER). The Dockerfile has its own entrypoint script, which tells the user how to run the Docker container. When run appropriately, the qlever user in the container has the same user and group id as the user outside the container. That way, files written by the container have a proper user and group name both inside and outside the container, and there are no unexpected issues with access rights (see the entrypoint sketch after this list). The container can also still be run with -u $(id -u):$(id -g) like before and works with Docker and Podman. The image size has been reduced significantly and is now below 1 GB on Ubuntu 24.04.

  2. The old Dockerfiles for Ubuntu 22.04 and 20.04 are now marked as deprecated. Their sole purpose is to document how to install QLever on these systems. The Dockerfile for Ubuntu 18.04 has been deleted because cmake is not supported there anymore.
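For illustration, here is a minimal sketch of what such a uid/gid-remapping entrypoint could look like. The variable names (HOST_UID, HOST_GID), the use of sudo, and all other details are assumptions for this sketch, not necessarily what the actual docker-entrypoint.sh in this PR does:

#!/bin/bash
# Hypothetical sketch of a uid/gid-remapping entrypoint (not the actual script).
# The container starts as root, remaps the "qlever" user to the ids passed in
# from the host, then drops privileges, so that files written to a mounted
# volume get a proper owner and group both inside and outside the container.
set -e

# Assumed interface: docker run -e HOST_UID=$(id -u) -e HOST_GID=$(id -g) ...
HOST_UID="${HOST_UID:-1000}"
HOST_GID="${HOST_GID:-1000}"

groupmod -o -g "$HOST_GID" qlever
usermod  -o -u "$HOST_UID" -g "$HOST_GID" qlever
chown qlever:qlever /qlever

if [ $# -eq 0 ]; then
  exec sudo -u qlever -i                 # interactive shell for the qlever user
else
  exec sudo -u qlever bash -lc "$*"      # run the given sequence of qlever commands
fi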

The old Dockerfile called `ServerMain` directly using a small selection
of environment variables with outdated names. It was also outdated in
other respects.

The new Dockerfile installs the `qlever` script, so that it can be
called from inside the container.

Remaining questions / TODOs, feedback welcome:

1. Right now, the script is installed as part of the docker build via
   `pipx install qlever`. Is this the right way to do it?
   Alternatives would be to clone the GitHub repo and `pipx install -e
   .` from there, or include the GitHub repo as a submodule of this
   repository.

2. How do we handle the QLever UI? We could just call `qlever ui` from
   inside the container. But that would pull the Docker image for the
   QLever UI and run a Docker container inside of a Docker container.
   It's possible, but not the right way to do it. If both are needed,
   the container for the QLever engine and the container for the QLever
   UI should run side by side.

3. The `qlever setup-config` command should have options that overwrite
   the variables in the produced Qleverfile. In particular, there should
   be an option for setting `SYSTEM = native`. Otherwise it has to be
   stated explicitly for each command where that is relevant (in
   particular: `qlever index`, `qlever start`, `qlever
   example-queries`).
@hannahbast (Member, Author) commented:

@ludovicm67 Can you try this and let me know your feedback? It addresses some of the issues that we have discussed. In particular, you can now use the qlever script from inside the container (with autocompletion). Or you can run the container directly with a sequence of qlever commands.

If I understood you correctly, you would also like to run the QLever UI. This would also work here, but running a Docker container (for the QLever UI) inside of a Docker container is generally not a good idea. I don't have a solution yet, but here are some ideas:

  1. Run the Docker container with -v /var/run/docker.sock:/var/run/docker.sock so that Docker commands inside of the container are using the Docker daemon outside of the container. I am not sure how portable this is and whether there are security concerns.

  2. Use docker-compose to start two containers, one for the QLever backend and one for the QLever UI. What I don't like about this approach is that it adds complexity that is not needed for users that want just the backend and not the UI. Then again, for other (but not all) triple stores, the UI is often considered an integral part of the software.

  3. Have a Python package also for the QLever UI. Then it could be installed with pipx install qlever-ui, analogously to the pipx install qlever for the qlever script. It does make sense because the QLever UI is a Python application. I have to think about that a little more (and play around with it and see how it works).


codecov bot commented Aug 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.55%. Comparing base (44e2ba8) to head (73ed78d).
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1439      +/-   ##
==========================================
- Coverage   89.57%   89.55%   -0.02%     
==========================================
  Files         381      381              
  Lines       36792    36792              
  Branches     4170     4170              
==========================================
- Hits        32955    32950       -5     
- Misses       2525     2529       +4     
- Partials     1312     1313       +1     


@ludovicm67 commented:

@hannahbast Thanks for the updates!

Remaining questions / TODOs, feedback welcome:

  1. Right now, the script is installed as part of the docker build via pipx install qlever. Is this the right way to do it? Alternatives would be to clone the GitHub repo and pipx install -e . from there, or include the GitHub repo as a submodule of this repository.

I think it's great like this.
I would just recommend pinning a specific version of the qlever package, so that the build is reproducible and does not break when a new version is released.

  2. How do we handle the QLever UI? We could just call qlever ui from inside the container. But that would pull the Docker image for the QLever UI and run a Docker container inside of a Docker container. It's possible, but not the right way to do it. If both are needed, the container for the QLever engine and the container for the QLever UI should run side by side.

I think it's fine to run the QLever UI in a separate container.
That way, the user can easily decide whether they want to use the UI or not.
The user can also run the UI on a different machine than the engine, or one instance of the UI for multiple server instances, which can be useful in some cases.

  3. The qlever setup-config command should have options that overwrite the variables in the produced Qleverfile. In particular, there should be an option for setting SYSTEM = native. Otherwise it has to be stated explicitly for each command where that is relevant (in particular: qlever index, qlever start, qlever example-queries).

Yes, I'm currently experimenting with environment variables and a simple shell script that generates the Qleverfile from them.
I will provide some feedback on this soon.
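For illustration, a minimal sketch of such a generator script (the environment variable names, the default values, and all Qleverfile keys except SYSTEM are assumptions for this example, not the actual script):

#!/bin/sh
# Hypothetical sketch: generate a minimal Qleverfile from environment
# variables, falling back to defaults suitable for use inside the container.
: "${QLEVER_SYSTEM:=native}"
: "${QLEVER_PORT:=7001}"
: "${QLEVERFILE:=/qlever/Qleverfile}"

cat > "$QLEVERFILE" <<EOF
[server]
PORT = ${QLEVER_PORT}

[runtime]
SYSTEM = ${QLEVER_SYSTEM}
EOF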

@ludovicm67 commented Aug 12, 2024

@hannahbast

@ludovicm67 Can you try this and let me know your feedback? It addresses some of the issues that we have discussed. In particular, you can now use the qlever script from inside the container (with autocompletion). Or you can run the container directly with a sequence of qlever commands.

Yes, I will give it a try soon and let you know my feedback.

If I understood you correctly, you would also like to run the QLever UI. This would also work here, but running a Docker container (for the QLever UI) inside of a Docker container is generally not a good idea.

Fully agree with you. It is better to have the QLever UI as a separate container.
Running a Docker container inside a Docker container should be avoided.

I don't have a solution yet, but here are some ideas:

  1. Run the Docker container with -v /var/run/docker.sock:/var/run/docker.sock so that Docker commands inside of the container are using the Docker daemon outside of the container. I am not sure how portable this is and whether there are security concerns.

I would not recommend this approach. It may lead to security issues, and I don't think it is a good practice.

  2. Use docker-compose to start two containers, one for the QLever backend and one for the QLever UI. What I don't like about this approach is that it adds complexity that is not needed for users that want just the backend and not the UI. Then again, for other (but not all) triple stores, the UI is often considered an integral part of the software.

This is a good approach.
It is better to have two separate containers for the backend and the UI.
It will make the deployment easier and more flexible.

It's also the easiest way to deploy the full stack (backend and UI) for users who want to use both; a simple docker compose up will be enough.
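For illustration, a rough sketch of the side-by-side setup without compose (the UI image name, ports, mount path, and commands are assumptions for this example):

# Hypothetical sketch: engine and UI as two separate containers on a shared network.
docker network create qlever-net

# Engine container: build the index and serve SPARQL (port 7001 as in the old Dockerfile).
docker run -d --name qlever-engine --network qlever-net \
  -p 7001:7001 -v "$(pwd)":/data adfreiburg/qlever \
  "qlever index && qlever start"

# UI container: talks to the engine over the shared network.
docker run -d --name qlever-ui --network qlever-net \
  -p 8176:8176 adfreiburg/qlever-ui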

  3. Have a Python package also for the QLever UI. Then it could be installed with pipx install qlever-ui, analogously to the pipx install qlever for the qlever script. It does make sense because the QLever UI is a Python application. I have to think about that a little more (and play around with it and see how it works).

The issue with this is that you will run multiple processes at the same time (the backend and the UI) in a single container, which is not recommended.
It will also increase the complexity of the container and the image size.

In the future, the best approach would be to have two separate containers: one for the backend and one for the UI.

And two variants: a minimal one and a complete one, which results in the following images:

  • qlever (server):
    • qlever:vX.Y.Z (minimal => only the server binary, no shell)
    • qlever:vX.Y.Z-cli (complete => the server binary, the qlever CLI, auto-generation of the Qleverfile and a shell)
  • qlever-ui (UI):
    • qlever-ui:vX.Y.Z (minimal => only the minimum to run the UI, no shell)
    • qlever-ui:vX.Y.Z-cli (complete => the content of the minimal version with the qlever-ui CLI, auto-generation of the Qleverfile and a shell)

That way, qlever-control can use the minimal images by default.
People that want to run binaries directly can also use that variant.

People who want autoconfiguration, a Qleverfile workflow, or a shell to debug things more easily can use the complete images.

@hannahbast hannahbast requested a review from joka921 August 12, 2024 13:33
@ludovicm67 left a comment:

During the build of the image, the following warnings are shown:

#1 WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 1)
#1 WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 9)
#1 WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 21)

I made some suggestions to fix the issues mentioned in the warning messages.

Example of logs where the warning is visible: https://github.com/ad-freiburg/qlever/actions/runs/10345045337/job/28631489324#step:8:175

@hannahbast (Member, Author) commented:

@ludovicm67 Thank you for your comments. I changed all the `as` to `AS`. And I agree that it makes sense to have separate containers for the QLever Engine and the QLever UI. But I see the following problem:

  1. Right now, leaving the QLever UI aside, everything is in one Docker container, including the compiled binaries and the qlever script. That way, one does not have to call the binaries directly (like before), but one can simply write something like qlever index from within the Docker container.

  2. With two different containers running in parallel (one for the QLever Engine and one for the QLever UI), where does the qlever script run? In one of the containers, with access to the other container? Or in a third container, with access to both containers? I think it's important that there is one container, where one can use the qlever script interactively if one wants to.

@ludovicm67 commented:

@hannahbast

@ludovicm67 Thank you for your comments. I changed all the `as` to `AS`.

Perfect 👍

And I agree that it makes sense to have separate containers for the QLever Engine and the QLever UI. But I see the following problem:

  1. Right now, leaving the QLever UI aside, everything is in one Docker container, including the compiled binaries and the qlever script. That way, one does not have to call the binaries directly (like before), but one can simply write something like qlever index from within the Docker container.

I'm not sure I see the problem here, as that's what's expected, no?

  2. With two different containers running in parallel (one for the QLever Engine and one for the QLever UI), where does the qlever script run? In one of the containers, with access to the other container? Or in a third container, with access to both containers? I think it's important that there is one container, where one can use the qlever script interactively if one wants to.

In both; one will call the qlever get-data, qlever index and qlever start commands for the server part (probably with an option to skip the first two, to avoid re-downloading the data and rebuilding the index on every restart of the container if there is persistence).

The other one will call qlever ui for the UI part.

In both containers, the Qleverfile will be generated on the fly using the environment variables (or can be mounted directly).

The qlever commands to debug the index, and so on, would need to be run in the server container, as that is where the data is stored.

You can additionally release a qlever-control (or qlever-cli) container that ships the qlever script, but it would need to be able to get information from a remote server (as the data, the index, etc. are stored in the server container) so that it can be useful.
If so, it could also be used by people who don't want to, or can't, have a proper Python environment to run qlever.

@hannahbast (Member, Author) commented:

@ludovicm67 Just for clarification: Item 1 in my previous comment was not a problem (on the contrary), but just the first item on the list leading up to the problem description. Let me think about what you wrote and then come back to you. Having the qlever script in both containers looks wrong to me, since it would also require having the Qleverfile in both containers. I understand that it could be generated (by the same mechanism in both), but it still looks wrong. Mounting it is not always an option: if I understood you correctly, there are scenarios where one wants everything inside of the Docker container(s).

@ludovicm67 commented:

@hannahbast Basically, what we need for the UI container image:

  • Know the endpoint to target:
    • What is the hostname of the server?
    • What is the port of the server?
  • Display information about the database:
    • Get the database name
    • Get the description of the database
    • Maybe some other useful metadata?
  • Support for multiple endpoints (as a second step if not possible easily in a first version)
  • Manage users

By default, it ships with a basic test admin user if I remember correctly.
It would be great if we could disable/enable the admin part and/or dynamically configure the user.

I suggested using the qlever script, as for now it already takes care of the endpoints to target and the database metadata, so that there is a single thing to maintain.
But I'm not opposed to another way of doing this; the one thing I would expect is that it can be configured dynamically on the fly if needed.
As generating the Qleverfile dynamically would be done on the server side, reusing that work for the UI could make sense, to reduce the effort of getting to something that works well and can be easily maintained.

But I remain completely open to any other solution.
What I wrote are just simple ideas that could solve the basic issues (having containers that can be easily deployed and configured), in order to open a discussion.

@hannahbast (Member, Author) commented:

@ludovicm67 Thanks a lot and the exchange is much appreciated. I will think about it and come back to you.

Dockerfile Outdated
ENV CACHE_MAX_NUM_ENTRIES 1000
# Need the shell to get the INDEX_PREFIX environment variable
ENTRYPOINT ["/bin/sh", "-c", "exec ServerMain -i \"/index/${INDEX_PREFIX}\" -j 8 -m ${MEMORY_FOR_QUERIES} -c ${CACHE_MAX_SIZE_GB} -e ${CACHE_MAX_SIZE_GB_SINGLE_ENTRY} -k ${CACHE_MAX_NUM_ENTRIES} -p 7001 \"$@\"", "--"]
ENTRYPOINT ["bash"]


I would suggest using the default entrypoint from the Ubuntu Docker image (preferred):

Suggested change
ENTRYPOINT ["bash"]


Or explicitly set it to an empty one:

Suggested change
ENTRYPOINT ["bash"]
ENTRYPOINT [""]

Dockerfile Outdated
USER qlever
ENV PATH=/app/:$PATH
RUN PIPX_HOME=/qlever/.local PIPX_BIN_DIR=/qlever/.local/bin PIPX_MAN_DIR=/qlever/.local/share pipx install qlever


Something that might be good to do is to lock a specific version of the qlever CLI, to avoid unwanted breaking changes if the image gets rebuilt.

First define an ARG like this at the top of the Dockerfile (so that it's easy to know what to change in case of upgrade):

ARG QLEVER_VERSION="0.5.3"

Then you might need to tell the build stage to use the QLEVER_VERSION arg, by adding:

ARG QLEVER_VERSION

in that stage.

And then you can use it this way, if I'm not mistaken about how to specify a package version with pipx (I guess it should behave like pip):

Suggested change
RUN PIPX_HOME=/qlever/.local PIPX_BIN_DIR=/qlever/.local/bin PIPX_MAN_DIR=/qlever/.local/share pipx install qlever
RUN PIPX_HOME=/qlever/.local PIPX_BIN_DIR=/qlever/.local/bin PIPX_MAN_DIR=/qlever/.local/share pipx install "qlever==${QLEVER_VERSION}"

You can take a look here for an example: https://github.com/zazukoians/qlever-tests/blob/76ada0b53174beb79d11ed14662536689a165fff/docker/server.Dockerfile
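As a usage note for the suggestion above, the pinned version could then also be overridden at build time without editing the Dockerfile (the version number below is just a placeholder):

# Build with the pinned default, or override the version explicitly:
docker build -t adfreiburg/qlever .
docker build -t adfreiburg/qlever --build-arg QLEVER_VERSION=0.5.4 .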


@Qup42 (Member) left a comment:

The image can be optimized for a smaller size. The 2 suggested changes are the low-hanging fruit and reduce the compressed image size by 40MB/15% (uncompressed: 100MB/12.5%). There is probably potential for further space savings. Further investigation is required to evaluate the cost/benefit of these.

It should be 24.04 in the PR title instead of 22.04.

Dockerfile Outdated
ENV LC_CTYPE C.UTF-8
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV LC_CTYPE=C.UTF-8
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y software-properties-common && add-apt-repository -y ppa:mhier/libboost-latest
This is no longer required. The used boost packages are from the official package repositories now.

Suggested change
RUN apt-get update && apt-get install -y software-properties-common && add-apt-repository -y ppa:mhier/libboost-latest

Dockerfile Outdated
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.81-dev libboost-program-options1.81-dev libboost-iostreams1.81-dev libboost-url1.81-dev
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion

Remove apt cache.

Suggested change
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion && rm -rf /var/lib/apt/lists/*

hannahbast pushed a commit to ad-freiburg/qlever-control that referenced this pull request Nov 6, 2024
These two environment variables are useful when running the qlever CLI
within a Docker container, introduced by ad-freiburg/qlever#1439

The effect of QLEVER_OVERRIDE_SYSTEM_NATIVE is that after `qlever
setup-config <config name>`, the Qleverfile will contain the setting
`SYSTEM = native` and not `SYSTEM = docker`.

The effect of QLEVER_OVERRIDE_DISABLE_UI is that `qlever ui` is
disabled, which makes sense when running inside a container.

In both cases, meaningful log messages are provided for the user.
hannahbast added a commit to ad-freiburg/qlever-control that referenced this pull request Nov 10, 2024
With ad-freiburg/qlever#1439, the recommended way to run QLever inside a container is to use the https://github.com/ad-freiburg/qlever-control script. Inside of the container, the environment variable `QLEVER_IS_RUNNING_IN_CONTAINER` is set, with the following two effects:

1. After `qlever setup-config <config name>`, the Qleverfile will contain the setting `SYSTEM = native` and not `SYSTEM = docker` and a warning message will be displayed that that is the case
2. When running `qlever ui`, an error message will be displayed that this command is disabled (reason: `qlever ui` runs another container, and we don't want containers inside of containers).
Hannah Bast added 4 commits November 16, 2024 20:49
It was surprisingly hard to get it to work well in both settings, with
reasonable user rights and all.
TODO: Check whether all parts of the Dockerfile are still needed and move
`.bashrc` for user `qlever` to its own file `docker-bashrc`.
@hannahbast (Member, Author) commented:

@Qup42 and @ludovicm67 Can you please have another look?

I refactored and cleaned this up a lot now. More than I thought I needed to, but I am quite happy with the result.

Please read the comments in the Dockerfile and in the docker-entrypoint.sh script to understand the subtleties.

@hannahbast hannahbast changed the title Update Dockerfile to Ubuntu 22.04 + integrate qlever script Refactor the Dockerfile + integrate qlever script Nov 18, 2024
They are needed for the `docker publish` action
@hannahbast (Member, Author) commented:

@Qup42 and @ludovicm67 Please note that this has consequences for how the qlever CLI would be used together with Docker in the future:

  1. So far, when using QLever with Docker, the qlever CLI is called from outside Docker, with a Qleverfile that says SYSTEM = docker, and starts a Docker container itself. That no longer makes sense with the current setup because the qlever CLI is part of the Docker image and should be called from within.

  2. The way to use QLever with Docker now would be to run the Docker container (docker run adfreiburg/qlever will say how) and then inside of the container use the qlever CLI just like it would be used outside of the container. This would now be possible both in interactive mode (docker run ... without arguments) or in batch mode (docker run ... "<any commands>").

It looks much cleaner to me anyway. What do you think?
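For illustration, a minimal sketch of the two modes (the mount path and the example config name are assumptions; docker run adfreiburg/qlever prints the exact recommended invocation):

# Interactive mode: start the container and use the qlever CLI inside it.
docker run -it --rm -v "$(pwd)":/data adfreiburg/qlever

# Batch mode: pass a sequence of qlever commands as a single argument.
docker run -it --rm -v "$(pwd)":/data adfreiburg/qlever \
  "qlever setup-config olympics && qlever get-data && qlever index && qlever start"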

@ludovicm67 commented Nov 25, 2024

@hannahbast Thanks for all your updates.

Here are some comments that came to mind while reviewing this PR.

This pull request is doing many things:

  • upgrade the base image from ubuntu:22.04 to ubuntu:24.04:

    This is useful in order to have more recent dependencies and reduce the list of vulnerabilities.

  • harmonize some syntax used in the Dockerfile, in order to have something consistent:

    • casing for FROM … AS …
    • the way to define environment variables: ENV VAR=value instead of ENV VAR value
  • a complete change in how it is run, in terms of user and permissions

I think that in the future we should try to split big changes into smaller PRs (like one for the first two points and another for the third), in order to simplify the review process.

About the upgrade of the base image and the harmonization of the syntax, everything looks good to me.

About the entrypoint, I'm not sure if it is the best way to do it.
On some systems, Docker images are not allowed to run as root at all (and that's good practice).
The previous behavior was great in this respect; the new setup would be incompatible with these systems, as it first requires running as root, before the dynamic switch to the user defined in the environment variable.
For people using OpenShift, for example, the constraint is to have a Docker image that can be run with a random non-root UID and a GID of 0 (this is useful to know when setting permissions on volumes, etc.).

As @Qup42 pointed out in his comment, removing the cache would help to reduce the image size, which is pretty big right now.
Reducing the image size is always a good thing, as it lowers the time to pull the image, reduces the attack surface, and saves disk space and bandwidth.
This could be done as a future step, in one or multiple follow-up PRs.

@Qup42 and @ludovicm67 Please note that this has consequences for how the qlever CLI would be used together with Docker in the future:

  1. So far, when using QLever with Docker, the qlever CLI is called from outside Docker, with a Qleverfile that says SYSTEM = docker, and starts a Docker container itself. That no longer makes sense with the current setup because the qlever CLI is part of the Docker image and should be called from within.
  2. The way to use QLever with Docker now would be to run the Docker container (docker run adfreiburg/qlever will say how) and then inside of the container use the qlever CLI just like it would be used outside of the container. This would now be possible both in interactive mode (docker run ... without arguments) or in batch mode (docker run ... "<any commands>").

It looks much cleaner to me anyway. What do you think?

For this, I just set native as the default value of the environment variable in the custom Docker image that I've built, which extends the official one: https://github.com/zazukoians/qlever-tests/blob/34c5aeb78b615a5b91bb1ec07bb3a2244fd6ff19/docker/common/generate-qleverfile.sh#L32

I agree that inside the Docker image, the default value should be set to native, to make sure that it's not trying to call Docker from inside the container, which doesn't really make sense.

@hannahbast (Member, Author) commented:

@ludovicm67 Thanks for your latest comment, I missed it and read it only now. Here are some comments and questions:

  1. I completely agree that this PR is too large and complex. I will split it up as soon as it is clear how we should proceed.

  2. Adding the rm -rf /var/lib/apt/lists/* as suggested by @Qup42 only has a negligible effect on the image size (881 MB -> 879 MB). Looking only at file sizes inside of the container, here are the biggest contributors: /usr/include 217M, /usr/lib 321M, /usr/share 102M, /qlever 158M. Thoughts?

  3. You say that on some systems, Docker is not allowed to run as root. How is that relevant for the entrypoint.sh script? When you are inside of the container, you can always be root if you want to. The entrypoint.sh needs and uses sudo rights only to modify the user and group id of the qlever user inside of the container. And it does that exactly so that it can create files in the mounted volume without being root, while at the same time having a proper name also inside of the container.

The alternative is to run the container with -u $(id -u):$(id -g) (which you can still do if you also set --entrypoint bash), but then that user has no proper user and group name inside of the container, which is weird when working inside of the container interactively.
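For illustration, that alternative invocation could look roughly like this (the mount path is an assumption):

# Run with the host user's uid/gid and bypass the entrypoint's remapping.
docker run -it --rm \
  -u "$(id -u):$(id -g)" \
  --entrypoint bash \
  -v "$(pwd)":/data \
  adfreiburg/qlever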

@ludovicm67 commented:

@hannahbast Thank you for your answer!

  1. No worries! It was just complicated for me to properly review this PR; that's also what took me time to answer (+ the fact that I was away for one week).
  2. I thought it would save a bit more space, but it's always nice to have some gains, even if they are small. I suggest we work on improving the size of the image in a future PR.
    I was not able to build the image on my machine yet; I will try on another one, and maybe I can help a bit with this.
    From what I was able to see using dive, most of the size comes from the various dependencies that are installed. Investigating the dependencies and seeing whether the -dev ones are required or not could be one way of saving quite some space.
    But as said, let's keep this for a future PR.
  3. The issue is that systems that don't allow containers to run as root will not be able to run the entrypoint. But yes, overriding it should do the trick for now.

@hannahbast (Member, Author) commented:

@ludovicm67 Thanks for the quick reply. I still don't understand the root issue. In my understanding, things like rootless Docker refer to whether you can run Docker containers also as an unprivileged user. But even then, you can be root inside of the container. It's just that root inside of the container is mapped to an unprivileged user outside of the container, so that you can't do root things outside of the container.

Whereas with standard Docker, if you are root inside of the container and you have mounted volumes outside of the container, you can do root things also outside of the container.

@ludovicm67 commented:

Maybe this can help you to understand the case with OpenShift: https://developers.redhat.com/blog/2020/10/26/adapting-docker-and-kubernetes-containers-to-run-on-red-hat-openshift-container-platform

But your entrypoint script is great when used with the CLI.
For the root issue, this can be tackled in a future PR.

@hannahbast (Member, Author) commented:

@ludovicm67 @Qup42 Thanks for all the discussions and feedback. I have now made final amendments, fixed the old Dockerfiles, and written a proper description above (which will become the commit message in the end). Please feel free to have another look and/or comment. Once the tests all run through, I intend to finally merge this.

@ludovicm67 commented:

LGTM

@joka921 (Member) left a comment:

Thank you very much. Feel free to add any additional comments that you have in mind, and then merge it.

hannahbast and others added 2 commits December 4, 2024 06:07
Co-authored-by: Ludovic Muller <[email protected]>
Co-authored-by: Ludovic Muller <[email protected]>


@hannahbast merged commit 392b0e3 into master on Dec 4, 2024
20 of 21 checks passed
@hannahbast (Member, Author) commented:

@ludovicm67 @Qup42 @joka921 Just merged this + thanks for your help + for the record: I tried to replace the -dev libraries for the final image with their non-dev counterparts, e.g., libzstd-dev with libzstd1. The effect was that the image size doubled, so I reverted that change.

@Qup42 (Member) commented Dec 4, 2024

Opened #1658 for optimizing the size of the container image.
