
Add an option to disable enforcing of force_mask='700' for network file systems #1839

Closed
TimSchneider42 opened this issue Feb 15, 2024 · 15 comments · Fixed by #1840


@TimSchneider42

TimSchneider42 commented Feb 15, 2024

Feature request description

Hi,

I run a small compute cluster for my research group in which we use rootless Podman. As file system for the user homes, we use BeeGFS (formerly FhGFS). BeeGFS seems to understand subuids, which is a crucial advantage over NFS for deploying rootless containers.

While running containers works fine, I am experiencing an issue with building images:

$ podman run -it ubuntu bash
WARN[0000] Network file system detected as backing store. Enforcing overlay option `force_mask="700"`. Add it to storage.conf to silence this warning
Resolved "ubuntu" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/ubuntu:latest...
Getting image source signatures
Copying blob 57c139bbda7e done
Error: copying system image from manifest list: writing blob: adding layer with blob "sha256:57c139bbda7eb92a286d974aa8fef81acf1a8cbc742242619252c13b196ab499": processing tar file(lsetxattr /: operation not supported): exit status 1

I believe this issue is caused by the force_mask setting, which is applied because podman detects that BeeGFS is a network filesystem. If I trick Podman into believing that /home is a regular filesystem with an additional fuse-bindfs mount, it does not apply force_mask, and everything works fine, albeit at a performance penalty due to fuse-bindfs:

$ sudo mkdir /home-bindfs
$ sudo mount -t fuse.bindfs /home /home-bindfs
$ podman --root=/home-bindfs/$USER/.local/share/containers/storage run -it ubuntu bash
Resolved "ubuntu" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/ubuntu:latest...
Getting image source signatures
Copying blob 57c139bbda7e done
Copying config fd1d8f58e8 done
Writing manifest to image destination
root@9521fdb6f3bf:/#

I understand that network filesystems are currently not supported, but in this case, I feel it could be as easy as providing an option for disabling the force_mask setting for network filesystems. Unless the bindfs-trick does something else I am not aware of.

Thanks a lot in advance!

Suggest potential solution

Adding an option for disabling the forced setting of force_mask="700" for network filesystems. Or adding an explicit exemption for BeeGFS, since it seems to work with Podman.
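For reference, the option in question can already be set explicitly in storage.conf under the overlay driver options; the warning message itself suggests doing so. A minimal sketch (the set of accepted values, such as an octal mode or the "shared"/"private" shortcuts, may depend on the c/storage version):

```toml
# ~/.config/containers/storage.conf (rootless) — illustrative sketch
[storage.options.overlay]
# What this issue asks for is a way to opt out of the enforced value;
# today the option can only be set, e.g. to the enforced default:
force_mask = "700"
```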

Have you considered any alternatives?

A workaround is to trick Podman into believing the storage directory is not in a network filesystem with an additional fuse-bindfs mount as I described above. However, this has a serious impact on performance.

Additional context

No response

@rhatdan
Member

rhatdan commented Feb 20, 2024

@giuseppe PTAL

The problem with network file systems is that the remote side does not understand user namespaces. From its point of view, it sees UID 1000 attempting to chown a file to UID 100001, for example, when doing a podman build. This ends up blocked on the server side, and there is nothing Podman can do about that.

When we run fuse-overlayfs on top, we can make it avoid chowning the file on the remote server and instead only change an xattr on the file, which is allowed on the server side. fuse-overlayfs then reports the changed UID on the client side to the container.
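The mechanism described above can be sketched in a few lines. To the best of my knowledge, fuse-overlayfs and c/storage record the faked ownership in a `user.containers.override_stat` attribute with a `uid:gid:mode` payload; the helpers below are a hypothetical illustration of that idea, not the actual implementation:

```python
import os

# Attribute name used for faked ownership (assumption based on this
# discussion; verify against your c/storage/fuse-overlayfs version).
OVERRIDE_XATTR = "user.containers.override_stat"

def record_fake_chown(path: str, uid: int, gid: int, mode: int) -> bool:
    """Store ownership in an xattr instead of calling chown(2), so the
    remote server never sees an unmapped UID.  Returns False if the
    filesystem rejects user xattrs (as reported for BeeGFS below)."""
    try:
        os.setxattr(path, OVERRIDE_XATTR, f"{uid}:{gid}:{mode:o}".encode())
        return True
    except OSError:
        return False

def parse_override_stat(value: str) -> tuple[int, int, int]:
    """Parse a 'uid:gid:mode' payload back into numeric fields."""
    uid, gid, mode = value.split(":")
    return int(uid), int(gid), int(mode, 8)
```

A client-side overlay would then report the parsed uid/gid to the container instead of the real owner stored on the server.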

I don't know whether BeeGFS, or your FUSE file system, has similar support.

@TimSchneider42
Author

TimSchneider42 commented Feb 20, 2024

Hi,

I did some tests with rootlesskit on a BeeGFS directory with a non-privileged user. I was able to change ownership of a file in that directory to one of my subuids:

$ cd /mnt/beegfs
$ touch test
$ rootlesskit chown 1:1 test  # Works
$ ls -l
-rw-rw-r--  1   1410720   1410720         0 Feb 20 12:31 test

So to me it seems like BeeGFS understands subuids and thus does not require any special treatment. In fact, setting xattrs does not seem to be supported. So I think having an option to switch this behavior off would be beneficial.
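The xattr limitation mentioned above is easy to probe on any mount point. This is a generic sketch (not BeeGFS-specific) that tries to set a throwaway user attribute on a temporary file:

```python
import os
import tempfile

def supports_user_xattrs(directory: str) -> bool:
    """Return True if the filesystem backing `directory` accepts
    user.* extended attributes (Linux only)."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="xattr-probe-")
    try:
        os.setxattr(path, "user.xattr-probe", b"1")
        return True
    except OSError:
        # EOPNOTSUPP and friends: the filesystem rejects user xattrs
        return False
    finally:
        os.close(fd)
        os.unlink(path)
```

Running this against /mnt/beegfs would confirm or refute the observation above.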

Best,
Tim

@giuseppe
Member

can you show the output for $ stat -c %i -f /mnt/beegfs? If it is something like 0x19830326, then could you please try this patch in c/storage?

diff --git a/drivers/overlay/overlay.go b/drivers/overlay/overlay.go
index d3781955d..938313d34 100644
--- a/drivers/overlay/overlay.go
+++ b/drivers/overlay/overlay.go
@@ -296,7 +296,7 @@ func isNetworkFileSystem(fsMagic graphdriver.FsMagic) bool {
        // a bunch of network file systems...
        case graphdriver.FsMagicNfsFs, graphdriver.FsMagicSmbFs, graphdriver.FsMagicAcfs,
                graphdriver.FsMagicAfs, graphdriver.FsMagicCephFs, graphdriver.FsMagicCIFS,
-               graphdriver.FsMagicFHGFSFs, graphdriver.FsMagicGPFS, graphdriver.FsMagicIBRIX,
+               graphdriver.FsMagicGPFS, graphdriver.FsMagicIBRIX,
                graphdriver.FsMagicKAFS, graphdriver.FsMagicLUSTRE, graphdriver.FsMagicNCP,
                graphdriver.FsMagicNFSD, graphdriver.FsMagicOCFS2, graphdriver.FsMagicPANFS,
                graphdriver.FsMagicPRLFS, graphdriver.FsMagicSMB2, graphdriver.FsMagicSNFS,

@TimSchneider42
Author

Hi Giuseppe,

$ stat -c %i -f /mnt/beegfs prints 0. The same also holds for files inside /mnt/beegfs.

Best,
Tim

@giuseppe
Member

thanks, could you try that patch anyway? I have no access to a BeeGFS file system

@giuseppe
Member

$ stat -c %i -f /mnt/beegfs prints 0. The same also holds for files inside /mnt/beegfs.

and sorry, I've messed up the command. I think we need stat -c %t -f /mnt/beegfs

@TimSchneider42
Author

Okay, I need to figure out how to build podman first. I will get back to you.

@TimSchneider42
Author

stat -c %t -f /mnt/beegfs returns 19830326

@giuseppe
Member

stat -c %t -f /mnt/beegfs returns 19830326

great, then I think the patch above should work well
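The magic number being discussed (0x19830326, FhGFS/BeeGFS) comes from the f_type field of statfs(2). For anyone who wants to check it programmatically rather than via stat -f, here is a sketch using ctypes; it assumes a 64-bit Linux ABI where f_type is the first 64-bit field of struct statfs:

```python
import ctypes
import ctypes.util
import os
import struct

BEEGFS_SUPER_MAGIC = 0x19830326  # value printed by `stat -f -c %t` above

def fs_magic(path: str) -> int:
    """Return the f_type magic of the filesystem backing `path`.

    Linux-only sketch; assumes f_type is the first 64-bit field of
    struct statfs, which holds on common 64-bit Linux ABIs."""
    libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6",
                       use_errno=True)
    buf = ctypes.create_string_buffer(144)  # larger than struct statfs
    if libc.statfs(os.fsencode(path), buf) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return struct.unpack_from("q", buf.raw)[0]

def is_beegfs(path: str) -> bool:
    return fs_magic(path) == BEEGFS_SUPER_MAGIC
```

On the setup in this thread, is_beegfs("/mnt/beegfs") should match the stat output quoted above.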

@TimSchneider42
Author

TimSchneider42 commented Feb 20, 2024

So I just added the patch and built v4.9.3, which seems to run fine:

$ podman run -it ubuntu bash
Resolved "ubuntu" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/ubuntu:latest...
Getting image source signatures
Copying blob 01007420e9b0 done   | 
Copying config 3db8720ecb done   | 
Writing manifest to image destination
root@5d800ce4ab5f:/#

@giuseppe giuseppe transferred this issue from containers/podman Feb 20, 2024
giuseppe added a commit to giuseppe/storage that referenced this issue Feb 20, 2024
it seems to honor user namespaces, so no need to treat it as a network
file system.

Feedback got as part of the discussion on the issue.

Closes: containers#1839

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe
Member

thanks, opened a PR: #1840

@TimSchneider42
Author

Thanks a lot for your help!

@jiridanek

jiridanek commented May 25, 2024

@TimSchneider42 are you using podman with this fix more extensively nowadays? How is it working for you? I tried to replicate a setup with rootless podman using a container storage directory on a BeeGFS mount, and while I can run various small images like the ubuntu you showed above, or fedora minimal, I see failures with running other images, such as

$ podman run --rm -it quay.io/sclorg/python-39-c9s:c9s
Trying to pull quay.io/sclorg/python-39-c9s:c9s...
Getting image source signatures
Copying blob 29f3d326b791 done   | 
Copying blob eff7947d9dec done   | 
Copying blob f2639e1c865c done   | 
Copying blob 81167cd56173 done   | 
Copying config d5906b34a8 done   | 
Writing manifest to image destination
ERRO[0123] Cleaning up container ea4ce5b577502cd17740dbec193b83934131c57adefb431785caa1bf547fccb5: unmounting container ea4ce5b577502cd17740dbec193b83934131c57adefb431785caa1bf547fccb5 storage: cleaning up container ea4ce5b577502cd17740dbec193b83934131c57adefb431785caa1bf547fccb5 storage: unmounting container ea4ce5b577502cd17740dbec193b83934131c57adefb431785caa1bf547fccb5 root filesystem: replacing mount point "/mnt/beegfs/containers/overlay/e4e8bd8327a1f510b63a6681d02b41a7ae2fe840421308f4fdd8eb754884953d/merged": file exists 
Error: creating temporary passwd file for container ea4ce5b577502cd17740dbec193b83934131c57adefb431785caa1bf547fccb5: container ea4ce5b577502cd17740dbec193b83934131c57adefb431785caa1bf547fccb5: open /mnt/beegfs/containers/overlay/e4e8bd8327a1f510b63a6681d02b41a7ae2fe840421308f4fdd8eb754884953d/merged/etc/group: device or resource busy
[cloud-user@jdanek-podman-builder-3 notebooks]$ podman run --rm -it quay.io/sclorg/python-39-c9s:c9s
ERRO[0000] Cleaning up container 6102cf27216181176102f3ed6daa6802314bc3e933bc9231b59e0447d93bceec: unmounting container 6102cf27216181176102f3ed6daa6802314bc3e933bc9231b59e0447d93bceec storage: cleaning up container 6102cf27216181176102f3ed6daa6802314bc3e933bc9231b59e0447d93bceec storage: unmounting container 6102cf27216181176102f3ed6daa6802314bc3e933bc9231b59e0447d93bceec root filesystem: replacing mount point "/mnt/beegfs/containers/overlay/adf52678131f9aa408f2253f32bf34371f39639f1608ddc7c4a131319c7fd895/merged": file exists 
Error: creating temporary passwd file for container 6102cf27216181176102f3ed6daa6802314bc3e933bc9231b59e0447d93bceec: container 6102cf27216181176102f3ed6daa6802314bc3e933bc9231b59e0447d93bceec: open /mnt/beegfs/containers/overlay/adf52678131f9aa408f2253f32bf34371f39639f1608ddc7c4a131319c7fd895/merged/etc/group: device or resource busy
[cloud-user@jdanek-podman-builder-3 notebooks]$ podman run --rm -it quay.io/sclorg/python-39-c9s:c9s bash
ERRO[0000] Cleaning up container 836accc0f7bc9c3a150d50d8db171602ee9c59b4c98a682dfde48bec2d6ba4ba: unmounting container 836accc0f7bc9c3a150d50d8db171602ee9c59b4c98a682dfde48bec2d6ba4ba storage: cleaning up container 836accc0f7bc9c3a150d50d8db171602ee9c59b4c98a682dfde48bec2d6ba4ba storage: unmounting container 836accc0f7bc9c3a150d50d8db171602ee9c59b4c98a682dfde48bec2d6ba4ba root filesystem: replacing mount point "/mnt/beegfs/containers/overlay/1eea2b56023f34e9a2c1df3c63c9e51dbadc6eb979c1c6254155df938e922649/merged": file exists 
Error: creating temporary passwd file for container 836accc0f7bc9c3a150d50d8db171602ee9c59b4c98a682dfde48bec2d6ba4ba: container 836accc0f7bc9c3a150d50d8db171602ee9c59b4c98a682dfde48bec2d6ba4ba: open /mnt/beegfs/containers/overlay/1eea2b56023f34e9a2c1df3c63c9e51dbadc6eb979c1c6254155df938e922649/merged/etc/group: device or resource busy

or with image builds

[cloud-user@jdanek-podman-builder-3 notebooks]$ time make base-c9s-python-3.9
# Building quay.io/opendatahub/workbench-images:base-c9s-python-3.9-2024a_20240525 image...
# Pushing quay.io/opendatahub/workbench-images:base-c9s-python-3.9-2024a_20240525 image...
podman build --no-cache  -t quay.io/opendatahub/workbench-images:base-c9s-python-3.9-2024a_20240525  base/c9s-python-3.9
STEP 1/10: FROM quay.io/sclorg/python-39-c9s:c9s
STEP 2/10: LABEL name="odh-notebook-base-centos-stream9-python-3.9"       summary="Python 3.9 CentOS Stream 9 base image for ODH notebooks"       description="Base Python 3.9 builder image based on CentOS Stream 9 for ODH notebooks"       io.k8s.display-name="Python 3.9 c9s base image for ODH notebooks"       io.k8s.description="Base Python 3.9 builder image based on C9S for ODH notebooks"       authoritative-source-url="https://github.com/opendatahub-io/notebooks"       io.openshift.build.commit.ref="main"       io.openshift.build.source-location="https://github.com/opendatahub-io/notebooks/tree/main/base/c9s-python-3.9"       io.openshift.build.image="quay.io/opendatahub/workbench-images:base-c9s-python-3.9"
--> b0d475041984
STEP 3/10: WORKDIR /opt/app-root/bin
--> 1e966e3115d8
STEP 4/10: RUN pip install --no-cache-dir -U "micropipenv[toml]"
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.9/site-packages/pip/_internal/cli/base_command.py", line 180, in exc_logging_wrapper
    status = run_func(*args)
  File "/opt/app-root/lib64/python3.9/site-packages/pip/_internal/cli/req_command.py", line 248, in wrapper
    return func(self, options, args)
  File "/opt/app-root/lib64/python3.9/site-packages/pip/_internal/commands/install.py", line 333, in run
    build_tracker = self.enter_context(get_build_tracker())
  File "/opt/app-root/lib64/python3.9/site-packages/pip/_internal/cli/command_context.py", line 27, in enter_context
    return self._main_context.enter_context(context_provider)
  File "/usr/lib64/python3.9/contextlib.py", line 448, in enter_context
    result = _cm_type.__enter__(cm)
  File "/usr/lib64/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/opt/app-root/lib64/python3.9/site-packages/pip/_internal/operations/build/build_tracker.py", line 46, in get_build_tracker
    root = ctx.enter_context(TempDirectory(kind="build-tracker")).path
  File "/opt/app-root/lib64/python3.9/site-packages/pip/_internal/utils/temp_dir.py", line 125, in __init__
    path = self._create(kind)
  File "/opt/app-root/lib64/python3.9/site-packages/pip/_internal/utils/temp_dir.py", line 164, in _create
    path = os.path.realpath(tempfile.mkdtemp(prefix=f"pip-{kind}-"))
  File "/usr/lib64/python3.9/tempfile.py", line 352, in mkdtemp
    prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)
  File "/usr/lib64/python3.9/tempfile.py", line 122, in _sanitize_params
    dir = gettempdir()
  File "/usr/lib64/python3.9/tempfile.py", line 291, in gettempdir
    tempdir = _get_default_tempdir()
  File "/usr/lib64/python3.9/tempfile.py", line 223, in _get_default_tempdir
    raise FileNotFoundError(_errno.ENOENT,
FileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/opt/app-root/bin']

[notice] A new release of pip is available: 23.2.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
ERRO[0018] unable to cleanup run mounts copier: remove "/run/.containerenv": unlinkat /run/.containerenv: device or resource busy 
ERRO[0018] error deleting build container "3c50090b6015719f8f94738186013b81c10955fa58bca6f2dc6bd0040cfd1e95": replacing mount point "/mnt/beegfs/containers/overlay/5b5c2c02acefa6f01fcbdecc925745afc7469909bb73fe80cd4ff28e7ac43ce7/merged": file exists 
ERRO[0018] error deleting build container "206533b89f0eae3d96234db92ca6e07400f343557e2e2b126d96683859f3fd5b": replacing mount point "/mnt/beegfs/containers/overlay/2d1b75c0d27e9c28661e30e1491aea2703cdba4328389f9110439bccecb62a8f/merged": file exists 
Error: replacing mount point "/mnt/beegfs/containers/overlay/2d1b75c0d27e9c28661e30e1491aea2703cdba4328389f9110439bccecb62a8f/merged": file exists: building at STEP "RUN pip install --no-cache-dir -U "micropipenv[toml]"": while running runtime: exit status 2
make: *** [Makefile:253: base-c9s-python-3.9] Error 2

real    0m18.544s
user    0m1.415s
sys     0m0.969s

I installed BeeGFS by following the quickstart https://doc.beegfs.io/7.4.3/quick_start_guide/quick_start_guide.html#example-setup, using the RHEL 9 packages from https://www.beegfs.io/release/beegfs_7.4.3/dists/. I mistakenly used the unsupported 9.4 release and had to patch things up by installing an older kernel (otherwise the client kernel module would not build). I also set up the extended file attributes that Podman seems to need by following https://doc.beegfs.io/7.3.4/advanced_topics/acl.html

I installed a supported kernel using

sudo dnf install -y kernel-devel-5.14.0-362.24.1.el9_3.x86_64 kernel-5.14.0-362.24.1.el9_3.x86_64

To test that extended file attributes are enabled, I did

touch /mnt/beegfs/pepa
python -c 'import os; os.setxattr("/mnt/beegfs/pepa", "user.lek", b"value", os.XATTR_CREATE)'
getfattr -d -m ^ -R -- /mnt/beegfs/pepa
getfattr: Removing leading '/' from absolute path names
# file: mnt/beegfs/pepa
user.lek="value"

And to install Podman, I first tried a CentOS Koji build I found at https://kojihub.stream.centos.org/koji/buildinfo?buildID=61248 and then the latest build on Copr at https://copr.fedorainfracloud.org/coprs/rhcontainerbot/podman-next/. Both exhibited the unsatisfactory results from above.

I managed to make Podman work when I switched from overlay to vfs, but that's glacially slow.

I don't feel like I have enough info collected for a competent issue ticket, so I'm just leaving a comment. Furthermore, the experience made me realize I don't want to do a network filesystem setup with Podman. Instead of exposing the disk through BeeGFS for Podman instances on different machines to use, I should instead expose the Podman socket and use remote Podman. That would work much better, and there would be fewer low-level problems like this.
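For the remote-Podman approach mentioned above, connections can be configured either with `podman system connection add` or directly in containers.conf. A sketch with hypothetical host and user names (not taken from this thread):

```toml
# ~/.config/containers/containers.conf (client side) — host, user, and
# connection name are placeholders
[engine]
active_service = "beegfs-builder"

[engine.service_destinations]
  [engine.service_destinations.beegfs-builder]
    uri = "ssh://cloud-user@builder.example.com/run/user/1000/podman/podman.sock"
    identity = "~/.ssh/id_ed25519"
```

The server side only needs the user socket enabled, e.g. `systemctl --user enable --now podman.socket`.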

@TimSchneider42
Author

Hi @jiridanek,

We have indeed experienced the same problems you described. It seems like apt update (or upgrade, I don't remember which) does not work, which makes it basically impossible to build images if the storage location is remote. The problem seems to stem from overlayfs on top of BeeGFS, but I have not yet had time to properly isolate it, so I did not file an issue about it.

However, what works is pulling and importing images, which means that images can be built somewhere else and transferred into remote storage. It is a bit annoying but feasible in our case. What also works is using BeeGFS as an additional image store (via the additionalimagestores config option), which means that you can have shared images as long as the containers are local.
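The additionalimagestores setup described here looks roughly like this in storage.conf (paths are illustrative, not from this thread):

```toml
# storage.conf on the read-only consumer machines
[storage]
driver = "overlay"
# keep writable container state on a local filesystem
graphroot = "/home/user/.local/share/containers/storage"

[storage.options]
# shared BeeGFS store, populated by the single machine that pulls/imports
additionalimagestores = [
  "/mnt/beegfs/containers/storage",
]
```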

I am unsure if Podman actually supports multiple devices accessing a shared container storage location concurrently. I vaguely remember doing some tests with it (back when we were still on NFS) that did not go well, but I no longer remember the details.

So, to summarize, in our case, we have one machine that uses the BeeGFS storage as a main storage location for Podman. Unfortunately, it cannot build images, and running containers is also restricted due to the issues you mentioned. However, it can pull and import images into BeeGFS storage. All other machines in the network use the BeeGFS storage as an additional (read-only) image store. This works quite well except for the building problems on the main machine.

If you have any insights into this filesystem problem or any idea how to debug it, let me know.

Best,
Tim

@jiridanek

I decided to try plain NFS instead of BeeGFS, where I encountered

which is something I saw with BeeGFS too, but since I managed to reproduce this without any networked filesystem whatsoever, I reported it that way, hoping it will get more attention and will be easier to reproduce and fix.
