Buildah version 1.29.1 broken fuse-overlayfs in gitlab runner #4715

Closed
DaanGebraad opened this issue Apr 6, 2023 · 24 comments · Fixed by #4722

Comments

@DaanGebraad

DaanGebraad commented Apr 6, 2023

Description

After our buildah image was upgraded to v1.29.1, we noticed that all our pipelines using buildah build started to fail on our GitLab runners. It seems to be an issue with the fuse-overlayfs package.
We're using the quay.io/buildah/stable:latest image; after downgrading to v1.29.0, buildah worked again.

Steps to reproduce the issue:

  1. Upgrade buildah-stable to v1.29.1
  2. Run buildah build in gitlab runner

Describe the results you received:
Buildah build failing to unmount and mount

Command used:
$ buildah build -q -f .docker/Dockerfile -t $CI_REGISTRY_IMAGE:$BUILD_TAG -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA .
Output:
time="2023-04-06T08:20:26Z" level=error msg="Unmounting /var/lib/containers/storage/overlay/de00ab76e27e966fa7c0c0b79a5ad1247cdf765946bc05ada5b6d99be3a42be5/merged: invalid argument"
Error: mounting new container: mounting build container "9c4bcdfd7a1a64ae4bb1d399c930ae9a29e52d240c1405e00bf7638cde951901": creating overlay mount to /var/lib/containers/storage/overlay/de00ab76e27e966fa7c0c0b79a5ad1247cdf765946bc05ada5b6d99be3a42be5/merged, mount_data="lowerdir=/var/lib/containers/storage/overlay/l/X3DKLX3WMGVRBGZJYZ4MSULUEY:/var/lib/containers/storage/overlay/l/VLKLEB5WCVLZFQUO2W3ANMY7ZY:/var/lib/containers/storage/overlay/l/YWHPWE4M7KVTL6WXCXO6LBJW52:/var/lib/containers/storage/overlay/l/RJM72NK2LQWGBJU2CB2BWSTGBT:/var/lib/containers/storage/overlay/l/6TW2TMTSHXLBWS4KEJYUFY73CA:/var/lib/containers/storage/overlay/l/MTOGBS4AKRVUFTNZYIFGJLFZCX:/var/lib/containers/storage/overlay/l/ZZXGS22J43ZTSNW3SH2S2L4WRU,upperdir=/var/lib/containers/storage/overlay/de00ab76e27e966fa7c0c0b79a5ad1247cdf765946bc05ada5b6d99be3a42be5/diff,workdir=/var/lib/containers/storage/overlay/de00ab76e27e966fa7c0c0b79a5ad1247cdf765946bc05ada5b6d99be3a42be5/work,nodev,fsync=0,volatile": invalid argument

Describe the results you expected:
A working buildah build command


@flouthoc
Collaborator

flouthoc commented Apr 6, 2023

Could you try setting graph options to null in your storage.conf?

@Tiscs

Tiscs commented Apr 6, 2023

Same issue for me, and I rolled back to v1.29.0 instead of the latest version.

@elacheche

elacheche commented Apr 6, 2023

Hello @flouthoc

I have the same issue. I did some investigating and noticed that the latest image was (re)uploaded 18 hours ago with a missing config!

Do you have any idea why and how that is possible?

Luckily, 7 days ago I created a custom image based on the latest 1.29.1. Below are details from both:

Edit: The output below is a cleaned-up /etc/containers/storage.conf (no comments and no empty lines), followed by the output of buildah version.

custom

[storage] 
driver = "overlay"                                  
runroot = "/run/containers/storage"                  
graphroot = "/var/lib/containers/storage"
[storage.options]
additionalimagestores = [
 "/var/lib/shared",
 ] 
pull_options = {enable_partial_images = "false", use_hard_links = "false", ostree_repos=""}      
[storage.options.overlay]                      
mount_program = "/usr/bin/fuse-overlayfs"       
mountopt = "nodev,fsync=0"                     
[storage.options.thinpool]                     
Version:         1.29.1                           
Go Version:      go1.19.5
Image Spec:      1.0.2-dev
Runtime Spec:    1.0.2-dev
CNI Spec:        1.0.0
libcni Version:  v1.1.2
image Version:   5.24.1
Git Commit:
Built:           Fri Feb 17 10:05:41 2023
OS/Arch:         linux/amd64
BuildPlatform:   linux/amd64

Today's latest

[storage]
driver = "overlay"
runroot = "/run/containers/storage"
graphroot = "/var/lib/containers/storage"
[storage.options]
additionalimagestores = [
"/var/lib/shared",
]
pull_options = {enable_partial_images = "false", use_hard_links = "false", ostree_repos=""}
[storage.options.overlay]
mountopt = "nodev,fsync=0"
[storage.options.thinpool]
Version:         1.29.1
Go Version:      go1.19.5
Image Spec:      1.0.2-dev
Runtime Spec:    1.0.2-dev
CNI Spec:        1.0.0
libcni Version:  v1.1.2
image Version:   5.24.1
Git Commit:
Built:           Fri Feb 17 10:05:41 2023
OS/Arch:         linux/amd64
BuildPlatform:   linux/amd64

diff

$ git diff buildah_custom.txt buildah_latest.txt
diff --git a/buildah_custom.txt b/buildah_latest.txt
index 1c7966c..27435fa 100644
--- a/buildah_custom.txt
+++ b/buildah_latest.txt
@@ -8,7 +8,6 @@ additionalimagestores = [
 ]
 pull_options = {enable_partial_images = "false", use_hard_links = "false", ostree_repos=""}
 [storage.options.overlay]
-mount_program = "/usr/bin/fuse-overlayfs"
 mountopt = "nodev,fsync=0"
 [storage.options.thinpool]
 Version:         1.29.1

Did the image build workflow change?

Thanks in advance
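
The diff above comes down to one missing line in the `[storage.options.overlay]` table. As a stopgap, that line could be put back in a derived image or at the start of a job. The sketch below assumes fuse-overlayfs is installed at `/usr/bin/fuse-overlayfs` (the path shown in the older config) and that the table header is already present in the file; `restore_mount_program` is a hypothetical helper name, not something buildah ships.

```shell
# Sketch: re-add the fuse-overlayfs mount_program line if storage.conf lacks it.
# restore_mount_program is a hypothetical helper; the binary path is an assumption
# taken from the older config shown above.
restore_mount_program() {
    conf="$1"
    # Nothing to do when an uncommented mount_program line already exists.
    if grep -Eq '^[[:space:]]*mount_program[[:space:]]*=' "$conf"; then
        return 0
    fi
    # Print every line, and add the setting right after the overlay table header.
    tmp="$(mktemp)"
    awk '
        { print }
        /^\[storage\.options\.overlay\]/ { print "mount_program = \"/usr/bin/fuse-overlayfs\"" }
    ' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

In a job this would be called as `restore_mount_program /etc/containers/storage.conf` before `buildah build`. It leaves the file unchanged if a `mount_program` line is already set, and adds nothing if the overlay table header is missing.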

@ziouf

ziouf commented Apr 6, 2023

It looks like something is triggering a rebuild of the images, overwriting the latest, v1, v1.29 and v1.29.1 tags on a daily basis.

https://quay.io/repository/buildah/stable?tab=history

I assume that v1.29.1 should be an immutable tag

@Blaimi

Blaimi commented Apr 6, 2023

same issue here. I wrote a minimal working example at https://gitlab.com/Blaimi/buildah-bughunt.

@Blaimi

Blaimi commented Apr 6, 2023

> I assume that v1.29.1 should be an immutable tag

They are all built daily according to the readme. v1.29.0 no longer seems to be built daily because it is outdated.

@ziouf

ziouf commented Apr 6, 2023

> > I assume that v1.29.1 should be an immutable tag
>
> They are all built daily according to the readme. v1.29.0 no longer seems to be built daily because it is outdated.

It's nonsense to me that a stable release doesn't have stable tags ...

@elacheche

elacheche commented Apr 6, 2023

> same issue here. I wrote a minimal working example at https://gitlab.com/Blaimi/buildah-bughunt.

In my case, I have a lot of pipelines (different projects with multiple branches), so my workaround is to define a GitLab Group CI/CD variable STORAGE_DRIVER=vfs.
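
Outside the GitLab UI, the same workaround can be expressed in the job's shell. This is a sketch, assuming the job runs the build command from the original report; `STORAGE_DRIVER` is the variable named in the comment above, and `build_image` is a hypothetical wrapper. The vfs driver avoids overlay mounts entirely, at the cost of slower builds and more disk use, since each layer is stored as a full copy.

```shell
# Workaround sketch: select the vfs storage driver for everything this job runs.
export STORAGE_DRIVER=vfs

# Hypothetical wrapper around the build command from the original report.
# It is only defined here; call build_image where the job should build.
build_image() {
    buildah build -q -f .docker/Dockerfile -t "$CI_REGISTRY_IMAGE:$BUILD_TAG" .
}
```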

@Blaimi

Blaimi commented Apr 6, 2023

> my workaround is to define a GitLab Group CI/CD variable STORAGE_DRIVER=vfs

I extended my example with this variable in the matrix builds and set an hourly schedule on the build.

> It's nonsense to me that a stable release doesn't have stable tags …

I wrote #4717 for that 😸.

@TomSweeneyRedHat
Member

@giuseppe might we get lucky and have a fix in the newly released fuse-overlayfs v1.11?

@flouthoc
Collaborator

flouthoc commented Apr 7, 2023

I think nothing's wrong in fuse-overlayfs; it's just that the config was removed here: #4699

@flouthoc
Collaborator

flouthoc commented Apr 7, 2023

@giuseppe @rhatdan maybe we will need to revert this PR for users running builds on old kernels.

@elacheche

elacheche commented Apr 7, 2023

> @giuseppe @rhatdan maybe we will need to revert this PR for users running builds on old kernels.

Yes, this confirms my analysis in #4715 (comment)

But the real question here is why code merged two days ago triggered a rebuild of a release that is more than a month old, with the same version/tag.

This is also a CI/CD bug.

@flouthoc, can you please share more details about what you mean by "old kernels"? I am interested in learning more about that and why my Amazon Linux 2 is using an "old kernel", or maybe it's not and I just need to enable some extra modules. Thx

@flouthoc
Collaborator

flouthoc commented Apr 7, 2023

@elacheche native overlay is easily supported on rootless setups on kernel 5.13 and above (it was added in 5.11, but I think there were some bugs in 5.11). Folks running old kernels therefore have no option but to fall back to fuse-overlayfs for rootless builds. OTOH, for users running newer kernels, buildah will automatically use native overlay on rootless setups.
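
The kernel cutoff mentioned above can be checked from inside a job. This is a rough sketch that only compares the version reported by `uname -r` against 5.13; `kernel_supports_rootless_overlay` is a hypothetical helper name. Other comments in this thread report the failure on 5.15 and 6.1 under a GitLab runner, so a passing check is an indicator at best, not a guarantee that native overlay will work.

```shell
# Hypothetical helper: succeeds when the kernel release is 5.13 or newer.
# Takes an optional release string; defaults to the running kernel.
kernel_supports_rootless_overlay() {
    release="${1:-$(uname -r)}"
    major="${release%%.*}"       # text before the first dot
    rest="${release#*.}"         # text after the first dot
    minor="${rest%%[!0-9]*}"     # leading digits of the remainder
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 13 ]; }
}
```

A job could use it as `kernel_supports_rootless_overlay || echo "kernel older than 5.13, fuse-overlayfs likely required"`.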

> This is also a CI/CD bug.

Indeed, CI/CD has an issue if it's modifying older tags :)

@hkrutzer

hkrutzer commented Apr 7, 2023

I have a Gitlab runner with kernel version 5.15 and I'm also seeing this issue.

@TomSweeneyRedHat
Member

@rhatdan PTAL. Should we roll #4699 back?

@TomSweeneyRedHat
Member

@cevich some CI questions in here for the quay container images, in case you didn't see this.

@cevich
Member

cevich commented Apr 10, 2023

Ironically I too ran into this issue 😞

> Indeed, CI/CD has an issue if it's modifying older tags :)

As y'all found in the readme, the builds happen daily from main to incorporate updates (esp security) for all packages in the image. The image tags are simply extracted from the RPM versions. In the case of the v1.29.1 RPM, tags would be pushed for latest, v1.29.1, v1.29, and v1 - all with the exact same contents.
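
The tag scheme described here, with one RPM version fanning out to four tags, can be illustrated with a small function. This is only a sketch of the mapping as stated in the comment, not the actual build script, which is not shown in this thread; `tags_for_version` is a hypothetical name and it assumes a three-part version such as 1.29.1.

```shell
# Hypothetical illustration of the version-to-tags mapping described above.
# Assumes a three-part version (MAJOR.MINOR.PATCH), with or without a leading "v".
tags_for_version() {
    version="${1#v}"              # strip an optional leading "v"
    major="${version%%.*}"        # MAJOR
    major_minor="${version%.*}"   # MAJOR.MINOR
    printf '%s\n' latest "v$version" "v$major_minor" "v$major"
}
```

For 1.29.1 this yields latest, v1.29.1, v1.29 and v1, one per line, matching the list in the comment.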

Since it's a Containerfile change and assuming #4722 is the fix (I haven't looked deeply), then as soon as it's merged, the daily builds will push out new latest, v1.29.1, v1.29, and v1. On the other hand, for users looking for truly immutable/unchanging images, you need to reference the image by sha256 or use the "n-1" tags that aren't updated daily.
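
Pinning by digest, as suggested here, means replacing the tag in the image reference with `@sha256:...`. Below is a minimal sketch of that rewrite, assuming the digest has already been looked up; `pin_by_digest` is a hypothetical helper and the digest argument is a placeholder to be filled in by the caller.

```shell
# Hypothetical helper: turn "repo:tag" plus a digest into "repo@digest".
pin_by_digest() {
    image="$1"     # e.g. quay.io/buildah/stable:v1.29.1
    digest="$2"    # e.g. sha256:<digest>, as reported by the registry
    repo="${image%@*}"    # drop an existing digest, if any
    # Strip the tag only when the last path component has one, so a
    # registry port such as localhost:5000 is left alone.
    case "${repo##*/}" in
        *:*) repo="${repo%:*}" ;;
    esac
    printf '%s@%s\n' "$repo" "$digest"
}
```

One way to look up the digest is `skopeo inspect --format '{{.Digest}}' docker://quay.io/buildah/stable:v1.29.1`, assuming skopeo is available in the job image.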

Just for some history: There was a great design debate among the containers team, on which approach to take. We decided that since the tag represents the buildah-version, it was better to keep the images continuously updated on the off-chance some non-buildah critical security fix was released. Or in this case, a Containerfile bug.

@dhduvall

#4717 is probably where the image tag stability discussion belongs, but the problem there (as I see it) is not that the underlying OS bits are getting updated daily (that seems perfectly fine), but that the buildah bits are being rebuilt against main as well. And the workaround/even-more-stable-option of using hashes doesn't work because old manifests are discarded.

@cevich
Member

cevich commented Apr 11, 2023

> but that the buildah bits are being rebuilt against main as well.

Only the upstream flavor of the image does that. The other two (stable and testing) install from the distro RPMs. If the RPM versions change, the image tags will change as well (since they're extracted from the RPM version).

@dhduvall

I see my confusion:

The manifest from the latest image shows org.opencontainers.image.version=1.29.1 and org.opencontainers.image.revision=b80da50..., where that commit ID is the one from #4722, the current HEAD. That's really only true for the bits that build the container; like you say, buildah itself comes from the distro RPM, and although the commit it was built from isn't captured, the date does demonstrate it's not recent:

[root@1b888e6da80e /]# buildah version
Version:         1.29.0
Go Version:      go1.19.5
Image Spec:      1.0.2-dev
Runtime Spec:    1.0.2-dev
CNI Spec:        1.0.0
libcni Version:  v1.1.2
image Version:   5.24.0
Git Commit:
Built:           Tue Jan 31 12:06:15 2023
OS/Arch:         linux/arm64
BuildPlatform:   linux/arm64/v8

So I went down the wrong path because of that label and the fact that it was the build that caused the problem and not changes to the executable. I don't know if there's a way to avoid that confusion, or even if it's worth trying.

@cevich
Member

cevich commented Apr 11, 2023

Oh! hang on a sec...you're right! That's a bug in the build script. Those labels are completely wrong when the image uses RPMs. I'll open an issue on that and get on about fixing it. Thanks for pointing out the mismatch.

@nolange

nolange commented May 3, 2023

> @giuseppe @rhatdan maybe we will need to revert this PR for users running builds on old kernels.

I am on Linux 6.1 and I still need this config, at least when buildah is invoked via a gitlab-runner; see #4669

@rhatdan
Member

rhatdan commented May 4, 2023

You should be able to turn it on if we disable fuse-overlayfs by default.
