race: parallel builds: copying...committing...creating... layer not known #5674

Open
edsantiago opened this issue Aug 8, 2024 · 6 comments

@edsantiago
Member

This might be the same as containers/podman#23331. If it is, someone please close this issue or move it.

Setup:

$ for i in 1 2;do printf "FROM quay.io/libpod/testimage:20240123\nRUN echo hi from $i\n" >Containerfile$i;done

In window 1:

$ while :; do buildah build -t c1 --layers=true -f Containerfile1 || break; buildah rmi c1; done

In window 2:

$ while :; do buildah build --layers=false -t c2 -f Containerfile2 || break; buildah rmi c2; done

Within 30-60s, window 1 will barf:

STEP 1/2: FROM quay.io/libpod/testimage:20240123
STEP 2/2: RUN echo hi from 1
Error: checking if cached image exists from a previous build: getting top layer info: layer not known

or

STEP 1/2: FROM quay.io/libpod/testimage:20240123
STEP 2/2: RUN echo hi from 1
hi from 1
COMMIT c1
Error: committing container for step {Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] Command:run Args:[echo hi from 1] Flags:[] Attrs:map[] Message:RUN echo hi from 1 Heredocs:[] Original:RUN echo hi from 1}: copying layers and metadata for container "a8d0253ccd5f337ca69e106657dc645e4926b20d9775621827b6ec118bcb35fa": committing the finished image: creating image "41778f8cf15b69d1fdb79d5bb744ba65eac877e27a21dd12af8700594d88585b": layer not known

The rmi seems important; I can't get it to fail (at least not within my patience tolerance of ~10m) if I omit rmi from either loop.
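For anyone who would rather not juggle two windows, the loops above can be combined into one script. This is only a sketch of the same reproducer, not something taken from the report: the names `BUILDAH`, `write_containerfiles`, `build_loop`, and `main`, the `run` argument, and the use of a temporary directory as build context are all assumptions made for illustration. It needs bash 4.3 or newer for `wait -n`.

```shell
#!/usr/bin/env bash
# Sketch: run both reproducer loops from a single shell.
# All function and variable names here are illustrative, not from the report.

BUILDAH=${BUILDAH:-buildah}   # assumption: set BUILDAH=podman to try podman

# Write Containerfile1 and Containerfile2 into directory $1.
write_containerfiles() {
    local dir=$1 i
    for i in 1 2; do
        printf 'FROM quay.io/libpod/testimage:20240123\nRUN echo hi from %s\n' "$i" \
            > "$dir/Containerfile$i"
    done
}

# Build and then remove image c<N> repeatedly until a build fails.
# $1 = directory, $2 = loop number (1 or 2), $3 = value for --layers
build_loop() {
    local dir=$1 n=$2 layers=$3
    while :; do
        "$BUILDAH" build -t "c$n" --layers="$layers" \
            -f "$dir/Containerfile$n" "$dir" || break
        "$BUILDAH" rmi "c$n"
    done
    echo "loop $n (--layers=$layers) stopped on a build failure" >&2
}

main() {
    local dir
    dir=$(mktemp -d) || return 1
    write_containerfiles "$dir"
    build_loop "$dir" 1 true &    # "window 1"
    build_loop "$dir" 2 false &   # "window 2"
    wait -n                       # returns when the first loop exits
    # Stop the other loop; a build already in progress may still finish.
    kill $(jobs -p) 2>/dev/null
    wait
    rm -rf "$dir"
}

# Start the loops only when explicitly asked to: ./repro.sh run
if [ "${1:-}" = "run" ]; then
    main
fi
```

Saved as, say, `repro.sh`, it would be started with `bash repro.sh run`. The first loop to hit a failed build prints a message on stderr, just below the build error itself.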

Testing with podman fails MUCH faster than buildah, for reasons I don't understand, and also fails sometimes in window 2. Buildah only fails in window 1.

This is blocking parallelization of podman test 070-build and I bet this is one of the uncategorized weirdnesses I've seen in #5552 but didn't follow up on.

@edsantiago
Member Author

Issue persists:

<+0042s> # # podman build -t b-t156-muinxj0h /tmp/CI_dBI1/podman_bats.20lh4r/build-test
<+477ms> # STEP 1/3: FROM quay.io/libpod/testimage:20240123
         # STEP 2/3: COPY ./ /tmp/test/
         # Error: checking if cached image exists from a previous build: getting top layer info: layer not known
<+005ms> # [ rc=125 (** EXPECTED 0 **) ]

Podman PR containers/podman#23275 with current buildah (v1.37.1-0.20240828183349-69259725a0df) vendored.

@nalind
Member

nalind commented Aug 29, 2024

This is two builds with --layers=true, which means they're reading each other's work as cache candidates, which is not something #5686 was concerned with.

edsantiago added a commit to edsantiago/libpod that referenced this issue Sep 17, 2024
Need --layers=false in podman build, otherwise a buildah race
can trigger "layer not known" failures:

   containers/buildah#5674

Signed-off-by: Ed Santiago <[email protected]>

A friendly reminder that this issue had no activity for 30 days.

@edsantiago
Member Author

This is not stale, but I don't have privs to edit tags in this repo.

edsantiago added a commit to edsantiago/libpod that referenced this issue Oct 28, 2024
In preparation for maybe some day being able to run build tests
in parallel.

SUPER IMPORTANT NOTE! BUILD TESTS CANNOT BE PARALLELIZED YET!
buildah, when run in parallel, barfs with:

    race: parallel builds: copying...committing...creating... layer not known

See containers/buildah#5674

Signed-off-by: Ed Santiago <[email protected]>
edsantiago added a commit to edsantiago/libpod that referenced this issue Oct 28, 2024
In preparation for maybe some day being able to run build tests
in parallel.

SUPER IMPORTANT NOTE! BUILD TESTS CANNOT BE PARALLELIZED YET!
buildah, when run in parallel, barfs with:

    race: parallel builds: copying...committing...creating... layer not known

Until this is fixed, podman-build can never be run in parallel.
See containers/buildah#5674

This PR is simply cleaning things up so, if/when that day comes,
the ensuing parallelize PR will be short & sweet.

Signed-off-by: Ed Santiago <[email protected]>
@rhatdan
Member

rhatdan commented Oct 31, 2024

@edsantiago added the power, use it responsibly. 😄

edsantiago added a commit to edsantiago/libpod that referenced this issue Nov 21, 2024
DO NOT MERGE! Seriously, like don't even fantasize about it
until containers/buildah#5674 is
fixed and the fix is vendored into podman.

I am filing this because if/when that happens, this PR will
give you a nice CI-runtime boost. I spent a good chunk of
time identifying the tests which can / can't be parallelized
in this module. Hope it helps some day.

Signed-off-by: Ed Santiago <[email protected]>

github-actions bot commented Dec 1, 2024

A friendly reminder that this issue had no activity for 30 days.
