
bats tests - parallelize #5552

Draft · wants to merge 4 commits into main from bats-parallel

Conversation

edsantiago
Member

All bats tests run with custom root/runroot, so it should be
possible to parallelize them.

(As of this initial commit, tests fail on my laptop, and I expect them to fail here. I just want to get a sense for how things go.)

Signed-off-by: Ed Santiago [email protected]


@edsantiago marked this pull request as draft May 29, 2024 17:45
Contributor

openshift-ci bot commented May 29, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: edsantiago
Once this PR has been reviewed and has the lgtm label, please assign rhatdan for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Ephemeral COPR build failed. @containers/packit-build please check.

@edsantiago force-pushed the bats-parallel branch 2 times, most recently from 762e878 to a195865 on June 10, 2024
@edsantiago force-pushed the bats-parallel branch 2 times, most recently from 4d4e9c9 to e653a16 on June 27, 2024
Member

@Luap99 Luap99 left a comment


I assume once we enable parallel runs here we can port that over to the bud tests in podman? I'd like to get the speedup there too.

I haven't looked deeply into the prefetch logic changes but this looks much easier than podman so that is good.

```diff
@@ -19,4 +19,4 @@ function execute() {
 TESTS=${@:-.}
 
 # Run the tests.
-execute time bats --tap $TESTS
+execute time bats -j 4 --tap $TESTS
```
Member


I would use $(nproc) here instead of hard coding any count, should make it much faster when run locally.
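As a sketch, that suggestion would look like this (the fallback to 4, the previously hard-coded value, is my own hedge for systems without `nproc`; the actual `bats` invocation is commented out since it needs the test suite present):

```shell
# Pick the job count from the number of available CPUs; fall back to 4
# (the previously hard-coded value) if nproc is unavailable.
jobs=$(nproc 2>/dev/null || echo 4)
echo "using $jobs parallel bats jobs"
# execute time bats -j "$jobs" --tap $TESTS
```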


A friendly reminder that this PR had no activity for 30 days.

@mtrmac
Contributor

mtrmac commented Oct 10, 2024

(A brief look after a link from containers/skopeo#2437 )

zstd found even though push was without --force-compression: My first guess is that the `FROM alpine; _EOF`-built image (via build caching?) matches an image created in some other concurrently running test, and that causes BlobInfoCache to have a record of the zstd-compressed layer version.

For this purpose, it would be better to have the image’s layers be clearly independent from any other test — maybe a FROM scratch adding a file that contains the test name + timestamp.
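A minimal sketch of that idea (all names here are illustrative, and the `buildah` invocation is commented out since it needs a real environment):

```shell
# Build context whose single layer is guaranteed unique per test run, so no
# concurrently running test can share (and cross-pollute) its blobs via the
# blob info cache.
ctx=$(mktemp -d)
printf '%s %s %s\n' "my-test-name" "$(date +%s%N)" "$RANDOM" > "$ctx/uniquefile"
cat > "$ctx/Containerfile" <<'EOF'
FROM scratch
COPY uniquefile /uniquefile
EOF
# buildah bud -t "img-unique-$RANDOM" "$ctx"   # shown for shape only
echo "context ready at $ctx"
```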

@mtrmac
Contributor

mtrmac commented Oct 10, 2024

… oh, and: debug-level logs could help.

@edsantiago
Member Author

For this purpose, it would be better to have the image’s layers be clearly independent from any other test — maybe a FROM scratch adding a file that contains the test name + timestamp.

Thank you. I thought I had checked for conflicts, but must've missed something. I'll look into this again when time allows.

@edsantiago force-pushed the bats-parallel branch 5 times, most recently from d17cb1b to 68722ca on November 4, 2024 13:35
@edsantiago
Member Author

push test still flaking, and I'm giving up for the day. It is stumping me:

```
not ok 50 bud: build push with --force-compression

# # buildah build ...
...
# f0e6fc41f58b93d990cb24331c648bc84b066822d06782aec493d97bc2b7b263
# # buildah push [...] --compression-format gzip img-t50-nesrbrf5 docker://localhost:35311/img-t50-nesrbrf5
...
# # buildah push [...] --compression-format zstd --force-compression=false img-t50-nesrbrf5 docker://localhost:35311/img-t50-nesrbrf5
...
# # skopeo inspect img-t50-nesrbrf5
# {"schemaVersion":2,"mediaType":"application/vnd.oci.image.manifest.v1+json","config":{"mediaType":"application/vnd.oci.image.config.v1+json","digest":"sha256:f0e6fc41f58b93d990cb24331c648bc84b066822d06782aec493d97bc2b7b263","size":524},"layers":[{"mediaType":"application/vnd.oci.image.layer.v1.tar+zstd","digest":"sha256:05d36d22e1ad1534254c6965a3b43cf39f4dca9d5c95551eccf40108f076da2b","size":146}],"annotations":{"org.opencontainers.image.base.digest":"","org.opencontainers.image.base.name":""}}
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: zstd found even though push was without --force-compression
# #| expected: !~ 'zstd'
```

Where is that 05d36 layer coming from?

@mtrmac
Contributor

mtrmac commented Nov 5, 2024

I can’t see any obvious reason. I’d suggest doing the pushes with --log-level=debug; that should include traces like "Trying to reuse blob".
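For illustration, the relevant decisions can be filtered out of a saved debug log like so (the sample log lines below are abbreviated stand-ins modeled on the ones quoted later in this thread, not real output):

```shell
# Filter blob-reuse decisions out of a debug-level push log.
cat > push.log <<'EOF'
time="..." level=debug msg="Trying to reuse blob sha256:05d36d22..."
time="..." level=debug msg="some unrelated debug line"
time="..." level=debug msg="Ignoring BlobInfoCache record of digest \"sha256:e4216d41...\" with unknown compression"
EOF
grep -E 'Trying to reuse blob|Ignoring BlobInfoCache' push.log
```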

@siteshwar

I understand this pull request is in draft state, but I got this warning on OpenScanHub:

Error: SHELLCHECK_WARNING ([CWE-398](https://cwe.mitre.org/data/definitions/398.html)): [#def1](https://openscanhub.fedoraproject.org/task/21458/log/added.html#def1)

/usr/share/buildah/test/system/helpers.bash:211:24: warning[[SC2115](https://github.com/koalaman/shellcheck/wiki/SC2115)]: Use "${var:?}" to ensure this never expands to /.

```
#  209|               else
#  210|                   # Failed. Clean up, so we don't leave incomplete remnants
#  211|->                 rm -fr $_BUILDAH_IMAGE_CACHEDIR/$fname
#  212|               fi
```

It should be fixed before merging this pull request.
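A sketch of the fix ShellCheck is asking for (the cache dir and filename here are temp stand-ins for the real `$_BUILDAH_IMAGE_CACHEDIR` and `$fname`):

```shell
# ${var:?} makes the expansion fatal when the variable is empty or unset,
# so `rm -fr` can never collapse to removing paths under / by accident.
_BUILDAH_IMAGE_CACHEDIR=$(mktemp -d)   # stand-in for the real cache dir
fname=incomplete-image.tar
touch "$_BUILDAH_IMAGE_CACHEDIR/$fname"
rm -fr "${_BUILDAH_IMAGE_CACHEDIR:?}/$fname"
echo "removed $fname; cache dir itself is untouched"
```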

@edsantiago
Member Author

Got it with debug. In-page search for 10b119 demonstrates the flake.

@edsantiago
Member Author

another one

@mtrmac
Contributor

mtrmac commented Nov 12, 2024

Got it with debug. In-page search for 10b119 demonstrates the flake.

Note to self:
First:

```
[+2477s] # time="2024-11-07T16:25:26Z" level=debug msg="Ignoring BlobInfoCache record of digest \"sha256:e4216d41a8ad46a3b75f9cdc2f40cab3cd56837931d4e73be4521a9537d7cde1\", uncompressed format does not match
```

OK. Then:

```
[+2477s] # time="2024-11-07T16:26:03Z" level=debug msg="Ignoring BlobInfoCache record of digest \"sha256:e4216d41a8ad46a3b75f9cdc2f40cab3cd56837931d4e73be4521a9537d7cde1\" with unknown compression"
```

How did we lose the compression format knowledge??

That’s not the immediate cause, but it suggests something unexpected is happening. Both have Using SQLite blob info cache at /var/lib/containers/cache/blob-info-cache-v1.sqlite.

@edsantiago
Member Author

The test image is created via

```
FROM scratch
COPY /therecanbeonly1 /uniquefile
```

(therecanbeonly1 contains date + random content)

It is the same image used for each test iteration (push --compression-format gzip, zstd --force-compression=false, zstd, and zstd --force-compression).

One approach I've considered but not tried: build a new image on each iteration. My gut tells me that this might get tests to pass, but is not necessarily the right thing to do. It depends on whether this issue is a real one that we might be sweeping under the rug.

edsantiago added a commit to edsantiago/buildah that referenced this pull request Nov 18, 2024
The _prefetch helper, introduced in containers#2036, is not parallel-safe: two
or more parallel jobs fetching the same image can step on each other
and produce garbage images.

Although we still can't run buildah tests in parallel (see containers#5552),
we can at least set up the scaffolding for that to happen. This
commit reworks _prefetch() such that the image work is wrapped
inside flock. It has been working fine for months in containers#5552,
and is IMO safe for production. This can then make it much
easier to flip the parallelization switch once the final zstd
bug is squashed.

Signed-off-by: Ed Santiago <[email protected]>
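A hedged sketch of the locking shape that commit message describes (function and lock-file names are illustrative, not buildah's actual helper code):

```shell
# Serialize the fetch of an image behind flock, so parallel bats jobs
# can't both write the same cache entry at once and produce garbage.
_prefetch_locked() {
    local img=$1
    local lockfile="${TMPDIR:-/tmp}/prefetch-demo.lock"
    (
        flock 9 || return 1
        # ...inside the lock: pull/copy $img into the shared cache...
        echo "fetched $img"
    ) 9>"$lockfile"
}

_prefetch_locked quay.io/libpod/testimage
```

The subshell holds fd 9 on the lock file for its whole duration, so the lock is released automatically even if the fetch fails mid-way.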
edsantiago added a commit to edsantiago/buildah that referenced this pull request Nov 19, 2024
Member Author

@edsantiago edsantiago left a comment


This is about as clean as I can leave this PR. Good luck!

```diff
@@ -165,6 +165,7 @@ _EOF
 # Helper function. push our image with the given options, and run skopeo inspect
 function _test_buildah_push() {
     run_buildah push \
+        --log-level=debug \
```
Member Author


FIXME! Remove this. It is only present in order to debug the zstd flake.

```diff
@@ -22,6 +22,7 @@ function mkcw_check_image() {
 
     # Decrypt, mount, and take a look around.
     uuid=$(cryptsetup luksUUID "$mountpoint"/disk.img)
+    echo "# uuid=$uuid" >&3
```
Member Author


This isn't needed either.
