Connectivity issues when running preflight check container/operator test #1217

Open
ramperher opened this issue Nov 22, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

ramperher commented Nov 22, 2024

Bug Description

Connectivity issues when running preflight check container/operator test

Version and Command Invocation

1.10.2

Steps to Reproduce:

(How can we reproduce this?)

  1. Run preflight check container/operator for example-cnf operators

Expected Result

(What did you expect to happen and why?)
All preflight check container tests should pass; in fact, the example-cnf operators were developed to pass all preflight tests. Here we have one example with DCI.

Actual Result

(What actually happened)
When testing these example-cnf operators in our automation, some preflight check container tests fail at random in isolated cases: not always the same test, and not always the same operator. This has appeared in:

When checking the preflight logs, I always see errors related to network connectivity; for example, from the ScorecardBasicSpecCheck case:

$ cat preflight_operator_trex-operator_operator_bundle_scorecard_BasicSpecCheck.json
{
  "kind": "TestList",
  "apiVersion": "scorecard.operatorframework.io/v1alpha3",
  "items": [
    {
      "kind": "Test",
      "apiVersion": "scorecard.operatorframework.io/v1alpha3",
      "spec": {
        "image": "quay.io/operator-framework/scorecard-test@sha256:b06d49edb2691de31366f11358db2c4ca1109bb6155ef64e2b1e3492bc78ae1d",
        "entrypoint": [
          "scorecard-test",
          "basic-check-spec"
        ],
        "labels": {
          "suite": "basic",
          "test": "basic-check-spec-test"
        },
        "storage": {
          "spec": {
            "mountPath": {}
          }
        }
      },
      "status": {
        "results": [
          {
            "state": "fail",
            "errors": [
              "Get \"https://192.168.62.25:10250/containerLogs/preflight-testing/scorecard-test-552g/scorecard-test\": write tcp 192.168.62.22:49650->192.168.62.25:10250: use of closed network connection"
            ],
            "creationTimestamp": null
          }
        ]
      }
    }
  ]
}

And for the BasedOnUbi case:

$ cat preflight_container_testpmd-operator_testpmd-operator_preflight.log
time="2024-11-20T05:14:09Z" level=debug msg="running check" check=BasedOnUbi
time="2024-11-20T05:14:39Z" level=info msg="check completed" check=BasedOnUbi err="unable to verify layer hashes: pyxis query for uncompressed top layers ids [\"sha256:86426b9e591db2cdd8eba8085aa38b705422152b49960696308d988f33f3d741\" \"sha256:f3d283f88ee8b4074e070fab5e6c04f48f7a2fe0c362565ab59cefa776071523\" \"sha256:9bf80315bb8c54b069936ba8d8e9805af543ae497b1f6110acbcf68eef9ca564\" \"sha256:ee83e2144171d0b886c4c559ad04daed0e8517035522d4ec765f9d8cdae65d8b\" \"sha256:96d595eeb8c25301df76ae95ab01b1c403edc2b699698b79acefdcdd982d6c33\" \"sha256:92dcd488b6e5fbaaa92c9fcce0940df1c286521641dbf8e51422216e497a03c5\" \"sha256:e07c76127eb15a16a2a89ca0d55116afb0a40640a582e20f1acbd94c83a9f3fa\" \"sha256:27b33da2a7ded13c595c0151fc91452673444adb9a23616065758f8888dcbd0c\" \"sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef\" \"sha256:d5fb4da999ce7743acd25c4992cc67135fc7d6c6fb4244c4e9c452c9953cfc7c\" \"sha256:9b0808b0946410c7bd00e1105b2f7a2cf0e7ba461eb1577be2bf9432a3d65bd3\" \"sha256:7dcc789228d63d1ef35f6f0de316587b54f426b3b576223e88d6807a5fa93c9c\" \"sha256:160348bb6297250db55ad7d87899200934ef49557d2ecde5dc65613b588a551a\" \"sha256:e5acddee1e31c4f124ca18ebfd4c9f40cc153deb6150046d92aad9e07bb4b41c\" \"sha256:3893a90ebc5288b9e7ba058060313e5ea97a7568e041f8effddb0640aeea2e07\" \"sha256:af6a2043b105e29dcc31f6d945a8d0c60251c4260db26215769d4b22ae566f97\" \"sha256:00be9443ddf138bae11085d0f61032486c697d7df1f31717179c28f0a0183060\"] failed: error while executing layers query: Post \"https://catalog.redhat.com/api/containers/graphql/\": dial tcp 23.212.185.102:443: i/o timeout" result=ERROR
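
Both failures above end in a network-level error string. As a rough illustration of how automation could tell these apart from genuine check failures, here is a minimal Python sketch that pulls the error messages out of a scorecard `TestList` result and flags the ones that look transient. The helper names (`failed_errors`, `looks_transient`) and the marker list are hypothetical; the two substrings are simply copied from the outputs above, and treating them as "transient" is an assumption, not something preflight defines.

```python
# Substrings copied from the two failures quoted above. Treating them as
# markers of a transient network problem is an assumption for this sketch.
TRANSIENT_MARKERS = ("use of closed network connection", "i/o timeout")


def failed_errors(test_list):
    """Collect the error strings of every failed result in a TestList dict."""
    errors = []
    for item in test_list.get("items", []):
        status = item.get("status") or {}
        for result in status.get("results") or []:
            if result.get("state") == "fail":
                errors.extend(result.get("errors") or [])
    return errors


def looks_transient(message):
    """Return True if the message contains one of the known markers."""
    return any(marker in message for marker in TRANSIENT_MARKERS)


# Trimmed-down version of the scorecard result shown above.
sample = {
    "kind": "TestList",
    "items": [
        {
            "kind": "Test",
            "status": {
                "results": [
                    {
                        "state": "fail",
                        "errors": [
                            'Get "https://192.168.62.25:10250/containerLogs/'
                            'preflight-testing/scorecard-test-552g/scorecard-test": '
                            "write tcp 192.168.62.22:49650->192.168.62.25:10250: "
                            "use of closed network connection"
                        ],
                    }
                ]
            },
        }
    ],
}

for message in failed_errors(sample):
    print(looks_transient(message), message)
```

A substring match like this is deliberately crude: it only separates "probably worth retrying" from everything else, and the marker list would need to grow as other network errors show up.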

As I said, this does not happen every time; when it does, retrying the preflight tests afterwards makes them pass normally. But we cannot really control it, because it happens in isolated cases in our automation.

We are wondering whether retries could be added to these tests, so that they can ride out the isolated network issues that happen from time to time.
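
To make the request concrete, this is the kind of behaviour we have in mind, written as a minimal Python sketch rather than as preflight code. The `retry` helper, its parameters, and the `flaky_check` stand-in are all hypothetical; the idea is only that a check failing with a transient network error is retried a few times with a growing delay, while any other error is reported immediately.

```python
import time


def retry(fn, is_transient, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); if it raises a transient error, wait and try again.

    The delay doubles after each failed attempt (base_delay, 2 * base_delay,
    ...). Non-transient errors and the last failure are re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts or not is_transient(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Stand-in for a check that hits a network timeout twice, then succeeds.
calls = {"n": 0}


def flaky_check():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("dial tcp 23.212.185.102:443: i/o timeout")
    return "PASS"


delays = []  # record the delays instead of really sleeping
result = retry(
    flaky_check,
    is_transient=lambda exc: "i/o timeout" in str(exc),
    sleep=delays.append,
)
print(result, delays)  # PASS [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the sketch testable without waiting; a real implementation would likely also cap the total time spent retrying.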

Additional Context

(Anything else you think might help us troubleshoot, like your platform, dependency versions, etc).
Using DCI for running the tests.
We have a Jira card with more documentation if you need it.

@ramperher ramperher added the kind/bug Categorizes issue or PR as related to a bug. label Nov 22, 2024