Explore and define the CI stability epic #1070

Closed
orfeas-k opened this issue Sep 17, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@orfeas-k
Contributor

Context

Work for https://warthogs.atlassian.net/browse/KF-6217

What needs to get done

Explore and provide insight to define the epic

Definition of Done

Exploration gives enough information to define the epic and provide estimates

@orfeas-k orfeas-k added the enhancement New feature or request label Sep 17, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6267.

This message was autogenerated

@orfeas-k
Contributor Author

orfeas-k commented Sep 20, 2024

Description

In summary, the initial description of the epic is:

  1. Tackle issues caused by building charms concurrently on a single runner.
  2. Artifact consistency: publish the same charm artifacts that are used for testing.
  3. Speed up the CI by reducing build time.
  4. (stretch) CI health visibility: create a dashboard that exposes the current CI status.

Points 1, 2 and 3 would be addressed by implementing the following:

  1. Build the charm(s) on a separate GitHub runner (tackles 1)
    1. On pull requests, use a cache when building (tackles 3)
  2. After building, upload the artifact so that other jobs in the CI can:
    1. Download it and use it in tests (tackles 2)
    2. Download it and use it in the publish job (tackles 2)

PoC

Two draft PoC PRs were created to test what our CI would look like using the Data Platform workflows and to investigate any potential limitations.

Single-charm repository

canonical/namespace-node-affinity-operator#47

Results

Implementation notes

  • We can probably remove charm-python-packages from our charms, since charmcraftcache is not aware of it and charmcraft appears to install those packages by default already.

  • To ensure caching works every time, it is recommended that we use charm-strict-dependencies and pin all our pydeps (charm-python-packages does not work with strict dependencies, which is why charmcraftcache (ccc) is not aware of it).

  • Redundant dependency: during on_pull_request, the publish job waits for the whole integrate.yaml to succeed.
    The DP team mentioned that keeping job nesting shallow makes debugging easier and the CI simpler. To work around the above dependency, we could refactor on_push, on_pull_request, integrate and publish.yaml:

    • Move the logic from integrate.yaml to on_pull_request. Since the charm is now built in a job inside integrate.yaml, the current structure means that the publish job has to wait for the whole integrate.yaml to complete before it starts releasing. If we move the logic directly into on_pull_request, then on_pull_request will:
      • Call the reusable workflow release_charm.yaml directly, only when the trigger is pull_request (example)
      • Take over from publish.yaml the logic that defines the Charmhub destination channel, and expose it as an output as well
      • Propagate the output of the build_charm.yaml workflow
    • on_push calls the updated on_pull_request. Once that succeeds (we now need to know that the tests passed before publishing to version/edge), on_push uses the two outputs of on_pull_request (artifact-registry and destination-channel) to call the reusable workflow release_charm.yaml itself. Since a push is not a PR, release_charm.yaml will not also be called from within on_pull_request.

    Alternatively, we could move all the logic into a single workflow. Since the release_charm.yaml dependencies have to differ by event (pull_request or push), that workflow would have two separate release jobs with different needs fields, gated by mutually exclusive if conditions (issue).
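The effect of the refactor on what the release has to wait for can be modelled in a few lines. This is a hedged sketch, not workflow code: the inner job names (build, lint, unit-tests, integration-tests) and the release job names are hypothetical, and it assumes GitHub Actions semantics where a job that needs a job calling a reusable workflow waits for every job inside that workflow, and where needs itself cannot vary by event (hence the two if-gated jobs).

```python
# Hypothetical inner jobs of integrate.yaml (names are illustrative only).
INTEGRATE_JOBS = frozenset({"build", "lint", "unit-tests", "integration-tests"})


def release_waits_for_nested() -> set:
    """Current on_pull_request layout: publish needs the job that calls
    integrate.yaml, so it waits for the whole called workflow."""
    return set(INTEGRATE_JOBS)


# Flattened layout: two release jobs gated by mutually exclusive `if`
# conditions, each with its own static `needs` list.
RELEASE_JOBS = {
    "release-on-pr": {"if_event": "pull_request", "needs": {"build"}},
    "release-on-push": {"if_event": "push", "needs": set(INTEGRATE_JOBS)},
}


def release_waits_for_flattened(event: str) -> set:
    """Return the needs of the single release job whose `if` matches."""
    selected = [job for job in RELEASE_JOBS.values() if job["if_event"] == event]
    if len(selected) != 1:
        raise ValueError(f"expected exactly one release job for {event!r}")
    return set(selected[0]["needs"])
```

In this model, a pull request release can start as soon as the build finishes, while a push release still waits for every test job before publishing to version/edge.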

Multi-charm repository

canonical/kfp-operators#571. Note that this PR does not refactor the publish job (i.e. it does not implement 2.2 above).

Results

  • Again, all charms were built in 2 minutes.
  • Bundle integration tests started deploying the bundle right after the 2 minutes it took for all charms to build.

Multi-charm specific limitations

  • Right now, each individual integration test waits for all charms to be built instead of only its own charm. This isn't much of a problem, though, since with charmcraftcache all charms take roughly the same time to build when the cache is used.
  • In the left sidebar of the GitHub Actions UI, you cannot tell by name which charm a build charm job belongs to.
  • GitHub Actions matrices do not let you run job X for A and B and then a second job Y for A and B where Y(A) depends only on X(A) and Y(B) only on X(B). Thus, we cannot schedule integrate.yaml for all charms and then have publish take its input from integrate.yaml. This limitation can be worked around by:
    • Using the hardcoded value to download the packed charm (this works, and the DP team doesn't think it is likely to change in the future)
    • Nesting all jobs under a larger one and applying the matrix only at the top level. This is similar to what was described in the Implementation notes section above (bullet 3).
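The matrix limitation and the nesting workaround can be illustrated by comparing the dependency edges each layout produces. This is a minimal sketch with hypothetical charm and job names (charm-a, charm-b, build, integrate, publish). It assumes the documented GitHub Actions behaviour that needs refers to a whole job, so a job depending on a matrix job waits for all of that job's matrix cells. The same behaviour explains why each integration test currently waits for every charm to build.

```python
# Hypothetical charm names and per-charm job chain.
CHARMS = ["charm-a", "charm-b"]
CHAIN = ["build", "integrate", "publish"]


def matrix_per_job(charms, chain):
    """Each job is its own matrix over `charms`. `needs` can only name a
    whole job, so every cell waits for every cell of the previous job."""
    deps = {}
    for prev, cur in zip(chain, chain[1:]):
        for charm in charms:
            deps[(cur, charm)] = {(prev, other) for other in charms}
    return deps


def matrix_on_outer_job(charms, chain):
    """The matrix is applied only to an outer job that calls a reusable
    workflow containing the whole chain, so dependencies stay per charm."""
    deps = {}
    for prev, cur in zip(chain, chain[1:]):
        for charm in charms:
            deps[(cur, charm)] = {(prev, charm)}
    return deps
```

With the per-job matrix, ("integrate", "charm-a") depends on both build cells; with the matrix on the outer job it depends only on ("build", "charm-a"), which is the per-charm chaining that the second workaround recovers.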

@orfeas-k
Contributor Author

The exploration has been completed and tasks have been proposed to tackle this epic, so I'm closing this issue. The epic hasn't been fully defined yet, since it turns out that we'll need to sync with other teams and possibly modify its scope.
