Create a re-usable workflow for oci-image scans #69

Closed
DnPlas opened this issue Sep 25, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

DnPlas commented Sep 25, 2024

Context

As the team grows its offerings, security vulnerabilities must be scanned for and reported effectively so the team can address them in a timely manner.
Currently, the only repository that has a Github workflow for scanning oci-images and getting reports is canonical/bundle-kubeflow using the scan-images.yaml workflow. While working correctly at the moment, this workflow presents the following limitations:

  1. Uses a local script to gather the images used in all charm repositories that form the bundle. At the same time, the get-all-images.py script depends on scripts present in each repository to generate a list of images per repo. This is a problem because 1) not all repos have this script (e.g. mlflow), and 2) the script is tightly coupled to the host repo.
  2. The Scan images step of the workflow depends on two scripts located at canonical/kubeflow-ci. This is problematic because 1) it creates a maintenance task, and 2) the scripts duplicate what actions like aquasecurity/trivy-action already provide.
  3. The workflow is not re-usable as is, meaning it cannot be used by mlflow-operator repository.

Proposal

Create a re-usable workflow for scanning oci-images that:

  1. Uses the aquasecurity/trivy-action action to scan and generate reports for each of the images under scan
  2. Uploads each of the Trivy reports as artefacts of the Github workflow run
  3. Automatically reports vulnerabilities via Github issues in the rock's repository (e.g. canonical/training-operator-rocks, canonical/kubeflow-rocks)
  4. Updates the existing issue with the latest details of the report if the vulnerability for a specific image has already been reported
  5. Runs on schedule, but also provides a workflow dispatch

Please NOTE that part of this proposal is to only scan images that the Analytics team maintains. This is because the images that charms use that come from upstream cannot be patched by us.
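
As a rough illustration of the "only scan images we maintain" rule, here is a minimal Python sketch that filters a list of image references by registry namespace. The helper names (`image_namespace`, `team_images`) and the sample references are hypothetical; the `charmedkubeflow` default is the registry namespace mentioned later in this thread, and the real get-all-images.py output may look different.

```python
def image_namespace(ref: str) -> str:
    """Return the namespace (organisation) part of an image reference, or ''."""
    # Drop a digest, if any, so only the repository path (and tag) remains.
    path = ref.split("@", 1)[0]
    parts = path.split("/")
    # A first component with a dot, a colon, or 'localhost' is a registry host.
    if len(parts) > 1 and ("." in parts[0] or ":" in parts[0] or parts[0] == "localhost"):
        parts = parts[1:]
    # A single remaining component (e.g. 'ubuntu:22.04') has no namespace.
    return parts[0] if len(parts) > 1 else ""


def team_images(refs, namespace="charmedkubeflow"):
    """Keep only the images published under the team's namespace (assumed name)."""
    return [ref for ref in refs if image_namespace(ref) == namespace]


# Hypothetical input: a mix of team-maintained and upstream images.
all_images = [
    "charmedkubeflow/a:1",
    "docker.io/charmedkubeflow/b:2",
    "gcr.io/other/c:3",
    "ubuntu:22.04",
]
to_scan = team_images(all_images)
```

Upstream images are dropped before scanning, so no reports are generated for images the team cannot patch.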

Limitations

1. There is no other way of fetching the images that each charm uses, so for now we'll stick to using the get-all-images.py script.
2. The Trivy reports will be uploaded individually, meaning that there is a linear relation between the number of scanned images and the number of artefacts saved in the workflow run.
3. BIGGEST: This workflow couples the product to the rocks, that is, the scans are done far from the source code. Ideally we'd have scans and vulnerability reports at the rockcraft project repositories. For this one, though, we could plan to push rocks to the oci-factory and outsource all the vulnerability scans and reports. Update: the workflows will live at the rocks repo level, so this is no longer a limitation.

Out of scope

  1. Automatic notifications in mailing list or MM
  2. snap or charm scans - though common workflows can be used in other automations, for example, for creating GH issues.

Example

  1. Scanning an image - this is an example run. The vulnerability scan job will fail if it finds a CRITICAL or HIGH vulnerability, and it will report an issue.
  2. The workflow - this is what the workflow would look like, just with a bit of work to make it 100% product agnostic.
  3. Automatic issue creation - this is an example of an issue that will be created automatically by the workflow. It currently uses my GH token, that's why I'm the reporter, but ideally we'll use the CKF bot for it.
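
The "fail on CRITICAL or HIGH" rule from the first example can be sketched in Python as a check over a Trivy JSON report. This assumes the report was produced in Trivy's JSON format, with a top-level `Results` list whose entries carry a `Vulnerabilities` list; the function name and the sample report (including its vulnerability IDs) are made up for illustration, and the linked workflow may implement the check differently, for example through the scan action's own exit code.

```python
import json

# Severities that should fail the scan job, per the example above.
BLOCKING = ("CRITICAL", "HIGH")


def blocking_vulnerabilities(report: dict, severities=BLOCKING) -> list:
    """Return the IDs of vulnerabilities whose severity is in `severities`."""
    found = []
    # 'Results' or 'Vulnerabilities' may be missing or null when nothing is found.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in severities:
                found.append(vuln.get("VulnerabilityID"))
    return found


# Made-up report, shaped like Trivy's JSON output (assumption).
sample_report = json.loads("""
{
  "Results": [
    {
      "Target": "example-image (ubuntu 22.04)",
      "Vulnerabilities": [
        {"VulnerabilityID": "CVE-0000-0001", "Severity": "HIGH"},
        {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"}
      ]
    },
    {"Target": "usr/bin/example"}
  ]
}
""")

blocking = blocking_vulnerabilities(sample_report)
should_fail_job = bool(blocking)
```

A job could exit non-zero when `should_fail_job` is true and pass the list of IDs on to the reporting step.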

What needs to get done

Create a re-usable workflow for getting images used by any rock, scanning them for vulnerabilities, and reporting found vulns following the example in https://github.com/canonical/bundle-kubeflow/pull/1087/files#diff-327280cbc65c9de9998db8b0e5d1c937ccf75524907e5f9d026304ca85146f53

Definition of Done

There is a re-usable workflow that any of the charming products of this team can use.

@DnPlas DnPlas added the enhancement New feature or request label Sep 25, 2024

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6331.

This message was autogenerated


DnPlas commented Sep 26, 2024

Based on feedback from @misohu, a better approach to this enhancement proposal is to run the scans closer to the source (each rock repository) instead of in a central place.
@misohu also pointed out that rocks are already being scanned on_push and on_pull by the canonical/charmed-kubeflow-workflows/.github/workflows/get-rocks-modified-and-build-scan-test-publish.yaml@main workflow, so scans are already happening at the rock level, but vulnerabilities are not being reported and not being constantly tested.
I am editing the original proposal in the description of this issue to match the above.

DnPlas added a commit that referenced this issue Oct 8, 2024
Add the option to report vulnerabilities automatically via Github issues.

Fixes #69
DnPlas added a commit that referenced this issue Oct 9, 2024
This re-usable workflow can be used for reporting security vulnerabilities
via Github issues. It takes the issue title, image-name, and issue-labels as
inputs, and in turn:
* edits an existing issue with the same title and updates the vulnerability report
* creates a new issue with the issue-title and adds the vulnerability report in the description

Please NOTE this workflow assumes the existence of vulnerability reports as artefacts
of a workflow run; that is, it expects artefacts named trivy-report-<image-name> to
be present in the same workflow run.

Part of #69
DnPlas added a commit that referenced this issue Oct 9, 2024
… reports

This commit adds get-published-images-scan-and-report.yaml, a re-usable workflow
that enables repositories to scan images from a public registry (in the case
of the Analytics team it defaults to charmedkubeflow) and reports back
the security vulnerabilities as Github issues.
This workflow is intended to be used on demand (using a workflow dispatch)
and on schedule, as it will be used for continuous testing of the published
images a rock repository generates.

Part of #69

DnPlas commented Oct 9, 2024

Solutions

Based on the description of this issue and after working a bit on the solution, we have the following:

  1. A re-usable workflow for reporting vulnerabilities (as proposed in ci: add workflow to enable automatic vulnerability reports #72). This will generate labelled Github issues with detailed descriptions of the vulnerabilities found during the scans.

This workflow is flexible enough to work with the current implementation of build-scan-test-publish-rock.yaml, an example of the required changes can be found here, and as a side effect, here.

This workflow can also be used for workflows triggered on_push, on_demand, and on workflow_dispatch. An example of a full implementation can be found in #73.

  2. A re-usable workflow for scanning and reporting. In ci: add re-usable workflow for scans from published img and automatic… #73, the end-to-end integration is done in a re-usable workflow that runs on workflow_dispatch and on schedule. It scans the images from a public registry, uploads each vulnerability report, and creates or edits Github issues for easy tracking.
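
The create-or-edit behaviour of the reporting workflow can be sketched as a small decision function. This is only an illustration in Python, not the code merged in #72: the function names and the shape of the issue records (dicts with `title` and `number`) are assumptions, while the `trivy-report-<image-name>` artefact naming and the "same title means edit" rule come from the commit messages in this thread.

```python
def artifact_name(image_name: str) -> str:
    """Name of the report artefact the reporting workflow expects for an image."""
    return f"trivy-report-{image_name}"


def plan_issue_action(existing_issues, title):
    """Decide whether to edit an existing issue or create a new one.

    `existing_issues` is assumed to be a list of dicts with 'title' and
    'number' keys, e.g. as fetched from the repository's open issues.
    """
    for issue in existing_issues:
        # An issue with the same title is updated with the latest report.
        if issue["title"] == title:
            return ("edit", issue["number"])
    # No matching title: a new issue is created with the report as its body.
    return ("create", None)


# Hypothetical open issues and titles, for illustration only.
open_issues = [
    {"title": "Vulnerabilities found for example-image", "number": 12},
    {"title": "Unrelated bug", "number": 13},
]
first = plan_issue_action(open_issues, "Vulnerabilities found for example-image")
second = plan_issue_action(open_issues, "Vulnerabilities found for other-image")
report_artifact = artifact_name("example-image")
```

Matching on the exact title keeps one issue per image, so repeated scheduled runs refresh the same issue instead of opening duplicates.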

Discussions

A) Should automatic reports be enabled on_push? - Right now, most (if not all) rocks repositories are scanning images on_push and uploading the vulnerability reports, but:

  • Those workflows will not fail even if a vulnerability is found
  • The results of those scans are not monitored by the team

This can be solved by relying on the scheduled workflow, BUT, the scheduled workflow only scans and reports published images. On the other hand, not enabling this would ensure that the CI is always green and publishing images regardless of the vulnerabilities.

#74 shows an example of how this can be added and be left for us to enable it whenever we call get-rocks-modified-and-build-scan-test-publish.yaml in each of the rocks repositories. In this workflow run, the execution shows an example of the feature available, but disabled (as we are not passing the report-vulnerabilities: true to the workflow). On the other hand, this is an example run of the same workflow, but enabling the reports, as seen here. This would be what could happen on_push if we decide that it is worth adding this.

DnPlas added a commit that referenced this issue Oct 9, 2024
This commit enables the automatic creation of Github issues when a security
vulnerability is found in the scan jobs that the build-scan-test-publish-rock.yaml
already performs.
The intention of this is to add reporting capabilities to the workflows that
are already using build-scan-test-publish-rock.yaml on_merge, that is, enable automatic
reports of vulnerabilities as Github issues on every merge.

Part of #69
DnPlas added a commit that referenced this issue Oct 11, 2024
* ci: add workflow to enable automatic vulnerability reports

This re-usable workflow can be used for reporting security vulnerabilities
via Github issues. It takes the issue title, image-name, and issue-labels as
inputs, and in turn:
* edits an existing issue with the same title and updates the vulnerability report
* creates a new issue with the issue-title and adds the vulnerability report in the description

Please NOTE this workflow assumes the existence of vulnerability reports as artefacts
of a workflow run; that is, it expects artefacts named trivy-report-<image-name> to
be present in the same workflow run.

Part of #69
DnPlas added a commit that referenced this issue Oct 15, 2024
#73)

* ci: add re-usable workflow for scans from published img and automatic reports

This commit adds get-published-images-scan-and-report.yaml, a re-usable workflow
that enables repositories to scan images from a public registry (in the case
of the Analytics team it defaults to charmedkubeflow) and reports back
the security vulnerabilities as Github issues.
This workflow is intended to be used on demand (using a workflow dispatch)
and on schedule, as it will be used for continuous testing of the published
images a rock repository generates.

Part of #69

DnPlas commented Oct 16, 2024

Will track the discussion about enabling the automatic reports in #82. Closing this issue as the re-usable workflow has been created and merged in #72 and #73.

@DnPlas DnPlas closed this as completed Oct 16, 2024