
Add support for reading SBOMs on the image attached by buildpacks #520

Open
sambhav opened this issue Dec 5, 2021 · 6 comments
Labels
enhancement New feature or request

Comments

@sambhav
Contributor

sambhav commented Dec 5, 2021

What would you like to be added:

Buildpacks is a CNCF project that can create secure and minimal images from source code. It currently has a well-specified way of attaching SBOMs to the output image. Grype should use this information to load the SBOMs attached to the image for vulnerability matching. See https://github.com/buildpacks/spec/blob/main/buildpack.md#software-bill-of-materials for the spec.

Why is this needed:

Buildpacks allow SBOM generation at build time. This leads to more accurate SBOMs, generated by the same process that created the software artifact in the first place. We can use this information to create more accurate vulnerability reports.

Additional context:

Buildpacks currently support the Syft, CycloneDX, and SPDX SBOM formats. They also store an SBOM per layer. The SBOM blob is stored as a separate, identifiable layer, so we don't even need to download the entire image to fetch the attached SBOMs.
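A minimal Go sketch of how the format of each attached SBOM file could be recognized from its file name. It assumes the "sbom.<ext>.json" naming with cdx, spdx, and syft extensions described in the buildpacks spec linked above; the type and function names are hypothetical, and the suffix rules should be checked against the current spec before relying on them.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// SBOMFormat names the SBOM formats buildpacks can emit.
type SBOMFormat string

const (
	FormatUnknown   SBOMFormat = "unknown"
	FormatSyft      SBOMFormat = "syft"
	FormatCycloneDX SBOMFormat = "cyclonedx"
	FormatSPDX      SBOMFormat = "spdx"
)

// hasSBOMExt reports whether base is "sbom.<ext>" or ends in ".sbom.<ext>".
func hasSBOMExt(base, ext string) bool {
	return base == "sbom."+ext || strings.HasSuffix(base, ".sbom."+ext)
}

// formatFromFilename guesses the SBOM format from a file name. The naming
// convention is an assumption based on the buildpacks spec, not a verified rule.
func formatFromFilename(name string) SBOMFormat {
	base := path.Base(name)
	switch {
	case hasSBOMExt(base, "syft.json"):
		return FormatSyft
	case hasSBOMExt(base, "cdx.json"):
		return FormatCycloneDX
	case hasSBOMExt(base, "spdx.json"):
		return FormatSPDX
	default:
		return FormatUnknown
	}
}

func main() {
	// Illustrative paths only; the real layout inside the SBOM layer may differ.
	for _, f := range []string{
		"sbom/launch/example-buildpack/launch.sbom.cdx.json",
		"sbom/launch/example-buildpack/node/sbom.syft.json",
		"README.md",
	} {
		fmt.Printf("%s -> %s\n", f, formatFromFilename(f))
	}
}
```

Matching on the file name keeps the layer scan cheap; a more robust version would also sniff the document content before handing it to a decoder.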

Also related to #519, #481, #395, anchore/syft#631, anchore/syft#612, buildpacks/spec#257, buildpacks/rfcs#195

@sambhav sambhav added the enhancement New feature or request label Dec 5, 2021
@wagoodman
Contributor

Interesting! Today's path for getting SBOM information is through syft (either by cataloging on the spot, or by using syft to decode SBOM documents for grype's purposes). Since all of the formats in question are already supported by our decoders (or soon will be), this seems to be primarily a question of how we request and transport the SBOM bytes. That suggests adding code to grype that handles the request/transport first and presents the result to syft's decoders (rather than handing the image itself to syft).

We use schemes to tell grype exactly how to deal with a specific input. We could add a new scheme like:

grype buildpack-sbom:my/image:latest

(TBD on the scheme name, up to other suggestions... embedded-sbom maybe?)

This would instruct grype to surgically pull down only the SBOM layers (if the image isn't already local) and then pass those payloads into syft. We tend to push much of the image parsing logic down into stereoscope, so much of the layer fetching may belong there (and not directly in grype).
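A minimal Go sketch of splitting such a scheme-prefixed input, assuming the placeholder scheme name buildpack-sbom from above. The knownSchemes table and parseSchemeInput are hypothetical, not grype's actual parser; the point is that only the first colon is split and only recognized prefixes count as schemes, because image references themselves contain colons. It uses strings.Cut, which needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// knownSchemes is hypothetical: "buildpack-sbom" is only the placeholder
// name floated in this thread, not an existing grype scheme.
var knownSchemes = map[string]bool{
	"buildpack-sbom": true,
	"registry":       true,
	"docker":         true,
}

// parseSchemeInput splits "scheme:reference" on the first colon. If the
// prefix is not a known scheme, the whole input is treated as the
// reference (so "localhost:5000/app" is not mistaken for a scheme) and
// the empty scheme tells the caller to fall back to auto-detection.
func parseSchemeInput(input string) (scheme, ref string) {
	prefix, rest, found := strings.Cut(input, ":")
	if found && knownSchemes[prefix] {
		return prefix, rest
	}
	return "", input
}

func main() {
	for _, in := range []string{"buildpack-sbom:my/image:latest", "my/image:latest"} {
		scheme, ref := parseSchemeInput(in)
		fmt.Printf("scheme=%q ref=%q\n", scheme, ref)
	}
}
```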

From a high level I think this would work great. From the next level of detail down there are some things that would still need to be figured out:

  • deduplicating packages that are "found" in multiple layers (e.g. Dockerfiles with multiple yum install statements in different layers duplicate the RPM database, which makes it look like there are multiple packages when in fact there are not)
  • I think it makes sense that grype output should show (for packages that have vulnerabilities) where they were found in the image. When getting this info from SBOMs I think there may be a need for a little finagling to get the same info displayed in the same way, but it should be possible.
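Both points above can be sketched together in Go: collapse duplicate packages reported by several per-layer SBOMs while remembering every layer each one came from, so the report can still say where a vulnerable package was found. The Package type is a hypothetical stand-in for syft's much richer package model, and using (name, version, type) as the identity key is an assumption, not a settled rule.

```go
package main

import "fmt"

// Package is a hypothetical, simplified record decoded from one per-layer
// SBOM; real syft packages carry many more fields.
type Package struct {
	Name    string
	Version string
	Type    string // e.g. "rpm", "npm"
}

// LayerPackage ties a package to the layer whose SBOM reported it.
type LayerPackage struct {
	Pkg   Package
	Layer string // layer digest
}

// Deduped is one logical package plus every layer it was reported in, so
// the vulnerability report can still show where it was found.
type Deduped struct {
	Pkg    Package
	Layers []string
}

// dedupeByIdentity merges entries that share the same (name, version, type),
// keeping first-seen order. This identity key is an assumption.
func dedupeByIdentity(found []LayerPackage) []Deduped {
	index := map[Package]int{}
	var out []Deduped
	for _, lp := range found {
		i, ok := index[lp.Pkg]
		if !ok {
			i = len(out)
			index[lp.Pkg] = i
			out = append(out, Deduped{Pkg: lp.Pkg})
		}
		out[i].Layers = append(out[i].Layers, lp.Layer)
	}
	return out
}

func main() {
	// Placeholder digests; a later yum install copies the RPM database forward.
	openssl := Package{Name: "openssl", Version: "1.1.1k", Type: "rpm"}
	found := []LayerPackage{
		{Pkg: openssl, Layer: "sha256:aaa"},
		{Pkg: openssl, Layer: "sha256:bbb"},
		{Pkg: Package{Name: "bash", Version: "4.4", Type: "rpm"}, Layer: "sha256:aaa"},
	}
	for _, d := range dedupeByIdentity(found) {
		fmt.Println(d.Pkg.Name, d.Pkg.Version, d.Layers)
	}
}
```

Keeping the layer list instead of discarding duplicates is what lets the output show locations the same way a squashed or all-layers scan would.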

<distraction>
When given an image, syft can look at the squashed representation of that image (the default behavior) or at all layers in the image (-s all-layers), so that anything that shows up in any layer appears in the final results, even if it doesn't show up in the squashed representation. It would be interesting to do something like syft <image> --attach to attach today's SBOM to an image... or something like --attach each-layer to generate an SBOM for each layer the same way that buildpacks does. 🤔
</distraction>

@sambhav
Contributor Author

sambhav commented Dec 6, 2021

@wagoodman thanks for the detailed feedback! I am assuming we would require something similar for #519, where the SBOM is attached in the cosign format, and introduce grype cosign-sbom:my/image:latest?

Alternatively, we could also possibly auto-detect if an image has cosign/buildpacks sbom based on image metadata. Potentially this auto-detect behavior can be guarded by a flag on the CLI/grype config file?

Separately, is there any Slack channel/IRC/preferred way of communication for grype/syft, or would you prefer discussions/questions about the implementation in a draft PR or on this issue?

@wagoodman
Contributor

I am assuming we would require something similar for #519, where the SBOM is attached in the cosign format, and introduce grype cosign-sbom:my/image:latest?

I believe so, yup. I might take a pulse on the various ways SBOMs are intersecting with all-things-OCI to see what makes sense from a naming-perspective for all of these new schemes.

Alternatively, we could also possibly auto-detect if an image has cosign/buildpacks sbom based on image metadata. Potentially this auto-detect behavior can be guarded by a flag on the CLI/grype config file?

We do have an auto-detect execution path if no scheme is given (so this wouldn't be an alternative per se). This could be amended to include cosign/buildpacks, but I think that's dependent on the specifics of the behavior here and if it works in the "default path", so probably TBD after the initial scheme work.

is there any Slack channel/IRC/preferred way of communication for grype/syft, or would you prefer discussions/questions about the implementation in a draft PR or on this issue?

Chatting here on the issue, in our #general Slack channel, collaborating on a draft PR directly, or joining a Zoom call at our community meetings all work; whichever you prefer is fine with us 👍. (Side note: our first community meeting is this Thursday 🎉, so the agenda is wide open.)

@sambhav
Contributor Author

sambhav commented Dec 9, 2021

Thanks, I will try to join the community meeting tomorrow and introduce myself and this issue to the wider community :)

@sambhav
Contributor Author

sambhav commented Dec 11, 2021

@wagoodman I have been looking into the source code for stereoscope and I have a few thoughts -

  • Currently stereoscope is responsible for fetching a container image from various sources and each of these is denoted by a scheme which is one of: docker-archive, docker, oci-dir, oci-archive, oci-registry, registry
  • A buildpack related image can be stored in any of the above formats.

Given the above, would it instead be useful to have a flag (--buildpacks or --cnb) or an additional identifier to the scheme (something like docker-archive+cnb, docker+cnb, registry+cnb and so on)? I am hoping we can use the scheme of type <ggcr-source>+<sbom-attachment-type> to create a generic scheme that denotes different sbom transport mechanisms.
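A minimal Go sketch of parsing the proposed <ggcr-source>+<sbom-attachment-type> form. The source names are the stereoscope schemes listed above; the cnb and cosign attachment names, the CompositeScheme type, and parseCompositeScheme are hypothetical illustrations of the proposal, not existing stereoscope code.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical tables: the "+cnb"/"+cosign" suffixes are only proposed here.
var imageSources = map[string]bool{
	"docker-archive": true, "docker": true, "oci-dir": true,
	"oci-archive": true, "oci-registry": true, "registry": true,
}

var attachmentTypes = map[string]bool{"cnb": true, "cosign": true}

// CompositeScheme is the parsed form of "<source>+<attachment>".
type CompositeScheme struct {
	Source     string // where the image comes from
	Attachment string // how the SBOM is attached; empty means none requested
}

// parseCompositeScheme validates both halves; without a "+" the
// attachment is left empty so the plain image path is used.
func parseCompositeScheme(s string) (CompositeScheme, error) {
	source, attachment, hasPlus := strings.Cut(s, "+")
	if !imageSources[source] {
		return CompositeScheme{}, fmt.Errorf("unknown image source %q", source)
	}
	if hasPlus && !attachmentTypes[attachment] {
		return CompositeScheme{}, fmt.Errorf("unknown SBOM attachment type %q", attachment)
	}
	return CompositeScheme{Source: source, Attachment: attachment}, nil
}

func main() {
	for _, in := range []string{"registry+cnb", "docker-archive", "oci-dir+spdx"} {
		cs, err := parseCompositeScheme(in)
		fmt.Printf("%-16s -> %+v err=%v\n", in, cs, err)
	}
}
```

Keeping source and attachment as separate fields means every image source gets every attachment type without multiplying scheme names.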

This brings me to the following points -

  1. The presence or absence of a buildpacks sbom can be determined by looking at the image config and checking for the presence of certain labels
  2. By the time we have an image config in stereoscope, it might already be too late to do automated scheme detection, since scheme detection is currently used mainly to determine the origin source of the image
  3. Since the buildpacks sbom is applicable for all of the ggcr sources, it might be better to have the buildpack sbom detection logic somewhere in the image.Read() instead? In this case, if it detects buildpacks metadata, it may choose to skip reading in other layers and just load the buildpack layers and return the image.
  4. In case the user explicitly passes something like registry+cnb we can choose to fail if we don't find any buildpack related labels
  5. Once we have a stereoscope image reference we can continue parsing things in syft at https://github.com/anchore/syft/blob/ab9fe53ff2dca2bdcbe2559f1e3f09b8550fdf20/syft/source/source.go#L53 which will in turn bring us back to grype at
    src, cleanup, err := source.New(userInput, registryOptions)
    with the final syft.Source
  6. After that, the syft.Source will be passed to syft.CatalogPackages, where we could potentially have an SBOMDirectoryCataloger (useful for both buildpacks and cosign from this point onwards). This cataloger can be responsible for taking in a directory of SBOMs, merging them in a way that preserves whatever information we want, and outputting the final merged SBOM with all the packages to grype at
    return syftProvider(userInput, scopeOpt, registryOptions)
    This part of the changes would likely touch upon the SBOM merging issues that we have on syft.
  7. After that we have gathered all the packages and grype can continue as normal from
    allMatches := grype.FindVulnerabilitiesForPackage(provider, context.Distro, packages...)

I would imagine the steps for cosign would converge with these after step 5.

I still have to give some thought to the interfaces we can use to denote an SBOM image attachment (whether through buildpacks, cosign, or in the future OCI attachments) and how to represent it as a collection of SBOM files + metadata that can be passed back to syft to merge/generate packages from. Finally, there are still some missing pieces around creating a generic interface for SBOM merging in syft that handles all of these cases.
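One possible shape for such an interface, sketched in Go under stated assumptions: SBOMDocument, SBOMAttachment, staticAttachment, and collectDocuments are all hypothetical names. Each attachment source (buildpacks layer, cosign attachment, future OCI attachment) only has to yield raw documents plus metadata; merging them into one package list is deliberately left out of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// SBOMDocument is a hypothetical raw SBOM payload pulled from an image
// attachment, before any decoding by syft.
type SBOMDocument struct {
	Origin    string // e.g. the layer digest or attachment reference
	MediaType string // e.g. "application/vnd.cyclonedx+json"
	Content   []byte
}

// SBOMAttachment is a hypothetical interface covering buildpacks layers,
// cosign attachments, or future OCI attachments alike.
type SBOMAttachment interface {
	Kind() string
	Documents() ([]SBOMDocument, error)
}

// staticAttachment is a toy implementation used only for illustration.
type staticAttachment struct {
	kind string
	docs []SBOMDocument
}

func (s staticAttachment) Kind() string                      { return s.kind }
func (s staticAttachment) Documents() ([]SBOMDocument, error) { return s.docs, nil }

// collectDocuments gathers documents from every attachment into one list
// that a merging cataloger (such as the proposed SBOMDirectoryCataloger)
// could consume. Sorting by origin keeps the result deterministic.
func collectDocuments(atts ...SBOMAttachment) ([]SBOMDocument, error) {
	var all []SBOMDocument
	for _, a := range atts {
		docs, err := a.Documents()
		if err != nil {
			return nil, fmt.Errorf("%s attachment: %w", a.Kind(), err)
		}
		all = append(all, docs...)
	}
	sort.SliceStable(all, func(i, j int) bool { return all[i].Origin < all[j].Origin })
	return all, nil
}

func main() {
	cnb := staticAttachment{kind: "cnb", docs: []SBOMDocument{
		{Origin: "layer-b", MediaType: "application/vnd.cyclonedx+json"},
		{Origin: "layer-a", MediaType: "application/spdx+json"},
	}}
	docs, err := collectDocuments(cnb)
	if err != nil {
		panic(err)
	}
	for _, d := range docs {
		fmt.Println(d.Origin, d.MediaType)
	}
}
```

Carrying the media type with each document lets the downstream step pick the right syft decoder without re-detecting the format.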

Does the above proposal look sane?

@spiffcs spiffcs added this to OSS Jun 1, 2022
@spiffcs spiffcs moved this to Triage (Comments or Progress Made) in OSS Jun 1, 2022
@spiffcs
Contributor

spiffcs commented Aug 3, 2023

Soft bump on this - we're putting this back into the backlog right now for one of the team members to pick up when we have the cycles.

I really appreciate the work that has already been done on this:

Steps for the feature:

  • Identify that an SBOM has already been generated for the source and attached as part of the buildpacks process
  • Use that identified SBOM rather than generating a new one that potentially has less fidelity than the already attached one

Projects
Status: Backlog
Development

No branches or pull requests

3 participants