Add support for OCI registries. #323

Open
wants to merge 1 commit into
base: master
from

Conversation

atgreen

@atgreen atgreen commented Jul 21, 2024

This patch introduces the ability to read repo contents from OCI registries, like ghcr.io, using the new 'oci' protocol. Users can specify this protocol in their DNF .repo files as shown below:

[oci-test]
name=OCI Test
baseurl=oci://ghcr.io/atgreen/librepo/gh-cli
enabled=1
gpgcheck=1

To set up the server-side repository, create a public package repository on GitHub and populate it by pushing the repo file contents with the ORAS CLI tool.

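# Generate the repodata/ metadata for the RPMs in the current directory.
createrepo .
# Push every file as its own artifact; read names line by line and quote
# them so paths containing spaces are not split.
find . -type f | sed 's|^\./||' | while IFS= read -r FILE; do
    oras push "ghcr.io/atgreen/librepo/gh-cli/$FILE:latest" "$FILE"
done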
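
As a rough illustration of what the push loop above implies for a client, the sketch below maps an `oci://` baseurl plus a repo-relative path to the manifest endpoint defined by the OCI distribution spec (`/v2/<name>/manifests/<reference>`). The function name `oci_manifest_url` is hypothetical, and the assumption that the file path is appended to the repository name comes from the push loop, not from the patch itself.

```python
from urllib.parse import urlsplit


def oci_manifest_url(baseurl: str, relative_path: str, tag: str = "latest") -> str:
    """Map an oci:// baseurl and a repo-relative path to a manifest URL.

    Assumption: each file lives in its own OCI repository whose name is the
    baseurl path followed by the file path, as in the push loop above.
    """
    parts = urlsplit(baseurl)
    if parts.scheme != "oci":
        raise ValueError(f"not an oci:// URL: {baseurl}")
    registry = parts.netloc
    prefix = parts.path.strip("/")
    repository = f"{prefix}/{relative_path.lstrip('/')}"
    # Manifest endpoint from the OCI distribution spec.
    return f"https://{registry}/v2/{repository}/manifests/{tag}"


if __name__ == "__main__":
    print(oci_manifest_url("oci://ghcr.io/atgreen/librepo/gh-cli",
                           "repodata/repomd.xml"))
```

Under these assumptions, `repodata/repomd.xml` resolves to a manifest under the repository `atgreen/librepo/gh-cli/repodata/repomd.xml` with the tag `latest`.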

Currently, only public repositories are supported. To support private package repositories, a bearer token is required. Implementing this would necessitate changes to libdnf to allow for a bearer_token configuration option in .repo files.
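
To make the proposed option concrete, here is a minimal sketch of how a client might turn a `bearer_token` entry in a .repo file into an HTTP header. The `bearer_token` key does not exist in libdnf today (it is the option proposed above), and `auth_headers` and the token value are purely illustrative.

```python
import configparser

# Example .repo contents; bearer_token is the *proposed*, not yet existing, option.
REPO_FILE = """\
[oci-test]
name=OCI Test
baseurl=oci://ghcr.io/atgreen/librepo/gh-cli
enabled=1
gpgcheck=1
bearer_token=example-token
"""


def auth_headers(repo_text: str, repo_id: str) -> dict:
    """Return request headers for a repo section, empty if no token is set."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    token = parser.get(repo_id, "bearer_token", fallback=None)
    return {"Authorization": f"Bearer {token}"} if token else {}


if __name__ == "__main__":
    print(auth_headers(REPO_FILE, "oci-test"))
```

How the token would be obtained or refreshed is outside the scope of this sketch.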

= changelog =
msg: Add support for OCI registries
type: enhancement

@cgwalters
Contributor

Thanks for starting this!

A lot of prior discussion in e.g.

And probably other places.

I only skimmed the code but I think there's slightly more to OCI than that, unless I'm missing something?

There is an implementation of talking to OCI registries that lives in flatpak today https://github.com/flatpak/flatpak/blob/main/common/flatpak-oci-registry.c and it would clearly make sense to share code.

That said...IMO we would gain the most value by using the github.com/containers/image library - we can do this via the skopeo experimental-image-proxy which is designed as a language-independent RPC API to that library.

(Also a few other notes; while just stuffing the existing XML in a registry is clearly the shortest path, since we're changing the protocol IMO we could consider just replacing XML with JSON or so too, and maybe even do some other cleanups and simplifications. OTOH that would make migration a bit harder)

@atgreen
Author

atgreen commented Jul 22, 2024

> I only skimmed the code but I think there's slightly more to OCI than that, unless I'm missing something?

I don't think so. Every file is pushed as a single layer/blob, and you just ignore version tagging (tag everything with 'latest'). To pull a file you grab the manifest json, find the hash of the first/only layer, pull that down and checksum it. This approximates what you get with http-hosted repos.
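
The pull sequence described above can be sketched roughly as follows. This is not the code from the patch: the helper names are hypothetical, no network access is performed, and only `sha256` digests are handled. The blob endpoint (`/v2/<name>/blobs/<digest>`) and the `layers[].digest` field come from the OCI distribution and image specs.

```python
import hashlib
import json


def first_layer_digest(manifest_json: str) -> str:
    """Return the digest of the first (here: only) layer in a manifest."""
    layers = json.loads(manifest_json).get("layers", [])
    if not layers:
        raise ValueError("manifest has no layers")
    return layers[0]["digest"]


def blob_url(registry: str, repository: str, digest: str) -> str:
    """Blob endpoint as defined by the OCI distribution spec."""
    return f"https://{registry}/v2/{repository}/blobs/{digest}"


def verify_blob(blob: bytes, digest: str) -> bool:
    """Check downloaded bytes against an 'algorithm:hex' digest string."""
    algorithm, _, expected = digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    return hashlib.sha256(blob).hexdigest() == expected


if __name__ == "__main__":
    # Stand-in for a file that was pushed as a single layer.
    blob = b"<repomd/>"
    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
    manifest = json.dumps({"schemaVersion": 2,
                           "layers": [{"digest": digest, "size": len(blob)}]})
    found = first_layer_digest(manifest)
    print(blob_url("ghcr.io", "atgreen/librepo/gh-cli/repodata/repomd.xml", found))
    print(verify_blob(blob, found))
```

The checksum step only proves the blob matches the manifest; RPM signature checking (`gpgcheck=1`) would still happen separately.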

> (Also a few other notes; while just stuffing the existing XML in a registry is clearly the shortest path, since we're changing the protocol IMO we could consider just replacing XML with JSON or so too, and maybe even do some other cleanups and simplifications. OTOH that would make migration a bit harder)

That's an orthogonal issue and doesn't have to be tied to OCI support. My patch is very small, and introduces a nice new capability.

Thank you for considering this change!

@Conan-Kudo
Member

What specifically do you hope to gain with oci:// support? OCI registries aren't really a good place to store RPM repositories.

@Conan-Kudo Conan-Kudo self-assigned this Aug 24, 2024
@Conan-Kudo
Member

As I think through this, I'm trying to understand how the contents of a repository would be represented in an OCI registry? I can't think of a way this would be efficient and effective.

If it's done as each file is a layer in an "image" blob: then it becomes massively expensive to fetch individual files. If it is each file is an image "blob", it results in disconnected spew that's difficult to store and replicate.

@Conan-Kudo Conan-Kudo added blocked RFE Request For Enhancement (as opposed to a bug) labels Aug 24, 2024
@atgreen
Author

atgreen commented Aug 24, 2024

> What specifically do you hope to gain with oci:// support? OCI registries aren't really a good place to store RPM repositories.

OCI registries are increasingly being used as general-purpose artifact storage. For instance, Homebrew uses OCI registries for all of its packages and related artifacts. Just ignore the layers. The repo structure can look just like it does on a web server. I was able to mirror my repos into ghcr.io, and it works just the same, except that now I don't have to maintain hosting infrastructure. The OCI ecosystem also has mature tooling to help manage/mirror them.

Not worrying about hosting infrastructure or bandwidth costs is a huge win. There are many instantly available and free OCI hosting options out there. If you consider enterprise use cases, companies are increasingly competent at hosting OCI registries internally thanks to container adoption. Sharing RPMs through an enterprise OCI registry will be easier for many users than trying to deploy/maintain a web server or convince internal Satellite maintainers to host their content.

@atgreen
Author

atgreen commented Aug 24, 2024

> If it's done as each file is a layer in an "image" blob: then it becomes massively expensive to fetch individual files. If it is each file is an image "blob", it results in disconnected spew that's difficult to store and replicate.

Yes, my implementation ignores layers. Each file is a blob, but they are structured in the OCI registry exactly as they would appear on a web server.

@rohanpm

rohanpm commented Sep 3, 2024

I'm also interested in the topic of how an RPM repository might be mapped into an OCI registry. I see a few challenges with the current approach though.

The proposed implementation right now creates one new OCI repository for every file in the RPM repository. For example, in the ghcr.io registry, atgreen/librepo/gh-cli/repodata/3c15e31fd86f4e2082f66922a69d39489a5737a1cf6a3d937aaefc898ba8e75d-primary.xml.zst is one repository, atgreen/librepo/gh-cli/repodata/repomd.xml is another and so on.

If you push an RPM repository with 1000 files, you'll end up with 1000 OCI repositories which is probably not acceptable on many OCI registry implementations.

I also think using the names of files in the RPM repo as OCI repository names will run into issues. Per the spec at https://github.com/opencontainers/distribution-spec/blob/main/spec.md#pulling-manifests , repository names need to match a certain regular expression. It definitely doesn't match all legal RPM filenames, for example it only allows lowercase letters.
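
The naming constraint can be checked mechanically. The sketch below uses the repository-name pattern from the distribution spec linked above, as I understand it; the helper name and the two failing sample file names are illustrative, not taken from the repository in this PR.

```python
import re

# One path component: lowercase alphanumerics separated by '.', '_', '__' or dashes.
_COMPONENT = r"[a-z0-9]+(?:(?:\.|_|__|-+)[a-z0-9]+)*"
# Full repository name: one or more components joined by '/'.
NAME_RE = re.compile(rf"{_COMPONENT}(?:/{_COMPONENT})*")


def is_valid_repository_name(name: str) -> bool:
    """True if the whole string is a legal OCI repository name."""
    return NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    prefix = "atgreen/librepo/gh-cli/"
    for path in ("repodata/repomd.xml",       # lowercase only: accepted
                 "Packages/example-1.0.rpm",  # uppercase 'P': rejected
                 "libstdc++-14.rpm"):         # '+' characters: rejected
        print(path, is_valid_repository_name(prefix + path))
```

A per-file layout would therefore need some escaping or renaming scheme for file names that fall outside this alphabet.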

@dralley
Contributor

dralley commented Sep 4, 2024

I would be much more interested in investigating whether a more "native" approach is feasible (or reasonable) for RPMs. That means - no repomd.xml, no primary.xml, etc.

Trying to put an RPM repository directly into an OCI registry seems a bit nuts to me.

@cgwalters
Contributor

> I would be much more interested in investigating whether a more "native" approach is feasible (or reasonable) for RPMs. That means - no repomd.xml, no primary.xml, etc.

It's a question of how deep one wants to go down the rabbit hole. I would agree that it seems quite obvious to kill primary.xml - it's basically a "list of checksums for external objects with a timestamp" and OCI covers that in a standard way. Also, JSON instead of XML alone is a win. But now I'm just repeating #323 (comment)

So I agree with you at a high level but...

> Trying to put an RPM repository directly into an OCI registry seems a bit nuts to me.

The practical problem here is that doing anything else would require API changes in both librepo and its consumers I think. In a world where we store RPMs as OCI natively I am sure there'd be a need for a tool to "bridge" the two formats and synthesize the legacy format from the OCI one. And we'd need to be realistic about having to care about the rpm-md format for many years.

Maybe in theory librepo could translate a "rpm-md-oci" layout back into primary.xml client side in some cases?

@dralley
Contributor

dralley commented Sep 16, 2024

I guess I just don't understand what exactly is the benefit of shoving an RPM repository into an OCI artifact registry as-is, as opposed to serving it from HTTP.

I see Neal asked the same and I see @atgreen's response, but the whole argument seems to be an economic one (exploiting the fact that some companies provide free OCI registry hosting) rather than a technical one. Is that a good enough reason to add more complexity to the RPM ecosystem or are there other reasons?

Red Hat is also working on the whole repo-hosting-as-a-service service, and there are other third parties that have done the same for a while (packagecloud.io, etc.) and even free services like COPR, so I see this as just adding another way of doing something that can already be done, more so than opening up a whole new use case or even business case.

> The practical problem here is that doing anything else would require API changes in both librepo and its consumers I think. In a world where we store RPMs as OCI natively I am sure there'd be a need for a tool to "bridge" the two formats and synthesize the legacy format from the OCI one.

Ok, but it kinda feels like "API changes in both librepo and its consumers" would be appropriate for this type of change?

So the question is, is the marginal utility significant enough to add a new approach that feels slightly half baked in addition to the old way that we would need to support for nigh-eternity, and also whatever new approach gets cooked up in 4 years :)

> And we'd need to be realistic about having to care about the rpm-md format for many years.

Sure, definitely agree with that

@cgwalters
Contributor

> I guess I just don't understand what exactly is the benefit of shoving an RPM repository into an OCI artifact registry as-is, as opposed to serving it from HTTP.

See the "benefits" section in my original issue which was already linked above coreos/rpm-ostree#4155

@dmesser

dmesser commented Oct 11, 2024

+1 to @cgwalters points. I would add:

  • standardized way of providing and discovering provenance information for RPMs via OCI Referrers (beyond signatures also SBOMs, build attestations, etc.); another example I can think of is source RPMs
  • backend storage deduplication when storing the same rpm in multiple repositories/namespace of the OCI registry
  • rich automation workflows, when the rpm version is reflected in the OCI artifact's tag name, for registries that support lifecycle automation based on tag patterns (e.g. remove nightlies after 30 days, or some such)
  • out of the box authentication and authorization found in most registry implementations
  • easy implementation of a rpm pull through cache with most registry implementations out there, permanent mirroring possible with simple tools like skopeo

Some of these advantages are more obvious when we talk about a self-hosted DNF repo on an httpd server rather than on infrastructure like copr or projects like katello. But OCI registries, as outlined by earlier comments, are ubiquitous these days and easy to leverage for a FOSS project
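
One practical detail behind the tag-based automation point above: OCI tags have a restricted alphabet, so an RPM version cannot always be used as a tag verbatim. The sketch below uses the tag pattern from the distribution spec as I understand it; `version_to_tag` is a hypothetical and lossy mapping chosen only for illustration (for example, `~` and `^` both become `_`).

```python
import re

# Tag pattern from the OCI distribution spec: at most 128 characters.
TAG_RE = re.compile(r"[a-zA-Z0-9_][a-zA-Z0-9._-]{0,127}")


def is_valid_tag(tag: str) -> bool:
    """True if the whole string is a legal OCI tag."""
    return TAG_RE.fullmatch(tag) is not None


def version_to_tag(version_release: str) -> str:
    """Hypothetical mapping: replace characters outside the tag alphabet."""
    tag = re.sub(r"[^a-zA-Z0-9._-]", "_", version_release)[:128]
    if not is_valid_tag(tag):
        raise ValueError(f"cannot derive a tag from {version_release!r}")
    return tag


if __name__ == "__main__":
    for version in ("2.40.1-1", "1.0~rc1-1.fc40", "1.0^20240721git-1"):
        print(version, is_valid_tag(version), version_to_tag(version))
```

Any real scheme would need to be reversible, or keep the exact version in a manifest annotation, so that clients can still compare versions correctly.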

@Conan-Kudo
Member

I do not want to have this feature if the primary goal is to exploit and exhaust public OCI registries.

@dmesser

dmesser commented Oct 11, 2024

@Conan-Kudo I don't know that this is the primary goal, but as an owner of a very large public OCI registry service, I can tell you that I am much more concerned about other artifact types and actors :)

@cgwalters
Contributor

There's actually 3 levels of this, sorted in increasing levels of effort+reward:

  • Storing literally the file formats that exist today (primary.xml, foo.rpm) as OCI artifacts, requiring minimal changes to the client side, and it's relatively straightforward to convert to/from the registry
  • Replacing primary.xml with a manifest that points to the RPMs
  • Storing RPMs unpacked as .tar.zstd layers, and the RPM header as annotations inside the manifest (or maybe the config?)

The immense value of the 3rd path is it's much easier to intersect the world of RPMs and OCI container images directly - for example it would "just work" to skopeo copy docker://quay.io/fedora/fedora-rpms:kernel oci:foo and get the unpacked RPM representation.

But it'd also mean changes to RPM to accept something that looks like an OCI image directly as input - and it would more or less hard-require switching the way signatures are done to OCI signatures (instead of the current inline GPG stuff).

Cost - and benefit.
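
For the second level in the list above (a manifest that points to the RPMs instead of primary.xml), a very rough sketch of what such a manifest could look like is given below. Nothing here is an agreed format: the `artifactType` value, the `application/x-rpm` layer media type, and the function names are assumptions for illustration; only the overall manifest shape and the `org.opencontainers.image.title` annotation come from the OCI image spec.

```python
import hashlib
import json


def descriptor(data: bytes, media_type: str, title=None) -> dict:
    """Build an OCI content descriptor for a blob."""
    desc = {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }
    if title is not None:
        desc["annotations"] = {"org.opencontainers.image.title": title}
    return desc


def build_manifest(rpms: dict) -> dict:
    """Map {filename: bytes} to a manifest with one layer per RPM."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        # Hypothetical artifact type, not a registered one.
        "artifactType": "application/vnd.example.rpm-repo.v1",
        # Empty JSON config, as used for non-image artifacts.
        "config": descriptor(b"{}", "application/vnd.oci.empty.v1+json"),
        # Assumed media type for RPM payloads.
        "layers": [descriptor(data, "application/x-rpm", title=name)
                   for name, data in sorted(rpms.items())],
    }


if __name__ == "__main__":
    manifest = build_manifest({"b-1.0-1.noarch.rpm": b"fake rpm b",
                               "a-1.0-1.noarch.rpm": b"fake rpm a"})
    print(json.dumps(manifest, indent=2))
```

The checksums and sizes that primary.xml carries today map onto the descriptors; dependency metadata would still need a home, for example in annotations or the config blob.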

Labels
blocked RFE Request For Enhancement (as opposed to a bug)

6 participants