Add support for OCI registries. #323
This patch introduces the ability to read repo contents from OCI registries, like ghcr.io, using the new 'oci' protocol. Users can specify this protocol in their DNF .repo files as shown below:

    [oci-test]
    name=OCI Test
    baseurl=oci://ghcr.io/atgreen/librepo/gh-cli
    enabled=1
    gpgcheck=1

To set up the server-side repository, create a public package repository in GitHub, and populate it by pushing the repo file contents using the ORAS CLI tool.

    createrepo .
    FILES=$(find . -type f | sed 's|^\./||')
    for FILE in $FILES; do
        oras push ghcr.io/atgreen/librepo/gh-cli/$FILE:latest $FILE
    done

Currently, only public repositories are supported. To support private package repositories, a bearer token is required. Implementing this would necessitate changes to libdnf to allow for a bearer_token configuration option in .repo files.

= changelog =
msg: Add support for OCI registries
type: enhancement
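For readers following along, here is a minimal sketch of how a repo-relative path appears to map onto an OCI reference, inferred only from the `baseurl` and the `oras push` loop in the description. The function name `oci_reference` is hypothetical and this is not the patch's actual code.

```python
from urllib.parse import urlsplit


def oci_reference(baseurl: str, relative_path: str, tag: str = "latest") -> str:
    """Build the OCI reference for one repo file (hypothetical helper)."""
    parts = urlsplit(baseurl)
    if parts.scheme != "oci":
        raise ValueError(f"not an oci:// URL: {baseurl}")
    # Drop a leading "./" the same way the sed expression in the push loop does.
    if relative_path.startswith("./"):
        relative_path = relative_path[2:]
    prefix = parts.path.strip("/")
    repository = "/".join(p for p in (prefix, relative_path) if p)
    # Every file is pushed as its own repository, always tagged "latest".
    return f"{parts.netloc}/{repository}:{tag}"


if __name__ == "__main__":
    print(oci_reference("oci://ghcr.io/atgreen/librepo/gh-cli",
                        "./repodata/repomd.xml"))
```

Under these assumptions each file in the repo tree becomes its own OCI repository name, which is the layout discussed in the rest of the thread.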
Thanks for starting this! A lot of prior discussion in e.g.
And probably other places. I only skimmed the code but I think there's slightly more to OCI than that, unless I'm missing something?

There is an implementation of talking to OCI registries that lives in flatpak today https://github.com/flatpak/flatpak/blob/main/common/flatpak-oci-registry.c and it would clearly make sense to share code.

That said...IMO we would gain the most value by using the github.com/containers/image library - we can do this via the

(Also a few other notes; while just stuffing the existing XML in a registry is clearly the shortest path, since we're changing the protocol IMO we could consider just replacing XML with JSON or so too, and maybe even do some other cleanups and simplifications. OTOH that would make migration a bit harder)
I don't think so. Every file is pushed as a single layer/blob, and you just ignore version tagging (tag everything with 'latest'). To pull a file you grab the manifest json, find the hash of the first/only layer, pull that down and checksum it. This approximates what you get with http-hosted repos.
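The pull flow described above can be sketched roughly as follows. This is an illustration rather than the patch's code: the helper names are hypothetical, the URL shapes follow the OCI distribution spec's `/v2/<name>/manifests/<reference>` and `/v2/<name>/blobs/<digest>` endpoints, and network access and any token handling are deliberately left out.

```python
import hashlib
import json


def manifest_url(registry: str, repository: str, tag: str = "latest") -> str:
    # Endpoint for fetching the manifest JSON of one repository/tag.
    return f"https://{registry}/v2/{repository}/manifests/{tag}"


def blob_url(registry: str, repository: str, digest: str) -> str:
    # Endpoint for fetching a blob by its content digest.
    return f"https://{registry}/v2/{repository}/blobs/{digest}"


def first_layer_digest(manifest_text: str) -> str:
    """Return the digest of the first (and here, only) layer."""
    manifest = json.loads(manifest_text)
    layers = manifest.get("layers") or []
    if not layers:
        raise ValueError("manifest has no layers")
    return layers[0]["digest"]


def verify_blob(blob: bytes, digest: str) -> bool:
    """Check downloaded bytes against a 'sha256:<hex>' digest."""
    algorithm, _, expected = digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    return hashlib.sha256(blob).hexdigest() == expected
```

A client would fetch `manifest_url(...)`, pass the body to `first_layer_digest`, download `blob_url(...)` for that digest, and accept the file only if `verify_blob` returns true.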
That's an orthogonal issue and doesn't have to be tied to OCI support. My patch is very small, and introduces a nice new capability. Thank you for considering this change!
What specifically do you hope to gain with
As I think through this, I'm trying to understand how the contents of a repository would be represented in an OCI registry. I can't think of a way this would be efficient and effective. If each file is a layer in an "image" blob, it becomes massively expensive to fetch individual files. If each file is its own image "blob", the result is disconnected spew that's difficult to store and replicate.
OCI registries are increasingly being used as general-purpose artifact storage. For instance, Homebrew uses OCI registries for all of its packages and related artifacts. Just ignore the layers. The repo structure can look just like it does on a web server.

I was able to mirror my repos into ghcr.io, and it works just the same, except that now I don't have to maintain hosting infrastructure. The OCI ecosystem also has mature tooling to help manage/mirror them. Not worrying about hosting infrastructure or bandwidth costs is a huge win. There are many instantly-available and free OCI hosting options out there.

If you consider Enterprise use cases, companies are increasingly competent at internal hosting of OCI registries thanks to container adoption. Sharing RPMs through an enterprise OCI registry will be easier for many users than trying to deploy/maintain a web server or convince internal Satellite maintainers to host their content.
Yes, my implementation ignores layers. Each file is a blob, but they are structured in the OCI registry exactly as they would appear on a web server.
I'm also interested in the topic of how an RPM repository might be mapped into an OCI registry. I see a few challenges with the current approach though.

The proposed implementation right now creates one new OCI repository for every file in the RPM repository. For example, in the ghcr.io registry, If you push an RPM repository with 1000 files, you'll end up with 1000 OCI repositories, which is probably not acceptable on many OCI registry implementations.

I also think using the names of files in the RPM repo as OCI repository names will run into issues. Per the spec at https://github.com/opencontainers/distribution-spec/blob/main/spec.md#pulling-manifests, repository names need to match a certain regular expression. It definitely doesn't match all legal RPM filenames; for example, it only allows lowercase letters.
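To make the naming concern concrete, here is a small sketch that checks candidate repository names against the name pattern. The regular expression is transcribed from my reading of the distribution spec linked above and should be verified against it; the function name and the sample filenames are made up for illustration.

```python
import re

# Repository name pattern, transcribed from the OCI distribution spec
# (assumption: verify against the spec text before relying on it).
REPOSITORY_NAME = re.compile(
    r"[a-z0-9]+((\.|_|__|-+)[a-z0-9]+)*"
    r"(/[a-z0-9]+((\.|_|__|-+)[a-z0-9]+)*)*"
)


def is_valid_repository_name(name: str) -> bool:
    """True if the whole name matches the repository name pattern."""
    return REPOSITORY_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    candidates = [
        "atgreen/librepo/gh-cli/repodata/repomd.xml",
        "atgreen/librepo/gh-cli/gh-2.20.2-1.x86_64.rpm",
        "atgreen/librepo/NetworkManager-1.40.0-1.fc37.x86_64.rpm",
        "atgreen/librepo/libstdc++-12.2.1-4.fc37.x86_64.rpm",
    ]
    for name in candidates:
        print(name, is_valid_repository_name(name))
```

Names containing uppercase letters or characters such as `+` fail the check, so a scheme that uses RPM filenames as repository names would need some escaping or renaming step.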
I would be much more interested in investigating whether a more "native" approach is feasible (or reasonable) for RPMs. That means - no repomd.xml, no primary.xml, etc. Trying to put an RPM repository directly into an OCI registry seems a bit nuts to me.
It's a question of how deep one wants to go down the rabbit hole. I would agree that it seems quite obvious to kill So I agree with you at a high level but...
The practical problem here is that doing anything else would require API changes in both librepo and its consumers I think. In a world where we store RPMs as OCI natively I am sure there'd be a need for a tool to "bridge" the two formats and synthesize the legacy format from the OCI one. And we'd need to be realistic about having to care about the rpm-md format for many years. Maybe in theory librepo could translate a "rpm-md-oci" layout back into primary.xml client side in some cases?
I guess I just don't understand what exactly is the benefit of shoving an RPM repository into an OCI artifact registry as-is, as opposed to serving it from HTTP. I see Neal asked the same and I see @atgreen's response, but the whole argument seems to be an economic one (exploiting the fact that some companies provide free OCI registry hosting) rather than a technical one. Is that a good enough reason to add more complexity to the RPM ecosystem or are there other reasons? Red Hat is also working on the whole repo-hosting-as-a-service service, and there are other third parties that have done the same for a while (packagecloud.io, etc.) and even free services like COPR, so I see this as just adding another way of doing something that can already be done, more so than opening up a whole new use case or even business case.
Ok, but it kinda feels like "API changes in both librepo and its consumers" would be appropriate for this type of change? So the question is, is the marginal utility significant enough to add a new approach that feels slightly half baked in addition to the old way that we would need to support for nigh-eternity, and also whatever new approach gets cooked up in 4 years :)
Sure, definitely agree with that
See the "benefits" section in my original issue which was already linked above coreos/rpm-ostree#4155
+1 to @cgwalters points. I would add:
Some of these advantages are more obvious when we talk about a self-hosted DNF repo on an httpd server rather than on infrastructure like copr or projects like katello. But OCI registries, as outlined by earlier comments, are ubiquitous these days and easy to leverage for a FOSS project.
I do not want to have this feature if the primary goal is to exploit and exhaust public OCI registries.
@Conan-Kudo I don't know that this is the primary goal, but as an owner of a very large public OCI registry service, I can tell you that I am much more concerned about other artifact types and actors :)
There's actually 3 levels of this, sorted in increasing levels of effort+reward:
The immense value of the 3rd path is it's much easier to intersect the world of RPMs and OCI container images directly - for example it would "just work" to But it'd also mean changes to RPM to accept something that looks like an OCI directly as input - and actually kind of hard require switching the way signatures are done to be OCI signatures (instead of its current inline GPG stuff). Cost - and benefit.