Add Pull-through caching #1299

Merged
merged 1 commit into from
Jan 17, 2024
3 changes: 3 additions & 0 deletions CHANGES/507.feature
@@ -0,0 +1,3 @@
Added support for pull-through caching. Users can now configure a dedicated distribution and remote
linked to an external registry without the need to create and mirror repositories in advance. Pulp
downloads missing content automatically if requested and acts as a caching proxy.
3 changes: 2 additions & 1 deletion docs/tech-preview.rst
@@ -3,5 +3,6 @@ Tech previews

The following features are currently being released as part of a tech preview:

-* Build an OCI image from a Containerfile
+* Building an OCI image from a Containerfile.
 * Support for hosting Flatpak content in OCI format.
+* Pull-through caching (i.e., proxy cache) for upstream registries.
42 changes: 42 additions & 0 deletions docs/workflows/host.rst
@@ -117,3 +117,45 @@ Docker Output::
In general, the automatic conversion cannot be performed when the content is not available
in the storage. Therefore, it may be successful only if the content was previously synced
with the ``immediate`` policy.


Pull-Through Caching
--------------------

.. warning::
    This feature is provided as a tech preview and could change in backwards incompatible
    ways in the future.

The Pull-Through Caching feature offers an alternative way to host content by leveraging a **remote
registry** as the source of truth. This eliminates the need for in-advance repository
synchronization because Pulp acts as a **caching proxy** and stores images, after they have been
pulled by an end client, in a local repository.

Configuring the caching::

    # initialize a pull-through remote (the concept of upstream-name is not applicable here)
    REMOTE_HREF=$(http ${BASE_ADDR}/pulp/api/v3/remotes/container/pull-through/ name=docker-cache url=https://registry-1.docker.io | jq -r ".pulp_href")

    # create a pull-through distribution linked to the initialized remote
    http ${BASE_ADDR}/pulp/api/v3/distributions/container/pull-through/ remote=${REMOTE_HREF} name=docker-cache base_path=docker-cache

Pulling content::

    podman pull localhost:24817/docker-cache/library/busybox

In the example above, the image "busybox" is pulled from *DockerHub* through the "docker-cache"
distribution, acting as a transparent caching layer.

By incorporating the Pull-Through Caching feature into standard workflows, users **do not need** to
pre-configure a new repository and sync it before the content can be retrieved. This speeds up
the whole process of shipping containers, from early management stages to distribution.
Similarly to on-demand syncing, the feature also **reduces external network dependencies** and
ensures a more reliable container deployment system in production environments.
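
The two `http` calls in the configuration example can also be prepared from a script. What follows is a minimal sketch, assuming only the endpoints and field names shown in the docs above; the helper names (`remote_request`, `distribution_request`, `pull_reference`) are hypothetical, and the sketch builds payloads only, leaving the HTTP client and authentication to the caller.

```python
REMOTES_ENDPOINT = "/pulp/api/v3/remotes/container/pull-through/"
DISTRIBUTIONS_ENDPOINT = "/pulp/api/v3/distributions/container/pull-through/"


def remote_request(name, upstream_url):
    """Return the (endpoint, payload) pair for creating a pull-through remote."""
    # No upstream name is passed: the remote points at the whole registry.
    return REMOTES_ENDPOINT, {"name": name, "url": upstream_url}


def distribution_request(name, base_path, remote_href):
    """Return the (endpoint, payload) pair for the pull-through distribution."""
    # remote_href is the "pulp_href" value returned when the remote was created.
    return DISTRIBUTIONS_ENDPOINT, {
        "remote": remote_href,
        "name": name,
        "base_path": base_path,
    }


def pull_reference(registry_host, base_path, upstream_name):
    """Compose the image reference a client pulls through the cache."""
    return f"{registry_host}/{base_path}/{upstream_name}"
```

For the values used in the docs, `pull_reference("localhost:24817", "docker-cache", "library/busybox")` yields the reference passed to `podman pull`.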
Member Author: Distributions are public by default.


.. note::
    During the pull-through operation, Pulp creates a local repository that maintains a single
    version for pulled images. For instance, when pulling an image like "debian:10", a local
    repository named "debian" with the tag "10" is created. Subsequent pulls, such as "debian:11",
    generate a new repository version that incorporates both the "10" and "11" tags, automatically
    removing the previous version. Repositories and their content remain manageable through
    standard Pulp API endpoints. The repositories are read-only and public by default.
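
The versioning behaviour described in the note can be pictured with a small toy model. This is only a sketch, not Pulp's implementation: the `CachedRepository` class and `split_reference` helper are hypothetical, and the choice to skip creating a new version when a tag is already cached is an assumption the note does not confirm.

```python
from dataclasses import dataclass


def split_reference(reference):
    """Split a simple "name:tag" reference, e.g. "debian:10" -> ("debian", "10")."""
    name, _, tag = reference.partition(":")
    # A missing tag conventionally means "latest".
    return name, tag or "latest"


@dataclass
class CachedRepository:
    """Toy model of a pull-through repository that keeps only its latest version."""

    name: str
    version: int = 0
    tags: frozenset = frozenset()

    def record_pull(self, tag):
        # Assumption: pulling an already cached tag does not create a new version.
        if tag in self.tags:
            return
        # The new version carries every earlier tag plus the new one. The
        # previous version is dropped, modelled here by overwriting the state.
        self.version += 1
        self.tags = self.tags | {tag}
```

Pulling "debian:10" and then "debian:11" against this model leaves one repository named "debian" at its second version, holding both tags.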
17 changes: 12 additions & 5 deletions pulp_container/app/cache.py
@@ -1,8 +1,9 @@
 from django.core.exceptions import ObjectDoesNotExist
+from django.db.models import F, Value

 from pulpcore.plugin.cache import CacheKeys, AsyncContentCache, SyncContentCache

-from pulp_container.app.models import ContainerDistribution
+from pulp_container.app.models import ContainerDistribution, ContainerPullThroughDistribution
 from pulp_container.app.exceptions import RepositoryNotFound

 ACCEPT_HEADER_KEY = "accept_header"
@@ -69,11 +70,17 @@ def find_base_path_cached(request, cached):
         return path
     else:
         try:
-            distro = ContainerDistribution.objects.select_related(
-                "repository", "repository_version"
-            ).get(base_path=path)
+            distro = ContainerDistribution.objects.get(base_path=path)
         except ObjectDoesNotExist:
-            raise RepositoryNotFound(name=path)
+            distro = (
+                ContainerPullThroughDistribution.objects.annotate(path=Value(path))
+                .filter(path__startswith=F("base_path"))
+                .order_by("-base_path")
+                .first()
+            )
+            if not distro:
+                raise RepositoryNotFound(name=path)

         return distro.base_path
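
The fallback query in this hunk picks the pull-through distribution whose `base_path` is the longest prefix of the requested path. A database-free sketch of the same selection, assuming base paths are plain strings, is shown below; `find_pull_through_base_path` is a hypothetical helper, not code from the PR.

```python
def find_pull_through_base_path(path, base_paths):
    """Return the longest base path that is a string prefix of `path`, or None."""
    # Mirrors filter(path__startswith=F("base_path")): keep base paths that
    # the requested path starts with.
    candidates = [base_path for base_path in base_paths if path.startswith(base_path)]
    # Mirrors order_by("-base_path").first(): every candidate is a prefix of
    # the same string, so the lexicographically greatest one is the longest.
    return max(candidates, default=None)
```

As in the query as written, this is a plain string-prefix test and does not require the match to end at a `/` boundary, so a base path of `docker` would also match a request for `docker-cache/library/busybox` if no longer prefix exists.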


10 changes: 6 additions & 4 deletions pulp_container/app/content.py
@@ -6,9 +6,11 @@
 registry = Registry()

 app.add_routes(
-    [web.get(r"/pulp/container/{path:.+}/blobs/sha256:{digest:.+}", registry.get_by_digest)]
-)
-app.add_routes(
-    [web.get(r"/pulp/container/{path:.+}/manifests/sha256:{digest:.+}", registry.get_by_digest)]
+    [
+        web.get(
+            r"/pulp/container/{path:.+}/{content:(blobs|manifests)}/sha256:{digest:.+}",
+            registry.get_by_digest,
+        )
+    ]
 )
 app.add_routes([web.get(r"/pulp/container/{path:.+}/manifests/{tag_name}", registry.get_tag)])
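
The merged route replaces two near-identical patterns with one that also captures which kind of content was requested. The sketch below approximates the route with a plain `re` expression, writing each `{name:regex}` placeholder as a named group; the `match_digest_route` helper is hypothetical and the regular expression is an illustration, not the pattern aiohttp compiles internally.

```python
import re

# Approximation of the combined route: {path:.+}, {content:(blobs|manifests)}
# and {digest:.+} become named groups.
DIGEST_ROUTE = re.compile(
    r"^/pulp/container/(?P<path>.+)"
    r"/(?P<content>(blobs|manifests))"
    r"/sha256:(?P<digest>.+)$"
)


def match_digest_route(url_path):
    """Return the captured path, content type and digest, or None if no match."""
    match = DIGEST_ROUTE.match(url_path)
    return match.groupdict() if match else None
```

Both `/blobs/sha256:...` and `/manifests/sha256:...` URLs match this single pattern, while a tag request such as `/manifests/latest` does not and falls through to the separate `get_tag` route.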
23 changes: 22 additions & 1 deletion pulp_container/app/downloaders.py
@@ -5,6 +5,7 @@
import re

from aiohttp.client_exceptions import ClientResponseError
from collections import namedtuple
from logging import getLogger
from multidict import MultiDict
from urllib import parse
@@ -15,6 +16,11 @@

log = getLogger(__name__)

+HeadResult = namedtuple(
+    "HeadResult",
+    ["status_code", "path", "artifact_attributes", "url", "headers"],
+)


class RegistryAuthHttpDownloader(HttpDownloader):
"""
@@ -31,6 +37,7 @@ def __init__(self, *args, **kwargs):
Initialize the downloader.
"""
self.remote = kwargs.pop("remote")

super().__init__(*args, **kwargs)

async def _run(self, handle_401=True, extra_data=None):
@@ -95,7 +102,12 @@ async def _run(self, handle_401=True, extra_data=None):
return await self._run(handle_401=False, extra_data=extra_data)
else:
raise
to_return = await self._handle_response(response)

if http_method == "head":
to_return = await self._handle_head_response(response)
else:
to_return = await self._handle_response(response)

await response.release()
self.response_headers = response.headers

@@ -173,6 +185,15 @@ def auth_header(token, basic_auth):
return {"Authorization": basic_auth}
return {}

+    async def _handle_head_response(self, response):
+        return HeadResult(
+            status_code=response.status,
+            path=None,
+            artifact_attributes=None,
+            url=self.url,
+            headers=response.headers,
+        )
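
A HEAD request returns no body, so the result carries only the status, URL, and headers. One way a caller might consume such a result is sketched below. The `digest_from_head` helper is hypothetical, and reading the `Docker-Content-Digest` header assumes the upstream registry sends it. A plain `dict` stands in for the response headers here and, unlike aiohttp's header mapping, is case-sensitive.

```python
from collections import namedtuple

HeadResult = namedtuple(
    "HeadResult",
    ["status_code", "path", "artifact_attributes", "url", "headers"],
)


def digest_from_head(result):
    """Return the digest advertised in the HEAD response, or None if unavailable."""
    # Only a successful response is expected to describe the content.
    if result.status_code != 200:
        return None
    return result.headers.get("Docker-Content-Digest")
```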


class NoAuthSignatureDownloader(HttpDownloader):
"""A downloader class suited for signature downloads."""
@@ -0,0 +1,56 @@
# Generated by Django 4.2.8 on 2023-12-12 21:15

from django.db import migrations, models
import django.db.models.deletion
import pulpcore.app.models.access_policy


class Migration(migrations.Migration):

dependencies = [
('core', '0116_alter_remoteartifact_md5_alter_remoteartifact_sha1_and_more'),
('container', '0036_containerpushrepository_pending_blobs_manifests'),
]

operations = [
migrations.CreateModel(
name='ContainerPullThroughRemote',
fields=[
('remote_ptr', models.OneToOneField(auto_created=True, on_delete=django.db.models.deletion.CASCADE, parent_link=True, primary_key=True, serialize=False, to='core.remote')),
],
options={
'permissions': [('manage_roles_containerpullthroughremote', 'Can manage role assignments on pull-through container remote')],
'default_related_name': '%(app_label)s_%(model_name)s',
},
bases=('core.remote', pulpcore.app.models.access_policy.AutoAddObjPermsMixin),
),
migrations.AddField(
model_name='containerrepository',
name='pending_blobs',
field=models.ManyToManyField(to='container.blob'),
),
migrations.AddField(
model_name='containerrepository',
name='pending_manifests',
field=models.ManyToManyField(to='container.manifest'),
),
migrations.CreateModel(
name='ContainerPullThroughDistribution',
fields=[
('distribution_ptr', models.OneToOneField(auto_created=True, on_delete=django.db.models.deletion.CASCADE, parent_link=True, primary_key=True, serialize=False, to='core.distribution')),
('private', models.BooleanField(default=False, help_text='Restrict pull access to explicitly authorized users. Related distributions inherit this value. Defaults to unrestricted pull access.')),
('description', models.TextField(null=True)),
('namespace', models.ForeignKey(null=True, on_delete=django.db.models.deletion.CASCADE, related_name='container_pull_through_distributions', to='container.containernamespace')),
],
options={
'permissions': [('manage_roles_containerpullthroughdistribution', 'Can manage role assignments on pull-through cache distribution')],
'default_related_name': '%(app_label)s_%(model_name)s',
},
bases=('core.distribution', pulpcore.app.models.access_policy.AutoAddObjPermsMixin),
),
migrations.AddField(
model_name='containerdistribution',
name='pull_through_distribution',
field=models.ForeignKey(null=True, on_delete=django.db.models.deletion.CASCADE, related_name='distributions', to='container.containerpullthroughdistribution'),
),
]
70 changes: 70 additions & 0 deletions pulp_container/app/models.py
@@ -413,6 +413,25 @@ class Meta:
]


+class ContainerPullThroughRemote(Remote, AutoAddObjPermsMixin):
+    """
+    A remote for pull-through caching, omitting the requirement for the upstream name.
+
+    This remote is used for instantiating new regular container remotes with the upstream name.
+    Configuring credentials and everything related to container workflows can be therefore done
+    from within a single instance of this remote.
+    """
+
+    class Meta:
+        default_related_name = "%(app_label)s_%(model_name)s"
+        permissions = [
+            (
+                "manage_roles_containerpullthroughremote",
+                "Can manage role assignments on pull-through container remote",
+            ),
+        ]


class ManifestSigningService(SigningService):
"""
Signing service used for creating container signatures.
@@ -486,6 +505,8 @@ class ContainerRepository(
manifest_signing_service = models.ForeignKey(
ManifestSigningService, on_delete=models.SET_NULL, null=True
)
pending_blobs = models.ManyToManyField(Blob)
pending_manifests = models.ManyToManyField(Manifest)

class Meta:
default_related_name = "%(app_label)s_%(model_name)s"
@@ -509,6 +530,15 @@ def finalize_new_version(self, new_version):
         """
         remove_duplicates(new_version)
         validate_repo_version(new_version)
+        self.remove_pending_content(new_version)
+
+    def remove_pending_content(self, repository_version):
+        """Remove pending blobs and manifests when committing the content to the repository."""
+        added_content = repository_version.added(
+            base_version=repository_version.base_version
+        ).values_list("pk")
+        self.pending_blobs.remove(*Blob.objects.filter(pk__in=added_content))
+        self.pending_manifests.remove(*Manifest.objects.filter(pk__in=added_content))
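
The bookkeeping done by `remove_pending_content` amounts to a set difference: anything that has just been committed to the new repository version stops being pending. A minimal sketch with plain sets of primary keys follows; the free function and its arguments are hypothetical stand-ins for the many-to-many relations used in the model.

```python
def remove_pending_content(pending_blobs, pending_manifests, added_content):
    """Return the pending sets with everything committed in the new version removed."""
    # added_content holds the primary keys of content added by the new version.
    remaining_blobs = pending_blobs - added_content
    remaining_manifests = pending_manifests - added_content
    return remaining_blobs, remaining_manifests
```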


class ContainerPushRepository(Repository, AutoAddObjPermsMixin):
@@ -565,6 +595,39 @@ def remove_pending_content(self, repository_version):
self.pending_manifests.remove(*Manifest.objects.filter(pk__in=added_content))


+class ContainerPullThroughDistribution(Distribution, AutoAddObjPermsMixin):
+    """
+    A distribution for pull-through caching, referencing normal distributions.
+    """
+
+    TYPE = "pull-through"

Member: what about private flag?

Member Author: Good catch. I did not consider this.


+    namespace = models.ForeignKey(
+        ContainerNamespace,
+        on_delete=models.CASCADE,
+        related_name="container_pull_through_distributions",
+        null=True,
+    )
+    private = models.BooleanField(
+        default=False,
+        help_text=_(
+            "Restrict pull access to explicitly authorized users. "
+            "Related distributions inherit this value. "
+            "Defaults to unrestricted pull access."
+        ),
+    )
+    description = models.TextField(null=True)
+
+    class Meta:
+        default_related_name = "%(app_label)s_%(model_name)s"
+        permissions = [
+            (
+                "manage_roles_containerpullthroughdistribution",
+                "Can manage role assignments on pull-through cache distribution",
+            ),
+        ]


class ContainerDistribution(Distribution, AutoAddObjPermsMixin):
"""
A container distribution defines how a repository version is distributed by Pulp's webserver.
@@ -595,6 +658,13 @@ class ContainerDistribution(Distribution, AutoAddObjPermsMixin):
)
description = models.TextField(null=True)

+    pull_through_distribution = models.ForeignKey(
+        ContainerPullThroughDistribution,
+        related_name="distributions",
+        on_delete=models.CASCADE,
+        null=True,
+    )
Comment on lines +661 to +666

Member: It may be interesting to use this link in permissions...

Member Author: How do you imagine having permissions assigned to these objects? Users who are in charge of pull-through cache distributions should be able to preview the created subdistributions. Are you suggesting adjusting default roles and viewsets in a manner that will disallow the users from seeing specific distributions (possible) or getting to pull-through cache distributions from reverse relation (not possible from API)? I would leave the permissions untouched as we have between Tasks and TaskGroups.

Member: I think that the owner of the pull-through distribution should also own all the auto-created stuff related to it.

Member Author (@lubosmj, Dec 13, 2023): I would like to focus on delivering an MVP first to see if we get any feedback for this feature.


def get_repository_version(self):
"""
Returns the repository version that is supposed to be served by this ContainerDistribution.