
KFP Profile Controller K8s Service port not specified #317

Closed
phoevos opened this issue Sep 6, 2023 · 0 comments · Fixed by #318
Assignees: phoevos
Labels: 23.10 (Should be fixed by 23.10), bug (Something isn't working), Kubeflow 1.8 (This issue affects the Charmed Kubeflow 1.8 release)

Comments

phoevos (Contributor) commented Sep 6, 2023

Bug Description

In the PodSpec implementation of the KFP Profile Controller charm, we used to configure the port of the K8s Service pointing to the controller via Juju:

"ports": [
{
"name": "http",
"containerPort": CONTROLLER_PORT,
"protocol": "TCP",
},
],

However, this configuration was not carried over in the charm rewrite, so the kfp-profile-controller K8s Service ends up exposing the default placeholder port (see the example below). As a result, it is impossible to hit the webhook defined as part of the CompositeController:

"sync_webhook_url": f"http://{self.model.app.name}.{self.model.name}/sync",

Note that the sync webhook URL above does not specify a port, so the metacontroller's requests go to port 80 of the Service. The webhook controller itself, however, listens on a specific port, defined as part of the corresponding Pebble service:
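The original snippet is omitted here; the following is a minimal sketch of such a Pebble layer. The command, the environment variable, and the CONTROLLER_PORT value of 80 are illustrative assumptions, not taken from the charm source.

CONTROLLER_PORT = 80  # assumed value; the charm defines its own constant

# Sketch of the Pebble layer the charm applies to the workload container.
layer = {
    "summary": "kfp-profile-controller layer",
    "services": {
        "kfp-profile-controller": {
            "override": "replace",
            "summary": "Serve the CompositeController sync webhook",
            # sync.py serves HTTP on CONTROLLER_PORT inside the container
            "command": "python /hooks/sync.py",
            "startup": "enabled",
            "environment": {"CONTROLLER_PORT": str(CONTROLLER_PORT)},
        }
    },
}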

To Reproduce

Deploying the metacontroller and KFP profile controller charms from their latest versions won't suffice here; we also need to deploy MinIO and relate it to the KFP profile controller:

juju deploy metacontroller-operator --channel latest/edge --config metacontroller-image=docker.io/metacontrollerio/metacontroller:v2.0.4 --trust
juju deploy kfp-profile-controller --channel latest/edge --trust
juju deploy minio --channel latest/edge --trust
juju relate minio:object-storage kfp-profile-controller:object-storage
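Before inspecting the Service, wait for the model to settle; all units should eventually report active/idle:

juju status --relations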

After all charms have deployed successfully, we can look at the relevant service to find that the port is indeed not set correctly:

$ kubectl -n kubeflow get svc kfp-profile-controller -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    controller.juju.is/id: ea1d4837-f25d-4c33-894c-8ec5565720e9
    juju.is/version: 2.9.44
    model.juju.is/id: 05ecfd09-18d9-48f8-89ce-f751b3e883f3
  creationTimestamp: "2023-09-06T17:45:04Z"
  labels:
    app.kubernetes.io/managed-by: juju
    app.kubernetes.io/name: kfp-profile-controller
  name: kfp-profile-controller
  namespace: kubeflow
  resourceVersion: "1175097"
  uid: 7083c4d0-1cb1-47b0-a742-d4f1de942e50
spec:
  clusterIP: 10.152.183.79
  clusterIPs:
  - 10.152.183.79
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: placeholder
    port: 65535
    protocol: TCP
    targetPort: 65535
  selector:
    app.kubernetes.io/name: kfp-profile-controller
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
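To inspect just the exposed port, a jsonpath query also works; while the bug is present it prints the placeholder value 65535:

$ kubectl -n kubeflow get svc kfp-profile-controller -o jsonpath='{.spec.ports[0].port}'
65535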

We can also fetch the metacontroller logs to find symptoms of the described bug:

kubectl -n kubeflow logs metacontroller-operator-charm-0

The issue will most likely present with errors mentioning context deadline exceeded or Client.Timeout exceeded while awaiting headers (see logs below for examples).

Environment

This bug affects the latest/edge version of the charm (rev > 676).

Relevant log output

{"level":"error","ts":1694008885.0872207,"msg":"failed to sync Namespace 'knative-serving': sync hook failed for Namespace /knative-serving: sync hook failed: http error: Post \"http://kfp-profile-controller.kubeflow/sync\": context deadline exceeded\n","stacktrace":"metacontroller/pkg/controller/composite.(*parentController).worker\n\t/go/src/metacontroller/pkg/controller/composite/controller.go:287\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90\nmetacontroller/pkg/controller/composite.(*parentController).Start.func1.1\n\t/go/src/metacontroller/pkg/controller/composite/controller.go:263"}
{"level":"error","ts":1694008885.088049,"msg":"failed to sync Namespace 'default': sync hook failed for Namespace /default: sync hook failed: http error: Post \"http://kfp-profile-controller.kubeflow/sync\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\n","stacktrace":"metacontroller/pkg/controller/composite.(*parentController).worker\n\t/go/src/metacontroller/pkg/controller/composite/controller.go:287\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90\nmetacontroller/pkg/controller/composite.(*parentController).Start.func1.1\n\t/go/src/metacontroller/pkg/controller/composite/controller.go:263"}

Additional context

N/A

@phoevos self-assigned this Sep 6, 2023
@phoevos added the bug (Something isn't working), 23.10 (Should be fixed by 23.10), and Kubeflow 1.8 (This issue affects the Charmed Kubeflow 1.8 release) labels Sep 6, 2023
phoevos added a commit that referenced this issue Sep 8, 2023
* Patch the port specified in the K8s service created by Juju to point to
  the one the controller pod responsible for running the `sync.py` script
  listens to.
* Add an integration test to verify that the sync webhook has applied all
  desired resources. In order for the test to work (and the metacontroller
  to be deployed successfully) we also deploy the admission-webhook which
  is responsible for applying the PodDefault CRD.

Closes #317
Refs #314

Signed-off-by: Phoevos Kalemkeris <[email protected]>
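For reference, one common way for a sidecar charm to patch the port on the Juju-created Service is the observability-libs KubernetesServicePatch library. The sketch below is illustrative only and assumes a controller port of 80; it is not necessarily the exact approach taken in #318:

from charms.observability_libs.v1.kubernetes_service_patch import KubernetesServicePatch
from lightkube.models.core_v1 import ServicePort
from ops.charm import CharmBase

CONTROLLER_PORT = 80  # assumed value for illustration


class KfpProfileControllerCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # Replace the placeholder port (65535) on the Juju-created Service
        # with the port the sync webhook actually listens on.
        self.service_patcher = KubernetesServicePatch(
            self,
            [ServicePort(CONTROLLER_PORT, name="http", targetPort=CONTROLLER_PORT)],
        )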