Cannot deploy bundle with kfp-api 1.7/stable rev 866 #735
Comments
Hey @Barteus, just to understand this issue better: after deploying CKF using the 1.7/stable bundle definition, does kfp-api go to an error state? Is it the only charm that is failing? To help us debug, could you please share the output of 'juju status kfp-api' as well as the logs from the apiserver container (kubectl logs -nkubeflow kfp-api-0 -c apiserver)? Can you also check whether kfp-db is active?
Hey @Barteus, I deployed Kubeflow 1.7/stable yesterday. The only difference from your environment is that I deployed on MicroK8s.
Juju Status
juju status
Model Controller Cloud/Region Version SLA Timestamp
kubeflow microk8s-localhost microk8s/localhost 2.9.45 unsupported 14:33:48Z
App Version Status Scale Charm Channel Rev Address Exposed Message
admission-webhook res:oci-image@2d74d1b active 1 admission-webhook 1.7/stable 224 10.152.183.8 no
argo-controller res:oci-image@3902c16 active 1 argo-controller 3.3/stable 376 no
argo-server res:oci-image@e2292c9 active 1 argo-server 3.3/stable 309 no
dex-auth active 1 dex-auth 2.31/stable 346 10.152.183.6 no
istio-ingressgateway active 1 istio-gateway 1.16/stable 663 10.152.183.108 no
istio-pilot active 1 istio-pilot 1.16/stable 662 10.152.183.161 no
jupyter-controller res:oci-image@1167186 active 1 jupyter-controller 1.7/stable 805 no
jupyter-ui active 1 jupyter-ui 1.7/stable 727 10.152.183.187 no
katib-controller res:oci-image@111495a active 1 katib-controller 0.15/stable 282 10.152.183.188 no
katib-db 8.0.34-0ubuntu0.22.04.1 active 1 mysql-k8s 8.0/stable 99 10.152.183.237 no
katib-db-manager active 1 katib-db-manager 0.15/stable 253 10.152.183.147 no
katib-ui active 1 katib-ui 0.15/stable 267 10.152.183.243 no
kfp-api active 1 kfp-api 2.0/stable 866 10.152.183.122 no
kfp-db 8.0.34-0ubuntu0.22.04.1 active 1 mysql-k8s 8.0/stable 99 10.152.183.7 no
kfp-persistence res:oci-image@ebed770 active 1 kfp-persistence 2.0/stable 870 no
kfp-profile-controller res:oci-image@aa75b0c active 1 kfp-profile-controller 2.0/stable 831 10.152.183.143 no
kfp-schedwf res:oci-image@2cb9087 active 1 kfp-schedwf 2.0/stable 932 no
kfp-ui res:oci-image@ae72602 active 1 kfp-ui 2.0/stable 865 10.152.183.56 no
kfp-viewer res:oci-image@899e25f active 1 kfp-viewer 2.0/stable 895 no
kfp-viz res:oci-image@ffaf37e active 1 kfp-viz 2.0/stable 822 10.152.183.229 no
knative-eventing active 1 knative-eventing 1.8/stable 345 10.152.183.139 no
knative-operator active 1 knative-operator 1.8/stable 320 10.152.183.31 no
knative-serving active 1 knative-serving 1.8/stable 346 10.152.183.36 no
kserve-controller active 1 kserve-controller 0.10/stable 394 10.152.183.184 no
kubeflow-dashboard active 1 kubeflow-dashboard 1.7/stable 439 10.152.183.159 no
kubeflow-profiles active 1 kubeflow-profiles 1.7/stable 336 10.152.183.216 no
kubeflow-roles active 1 kubeflow-roles 1.7/stable 148 10.152.183.49 no
kubeflow-volumes res:oci-image@d261609 active 1 kubeflow-volumes 1.7/stable 204 10.152.183.252 no
metacontroller-operator active 1 metacontroller-operator 2.0/stable 204 10.152.183.220 no
minio res:oci-image@1755999 active 1 minio ckf-1.7/stable 214 10.152.183.197 no
mlflow-minio res:oci-image@1755999 active 1 minio ckf-1.7/stable 214 10.152.183.230 no
mlflow-mysql 8.0.34-0ubuntu0.22.04.1 active 1 mysql-k8s 8.0/stable 99 10.152.183.144 no
mlflow-server active 1 mlflow-server 2.1/stable 466 10.152.183.200 no
oidc-gatekeeper res:oci-image@6b720b8 active 1 oidc-gatekeeper ckf-1.7/stable 269 10.152.183.198 no
seldon-controller-manager active 1 seldon-core 1.15/stable 548 10.152.183.135 no
tensorboard-controller res:oci-image@c52f7c2 active 1 tensorboard-controller 1.7/stable 156 10.152.183.52 no
tensorboards-web-app res:oci-image@929f55b active 1 tensorboards-web-app 1.7/stable 158 10.152.183.162 no
training-operator active 1 training-operator 1.6/stable 305 10.152.183.221 no
Unit Workload Agent Address Ports Message
admission-webhook/0* active idle 10.1.134.142 4443/TCP
argo-controller/0* active idle 10.1.134.204
argo-server/0* active idle 10.1.134.144 2746/TCP
dex-auth/0* active idle 10.1.134.141
istio-ingressgateway/0* active idle 10.1.134.143
istio-pilot/0* active idle 10.1.134.146
jupyter-controller/0* active idle 10.1.134.148
jupyter-ui/0* active idle 10.1.134.150
katib-controller/0* active idle 10.1.134.153 443/TCP,8080/TCP
katib-db-manager/0* active idle 10.1.134.155
katib-db/0* active idle 10.1.134.154 Primary
katib-ui/0* active idle 10.1.134.156
kfp-api/0* active idle 10.1.134.157
kfp-db/0* active idle 10.1.134.159 Primary
kfp-persistence/0* active idle 10.1.134.205
kfp-profile-controller/0* active idle 10.1.134.203 80/TCP
kfp-schedwf/0* active idle 10.1.134.192
kfp-ui/0* active idle 10.1.134.206 3000/TCP
kfp-viewer/0* active idle 10.1.134.134
kfp-viz/0* active idle 10.1.134.158 8888/TCP
knative-eventing/0* active idle 10.1.134.160
knative-operator/0* active idle 10.1.134.165
knative-serving/0* active idle 10.1.134.161
kserve-controller/0* active idle 10.1.134.166
kubeflow-dashboard/0* active idle 10.1.134.164
kubeflow-profiles/0* active idle 10.1.134.168
kubeflow-roles/0* active idle 10.1.134.162
kubeflow-volumes/0* active idle 10.1.134.199 5000/TCP
metacontroller-operator/0* active idle 10.1.134.163
minio/0* active idle 10.1.134.202 9000/TCP,9001/TCP
mlflow-minio/0* active idle 10.1.134.213 9000/TCP,9001/TCP
mlflow-mysql/0* active idle 10.1.134.210 Primary
mlflow-server/0* active idle 10.1.134.211
oidc-gatekeeper/0* active idle 10.1.134.207 8080/TCP
seldon-controller-manager/0* active idle 10.1.134.167
tensorboard-controller/0* active idle 10.1.134.201 9443/TCP
tensorboards-web-app/0* active idle 10.1.134.198 5000/TCP
training-operator/0* active idle 10.1.134.169
Logs
# ubuntu@ip-172-31-65-245:~$ microk8s.kubectl logs kfp-api-0 -n kubeflow | grep -i error | grep health
# Empty
# $ microk8s.kubectl logs kfp-api-0 -n kubeflow | grep -i error
Defaulted container "charm" out of: charm, apiserver, charm-init (init)
2023-10-26T20:25:01.736Z [container-agent] 2023-10-26 20:25:01 ERROR juju-log Failed to handle <LeaderElectedEvent via KfpApiOperator/on/leader_elected[31]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:25:03.021Z [container-agent] 2023-10-26 20:25:03 ERROR juju-log Failed to handle <ConfigChangedEvent via KfpApiOperator/on/config_changed[36]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:25:10.708Z [container-agent] 2023-10-26 20:25:10 ERROR juju-log Failed to handle <PebbleReadyEvent via KfpApiOperator/on/apiserver_pebble_ready[46]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:25:11.935Z [container-agent] 2023-10-26 20:25:11 ERROR juju-log relational-db:20: Failed to handle <RelationJoinedEvent via KfpApiOperator/on/relational_db_relation_joined[51]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:25:13.175Z [container-agent] 2023-10-26 20:25:13 ERROR juju-log relational-db:20: Failed to handle <RelationChangedEvent via KfpApiOperator/on/relational_db_relation_changed[56]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:26:01.256Z [container-agent] 2023-10-26 20:26:01 ERROR juju-log relational-db:20: Failed to handle <RelationChangedEvent via KfpApiOperator/on/relational_db_relation_changed[61]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:26:01.867Z [container-agent] 2023-10-26 20:26:01 ERROR juju-log relational-db:20: Failed to handle <DatabaseCreatedEvent via KfpApiOperator/DatabaseRequires[relational-db]/on/database_created[62]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:27:29.166Z [container-agent] 2023-10-26 20:27:29 ERROR juju-log kfp-viz:23: Failed to handle <RelationChangedEvent via KfpApiOperator/on/kfp_viz_relation_changed[72]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:27:30.622Z [container-agent] 2023-10-26 20:27:30 ERROR juju-log kfp-viz:23: Failed to handle <RelationChangedEvent via KfpApiOperator/on/kfp_viz_relation_changed[77]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:27:37.687Z [container-agent] 2023-10-26 20:27:37 ERROR juju-log kfp-viz:23: Failed to handle <RelationChangedEvent via KfpApiOperator/on/kfp_viz_relation_changed[82]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:27:42.149Z [container-agent] 2023-10-26 20:27:42 ERROR juju-log kfp-api:21: Failed to handle <RelationChangedEvent via KfpApiOperator/on/kfp_api_relation_changed[92]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:27:44.614Z [container-agent] 2023-10-26 20:27:44 ERROR juju-log kfp-api:21: Failed to handle <RelationChangedEvent via KfpApiOperator/on/kfp_api_relation_changed[97]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:28:13.456Z [container-agent] 2023-10-26 20:28:13 ERROR juju-log kfp-api:22: Failed to handle <RelationChangedEvent via KfpApiOperator/on/kfp_api_relation_changed[107]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:28:56.110Z [container-agent] 2023-10-26 20:28:56 ERROR juju-log object-storage:24: Failed to handle <RelationChangedEvent via KfpApiOperator/on/object_storage_relation_changed[117]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:29:08.317Z [container-agent] 2023-10-26 20:29:08 ERROR juju-log Failed to handle <UpdateStatusEvent via KfpApiOperator/on/update_status[122]> with error: List of <ops.model.Relation object-storage:24> versions not found for apps: minio
2023-10-26T20:29:45.562Z [container-agent] 2023-10-26 20:29:45 ERROR juju-log object-storage:24: Failed to generate container configuration.
2023-10-26T20:29:45.668Z [container-agent] 2023-10-26 20:29:45 ERROR juju-log object-storage:24: Failed to handle <RelationChangedEvent via KfpApiOperator/on/object_storage_relation_changed[127]> with error: Waiting for kfp-viz relation data
2023-10-26T20:29:49.275Z [container-agent] 2023-10-26 20:29:49 ERROR juju-log kfp-api:22: Failed to generate container configuration.
2023-10-26T20:29:49.312Z [container-agent] 2023-10-26 20:29:49 ERROR juju-log kfp-api:22: Failed to handle <RelationChangedEvent via KfpApiOperator/on/kfp_api_relation_changed[132]> with error: Waiting for kfp-viz relation data
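For anyone triaging similar output, the repeated lines above collapse to just two distinct error messages. A minimal Python sketch for grouping them follows. The function name `summarize_errors` and the regular expression are illustrative assumptions based only on the log format shown in this comment, not part of any Juju or charm tooling.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt above:
#   ... ERROR juju-log [<relation>:<id>: ]Failed to handle <Event via ...> with error: <message>
PATTERN = re.compile(
    r"ERROR juju-log (?:\S+: )?Failed to handle <(\w+) via [^>]*> with error: (.+)$"
)


def summarize_errors(lines):
    """Count each distinct error message and record which event types hit it."""
    counts = Counter()
    events = {}
    for line in lines:
        match = PATTERN.search(line.strip())
        if match is None:
            # Lines without the "Failed to handle ... with error:" shape are skipped.
            continue
        event, message = match.group(1), match.group(2).strip()
        counts[message] += 1
        events.setdefault(message, set()).add(event)
    return counts, events


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pipe the output of the kubectl logs command into this script.
    counts, events = summarize_errors(sys.stdin)
    for message, count in counts.most_common():
        print(f"{count:4d}  {message}  (events: {', '.join(sorted(events[message]))})")
```

Grouping by message makes it easier to see that the earlier failures all concern the object-storage relation with minio, while the later ones are waiting for kfp-viz relation data.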
Hi @DnPlas, I'm working on the same deployment as @Barteus
Container logs:
Thanks for the logs @natalytvinova. I will also need the status of the other kfp-* charms and minio, if you can provide them. Especially the state of the I also see Could you please confirm that there is a charm called
@DnPlas with the new 1.7 bundle and the k8s network issues fixed on our side, we no longer face this bug.
The issue was the availability of a DNS server between the nodes. Thank you for the help! Closing the issue.
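For readers who hit similar symptoms, a quick way to rule DNS in or out is to try resolving a name from each node or pod. The sketch below uses only Python's standard `socket` module. The helper name `resolves` and the example hostname are assumptions for illustration: the hostname follows the default Kubernetes service naming pattern and may not match the actual service names in a given deployment.

```python
import socket


def resolves(hostname, port=None):
    """Return True if the local resolver can resolve hostname, False otherwise."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Raised when name resolution fails (no DNS server reachable, unknown name, etc.).
        return False


if __name__ == "__main__":
    # Hypothetical name: assumes a Service called "kfp-api" in the "kubeflow"
    # namespace and the default "cluster.local" cluster domain.
    for name in ["kfp-api.kubeflow.svc.cluster.local"]:
        status = "ok" if resolves(name) else "FAILED"
        print(f"{name}: {status}")
```

A failure here does not by itself prove the DNS server is unreachable (the name may simply not exist), so it is best run against a name that is known to resolve elsewhere in the cluster.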
Bug Description
When deploying the bundle from the "bundle-kubeflow" repository, the Juju deployment fails on kfp-persistence while it tries to check the kfp-api healthcheck.
To Reproduce
Environment
Relevant Log Output
Additional Context
No response