training-operator cannot be upgraded from 1.7/stable to recent version #170

Open
DnPlas opened this issue Jun 26, 2024 · 1 comment
Labels
bug Something isn't working

Comments

DnPlas (Contributor) commented Jun 26, 2024

Bug Description

Due to #161, the latest training-operator charm should only have one container in the scheduled Pod, the one running the charm, since the workload is now deployed and scheduled separately.
Because of https://bugs.launchpad.net/juju/+bug/1991955, when upgrading the charm from 1.7/stable to the most recent version, the Pod still has two containers instead of one after running juju refresh training-operator --channel latest/edge.
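
One way to confirm which containers the charm Pod ended up with (the namespace below is an assumption based on the "testing" model shown in the log output):

# List the containers of the charm Pod; after the broken upgrade this prints
# both the charm container and the leftover workload container
$ kubectl get pod training-operator-0 -n testing -o jsonpath='{.spec.containers[*].name}'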

To Reproduce

  1. Deploy: juju deploy training-operator --channel 1.7/stable --trust
  2. Refresh: juju refresh training-operator --channel latest/edge/pr-162 (this works right now) or juju refresh training-operator --channel latest/edge (this will only work once #167, "refactor, chore: refactor charm to use Deployment for workload, also bumps training-operator 1.7->1.8", is merged)
  3. Observe the Pods (see the command sketch below)
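
Putting the steps together, a minimal reproduction sketch (the channel and model name are assumptions for illustration):

# Deploy the old charm revision
$ juju deploy training-operator --channel 1.7/stable --trust

# Refresh to the new revision
$ juju refresh training-operator --channel latest/edge

# Observe: the old Pod still reports 2/2 containers and a new Deployment-managed Pod appears
$ kubectl get pods -A | grep training-operator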

Environment

microk8s 1.29/stable
juju 3.5.0

Relevant Log Output

# Getting the pods after a refresh shows the training-operator-0 Pod at 2/2,
# i.e. it still contains both the charm and the workload container.
# Then, because of the refactor, there is another Pod, training-operator-7f97689fcf-2zshp,
# which also runs the workload.

$ kubectl get pods -A
testing                               training-operator-0                      2/2     Running   0          10m
testing                               training-operator-7f97689fcf-2zshp       1/1     Running   0          9m59s

Additional Context

This issue is currently affecting the test_upgrade integration test case and the upgrade path.

Workarounds

So far, the only workaround that has worked is to remove the operator and re-deploy it.
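
A sketch of the workaround, assuming the application name training-operator and the latest/edge channel as the re-deploy target:

# Remove the existing application (and its stale Pod)
$ juju remove-application training-operator

# Re-deploy on the new channel so only the Deployment-managed workload Pod is created
$ juju deploy training-operator --channel latest/edge --trust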

DnPlas added the bug label Jun 26, 2024

Thank you for reporting your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5901.

This message was autogenerated

DnPlas added a commit that referenced this issue Jun 26, 2024
#170 is affecting the execution of this test, but since the fix
is on juju, there is not much we can do at the moment other than
skipping the test.

Part of #170
DnPlas added a commit that referenced this issue Jul 1, 2024
* tests: skip test_upgrade due to #170

#170 is affecting the execution of this test, but since the fix
is on juju, there is not much we can do at the moment other than
skipping the test.

Part of #170
DnPlas added a commit that referenced this issue Jul 4, 2024
* tests: skip test_upgrade due to #170

#170 is affecting the execution of this test, but since the fix
is on juju, there is not much we can do at the moment other than
skipping the test.

Part of #170
DnPlas added a commit that referenced this issue Jul 9, 2024
…` for workload, also bumps training-operator 1.7->1.8 (#167)

* pin integration test dependencies, refactor constants in tests (#164)
* refactor: deploy the training-operator with kubernetes resources (#161)
* chore: bump training-operator v1.7 -> v1.8 (#162)
* refactor: apply a workload Service instead of using juju created one (#173)
* tests: skip test_upgrade due to #170 (#171)
* build, tests: bump charmed-kubeflow-chisme 0.4.0 -> 0.4.1 (#172)

Fixes #159