
Resource Processing Failed Due to Concurrent Update #111

Open
samuel-esp opened this issue Nov 14, 2024 · 0 comments · May be fixed by #113
When the list of resources to be processed is large (e.g., 2000+ resources), the KubeDownscaler algorithm may take up to half an hour to complete its run. This can lead to the following scenario:

  1. All resources are initially retrieved.
  2. While KubeDownscaler processes each resource, a resource that was retrieved earlier may be modified by another entity (such as an HPA, KEDA, or a manual intervention).
  3. When KubeDownscaler attempts to process this modified resource, it encounters the following error: "the object has been modified; please apply your changes to the latest version and try again".

For most use cases this is a non-blocking error, because the resource will be processed again in the next cycle. However, if the --once argument is used on a large cluster, it can lead to some resources not being processed correctly.
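
For illustration, this is roughly where the conflict surfaces when a resource is updated through pykube; the try_update name and its return convention are assumptions made for this sketch, not the downscaler's actual code. The full traceback is shown below.

```python
from pykube.exceptions import HTTPError


def try_update(resource):
    """Illustrative helper: attempt a single update and report a 409 Conflict.

    The API server rejects the PATCH with 409 Conflict when the object's
    resourceVersion changed after it was listed (e.g. an HPA or KEDA
    touched the resource in the meantime).
    """
    try:
        resource.update()
        return True
    except HTTPError as e:
        if e.code == 409:
            # Stale resourceVersion: the object was modified concurrently.
            return False
        raise
```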

Issue

2024-11-13 19:14:33,671 ERROR: Failed to process Deployment test-namespace/deployment-test: Operation cannot be fulfilled on deployments.apps "deployment-test": the object has been modified; please apply your changes to the latest version and try again
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pykube/http.py", line 437, in raise_for_status
    resp.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: https://omitted:443/apis/apps/v1/namespaces/test-namespace/deployments/deployment-test

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/kube_downscaler/scaler.py", line 419, in autoscale_resource
    resource.update()
  File "/usr/local/lib/python3.10/site-packages/pykube/objects.py", line 165, in update
    self.patch(self.obj, subresource=subresource)
  File "/usr/local/lib/python3.10/site-packages/pykube/objects.py", line 157, in patch
    self.api.raise_for_status(r)
  File "/usr/local/lib/python3.10/site-packages/pykube/http.py", line 444, in raise_for_status
    raise HTTPError(resp.status_code, payload["message"])
pykube.exceptions.HTTPError: Operation cannot be fulfilled on deployments.apps "deployment-test": the object has been modified; please apply your changes to the latest version and try again
2024-11-13 19:14:33,671 INFO: Scaling down Deployment test-namespace/bwce-sigsc001 from 1 to 0 replicas (uptime: never, downtime: always)
2024-11-13 19:14:35,081 INFO: Scaling down Deployment test-namespace/bwce-siinf028 from 1 to 0 replicas (uptime: never, downtime: always)
2024-11-13 19:14:36,438 INFO: Scaling down Deployment test-namespace/bwce-siinf032 from 1 to 0 replicas (uptime: never, downtime: always)
2024-11-13 19:14:37,797 INFO: Scaling down Deployment test-namespace/bwce-siinf037 from 1 to 0 replicas (uptime: never, downtime: always)
2024-11-13 19:14:39,073 INFO: Scaling down Deployment test-namespace/deployment-test from 1 to 0 replicas (uptime: never, downtime: always)
2024-11-13 19:14:39,086 ERROR: Failed to process Deployment test-namespace/deployment-test: Operation cannot be fulfilled on deployments.apps "deployment-test": the object has been modified; please apply your changes to the latest version and try again
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pykube/http.py", line 437, in raise_for_status
    resp.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: https://omitted:443/apis/apps/v1/namespaces/test-namespace/deployments/deployment-test

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/kube_downscaler/scaler.py", line 419, in autoscale_resource
    resource.update()
  File "/usr/local/lib/python3.10/site-packages/pykube/objects.py", line 165, in update
    self.patch(self.obj, subresource=subresource)
  File "/usr/local/lib/python3.10/site-packages/pykube/objects.py", line 157, in patch
    self.api.raise_for_status(r)
  File "/usr/local/lib/python3.10/site-packages/pykube/http.py", line 444, in raise_for_status
    raise HTTPError(resp.status_code, payload["message"])
pykube.exceptions.HTTPError: Operation cannot be fulfilled on deployments.apps "deployment-test": the object has been modified; please apply your changes to the latest version and try again

Problem to solve

The scenario where the object is modified while KubeDownscaler is running should be handled gracefully.

Proposal

When this behavior is detected, KubeDownscaler should fetch the affected resource again and then invoke the autoscale_resource function on it once more. A retry parameter could be introduced to let the user set the number of retries to perform before moving on to the next resource. A rough sketch of the idea follows.
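
A minimal sketch of that retry-with-refetch idea, assuming pykube objects like the ones in the traceback above; the autoscale_with_retries and apply_changes names and the default retry count are placeholders, not the implementation in the linked PR.

```python
import logging

from pykube.exceptions import HTTPError

logger = logging.getLogger("kube_downscaler")


def autoscale_with_retries(resource, apply_changes, max_retries=3):
    """Illustrative helper: re-fetch and re-apply the change on 409 Conflict.

    apply_changes(resource) mutates the desired state (e.g. sets
    spec.replicas); on a conflict the latest version is fetched and the
    change is applied again, up to max_retries times.
    """
    for attempt in range(max_retries + 1):
        apply_changes(resource)
        try:
            resource.update()
            return
        except HTTPError as e:
            if e.code != 409 or attempt == max_retries:
                raise
            logger.info(
                "Conflict on %s %s/%s, fetching latest version and retrying (%d/%d)",
                resource.kind,
                resource.namespace,
                resource.name,
                attempt + 1,
                max_retries,
            )
            # Refresh resourceVersion (and the rest of the object) from the API.
            resource.reload()
```

Called from the scaling loop this could look like autoscale_with_retries(deployment, lambda d: d.obj["spec"].update({"replicas": 0}), max_retries=args.max_retries), where the --max-retries argument name is only illustrative of the proposed parameter.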

@samuel-esp samuel-esp added the enhancement New feature or request label Nov 14, 2024
@samuel-esp samuel-esp self-assigned this Nov 14, 2024
@samuel-esp samuel-esp linked a pull request Nov 16, 2024 that will close this issue