
Resource Processing Failed Due to Concurrent Update #111

Open
samuel-esp opened this issue Nov 14, 2024 · 0 comments · May be fixed by #113
When the list of resources to be processed is large (e.g., 2000+ resources), the KubeDownscaler algorithm may take up to half an hour to complete its run. This can lead to the following scenario:

  1. All resources are initially retrieved.
  2. While KubeDownscaler processes each resource, a resource that was retrieved earlier may be modified by another entity (such as an HPA, KEDA, or a manual intervention).
  3. When KubeDownscaler attempts to process this modified resource, it encounters the following error: "the object has been modified; please apply your changes to the latest version and try again".

For most use cases this is a non-blocking error, because the resource will be processed again in the next cycle. However, if the --once argument is used on a large cluster, it can lead to some resources not being processed correctly.
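
For illustration, this is roughly where the conflict surfaces when a resource is updated through pykube; the try_update name and its return convention are assumptions made for this sketch, not the downscaler's actual code. The full traceback is shown below.

```python
from pykube.exceptions import HTTPError


def try_update(resource):
    """Illustrative helper: attempt a single update and report a 409 Conflict.

    The API server rejects the PATCH with 409 Conflict when the object's
    resourceVersion changed after it was listed (e.g. an HPA or KEDA
    touched the resource in the meantime).
    """
    try:
        resource.update()
        return True
    except HTTPError as e:
        if e.code == 409:
            # Stale resourceVersion: the object was modified concurrently.
            return False
        raise
```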

Issue

2024-11-13 19:14:33,671 ERROR: Failed to process Deployment test-namespace/deployment-test: Operation cannot be fulfilled on deployments.apps "deployment-test": the object has been modified; please apply your changes to the latest version and try again
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pykube/http.py", line 437, in raise_for_status
    resp.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: https://omitted:443/apis/apps/v1/namespaces/test-namespace/deployments/deployment-test

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/kube_downscaler/scaler.py", line 419, in autoscale_resource
    resource.update()
  File "/usr/local/lib/python3.10/site-packages/pykube/objects.py", line 165, in update
    self.patch(self.obj, subresource=subresource)
  File "/usr/local/lib/python3.10/site-packages/pykube/objects.py", line 157, in patch
    self.api.raise_for_status(r)
  File "/usr/local/lib/python3.10/site-packages/pykube/http.py", line 444, in raise_for_status
    raise HTTPError(resp.status_code, payload["message"])
pykube.exceptions.HTTPError: Operation cannot be fulfilled on deployments.apps "deployment-test": the object has been modified; please apply your changes to the latest version and try again
2024-11-13 19:14:33,671 INFO: Scaling down Deployment test-namespace/bwce-sigsc001 from 1 to 0 replicas (uptime: never, downtime: always)
2024-11-13 19:14:35,081 INFO: Scaling down Deployment test-namespace/bwce-siinf028 from 1 to 0 replicas (uptime: never, downtime: always)
2024-11-13 19:14:36,438 INFO: Scaling down Deployment test-namespace/bwce-siinf032 from 1 to 0 replicas (uptime: never, downtime: always)
2024-11-13 19:14:37,797 INFO: Scaling down Deployment test-namespace/bwce-siinf037 from 1 to 0 replicas (uptime: never, downtime: always)
2024-11-13 19:14:39,073 INFO: Scaling down Deployment test-namespace/deployment-test from 1 to 0 replicas (uptime: never, downtime: always)
2024-11-13 19:14:39,086 ERROR: Failed to process Deployment test-namespace/deployment-test: Operation cannot be fulfilled on deployments.apps "deployment-test": the object has been modified; please apply your changes to the latest version and try again
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pykube/http.py", line 437, in raise_for_status
    resp.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: https://omitted:443/apis/apps/v1/namespaces/test-namespace/deployments/deployment-test

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/kube_downscaler/scaler.py", line 419, in autoscale_resource
    resource.update()
  File "/usr/local/lib/python3.10/site-packages/pykube/objects.py", line 165, in update
    self.patch(self.obj, subresource=subresource)
  File "/usr/local/lib/python3.10/site-packages/pykube/objects.py", line 157, in patch
    self.api.raise_for_status(r)
  File "/usr/local/lib/python3.10/site-packages/pykube/http.py", line 444, in raise_for_status
    raise HTTPError(resp.status_code, payload["message"])
pykube.exceptions.HTTPError: Operation cannot be fulfilled on deployments.apps "deployment-test": the object has been modified; please apply your changes to the latest version and try again

Problem to solve

The scenario where the object is modified while KubeDownscaler is running should be handled gracefully.

Proposal

When this behavior is detected, KubeDownscaler should fetch the affected resource again and then invoke the autoscale_resource function on it once more. A retry parameter could be introduced to let the user set the number of retries to perform before moving on to the next resource. A rough sketch of the idea follows.
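
A minimal sketch of that retry-with-refetch idea, assuming pykube objects like the ones in the traceback above; the autoscale_with_retries and apply_changes names and the default retry count are placeholders, not the implementation in the linked PR.

```python
import logging

from pykube.exceptions import HTTPError

logger = logging.getLogger("kube_downscaler")


def autoscale_with_retries(resource, apply_changes, max_retries=3):
    """Illustrative helper: re-fetch and re-apply the change on 409 Conflict.

    apply_changes(resource) mutates the desired state (e.g. sets
    spec.replicas); on a conflict the latest version is fetched and the
    change is applied again, up to max_retries times.
    """
    for attempt in range(max_retries + 1):
        apply_changes(resource)
        try:
            resource.update()
            return
        except HTTPError as e:
            if e.code != 409 or attempt == max_retries:
                raise
            logger.info(
                "Conflict on %s %s/%s, fetching latest version and retrying (%d/%d)",
                resource.kind,
                resource.namespace,
                resource.name,
                attempt + 1,
                max_retries,
            )
            # Refresh resourceVersion (and the rest of the object) from the API.
            resource.reload()
```

Called from the scaling loop this could look like autoscale_with_retries(deployment, lambda d: d.obj["spec"].update({"replicas": 0}), max_retries=args.max_retries), where the --max-retries argument name is only illustrative of the proposed parameter.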

@samuel-esp samuel-esp added the enhancement New feature or request label Nov 14, 2024
@samuel-esp samuel-esp self-assigned this Nov 14, 2024
@samuel-esp samuel-esp linked a pull request Nov 16, 2024 that will close this issue