Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What happens if a task can't be canceled because it is already canceled or stopped? #161

Open
adamnovak opened this issue Sep 24, 2021 · 1 comment
Milestone

Comments

@adamnovak
Copy link

The task cancellation endpoint is specced to only ever return 200.

responses:
200:
description: ""
content:
application/json:
schema:
$ref: '#/components/schemas/tesCancelTaskResponse'

However, not all received cancellation requests on all implementations actually succeed. Funnel, for example, will return a 500 error code if a task can't actually be canceled because it has already finished or stopped:

<status code> 500
<response> {
  "error": "Won't switch between two terminal states: EXECUTOR_ERROR -\u003e CANCELED",
  "code": 2,
  "message": "Won't switch between two terminal states: EXECUTOR_ERROR -\u003e CANCELED"
}

It's tough for my application to tell the difference between this kind of response from a server and the request just not having been received by a working replica of the server; I want to be able to retry with backoff if my request didn't go through, but if servers return 500 when they just don't like the request, I can't safely do that.

If the server is required to always return 200, even if the cancellation can't actually happen, the description for the return code should say that (and someone should fix Funnel).

If the server is allowed to return an error code in this situation, I would recommend the spec say that the server should return a 409 Conflict response when attempting to cancel a stopped task, as the cancellation is in conflict with the state of the task. Someone would still need to fix Funnel to comply with the updated spec. There's also 401 Gone, 304 Not Modified, 403 Forbidden, and the generic 400 Bad Request available.

There's [a similar situation over in WES for workflow cancellation[(https://github.com/ga4gh/workflow-execution-service-schemas/blob/c3b19854240c4fcbaf3483e22b19db0a918a7ee5/openapi/paths/runs%40%7Brun_id%7D%40cancel.yaml#L16-L46), although there some error codes are noted as being possible (just not 409). Out of those, 403 Forbidden might be the one for trying to cancel the uncancellable, but I believe implementations like Toil's WES server are currently using 500 Internal Server Error there too.

@markjschreiber
Copy link

403 would typically happen in the case of an authorization problem, I don't think that should be used here.

Ideally the cancel call would be idempotent so returning 200 seems fine to me even if the task is already canceled or has previously finished or failed.

If the task never existed then a 404 seems appropriate.

If task a task that is still running cannot be canceled because the TES service cannot get the back end compute to respond then this sounds like something worthy of a 500 response.

@vsmalladi vsmalladi added this to the 1.2 milestone Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants