Add docs for Nexus circuit breaker #3220
base: main
### Circuit Breaker {#circuit-breaker}

The circuit breaker kicks in when requests fail with a [retryable error](https://github.com/temporalio/temporal/blob/13d6cd8cf7a4ba0c4660cf98f672bbd645dca3e7/components/nexusoperations/executors.go#L659)
consecutively as it might indicate that the destination (eg: Nexus service to start operation, or
the caller for callback request) is down or unable to process the request. The default behavior of
the circuit breaker is to open after 5 consecutive failed requests. Once in open state, Nexus taskk
will fail early and requests won't be sent to destination. After a minute in open state, it will
change to half-open state, which will allow only 1 request to be made. If the request is successful,
then the circuit breaker changes its state to closed, and allows all requests to pass through.
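As a rough illustration of the state transitions described in the paragraph above, here is a minimal sketch of a circuit breaker. It is not Temporal's implementation: the names `CircuitBreaker`, `State`, `max_failures`, `open_timeout`, and `clock` are hypothetical, and only the thresholds (5 consecutive failures, one minute in the open state, a single request in the half-open state) come from the proposed text. The text does not say what happens when the half-open request fails; the sketch assumes the breaker re-opens, which is the conventional behavior.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class CircuitBreaker:
    """Illustrative breaker for one destination (hypothetical, not Temporal's code)."""

    def __init__(self, max_failures=5, open_timeout=60.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.open_timeout = open_timeout  # seconds to stay open before half-open
        self._clock = clock               # injectable so tests can control time
        self._state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0
        self._probe_in_flight = False

    @property
    def state(self):
        # Lazily move from open to half-open once the timeout has elapsed.
        if (self._state is State.OPEN
                and self._clock() - self._opened_at >= self.open_timeout):
            self._state = State.HALF_OPEN
            self._probe_in_flight = False
        return self._state

    def allow(self):
        """Return True if a request may be sent to the destination."""
        state = self.state
        if state is State.CLOSED:
            return True
        if state is State.HALF_OPEN and not self._probe_in_flight:
            self._probe_in_flight = True  # half-open admits a single request
            return True
        return False  # open, or half-open with the single request outstanding

    def record_success(self):
        # Any success closes the breaker and resets the failure streak.
        self._state = State.CLOSED
        self._failures = 0
        self._probe_in_flight = False

    def record_failure(self):
        """Record a request that failed with a retryable error."""
        state = self.state
        if state is State.OPEN:
            return  # nothing is sent while open, so there is nothing to count
        if state is State.HALF_OPEN:
            self._trip()  # assumption: a failed half-open request re-opens
            return
        self._failures += 1
        if self._failures >= self.max_failures:
            self._trip()

    def _trip(self):
        self._state = State.OPEN
        self._failures = 0
        self._opened_at = self._clock()
        self._probe_in_flight = False
```

A caller would check `allow()` before each request and fail early when it returns `False`, then report the outcome with `record_success()` or `record_failure()`. Passing a fake `clock` makes the one-minute transition testable without sleeping.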
This is the only change I made; all other changes came from `yarn build`.
I would defer to someone from documentation to go over the grammar and add a section on how users would find out when a destination is down. Specifically, mention the BLOCKED state and maybe show what it would look like when describing a workflow via the CLI.
@@ -146,6 +146,16 @@ As mentioned above, a synchronous Nexus Operation handler has less than 10 secon
Once the caller Workflow schedules an Operation with the caller’s Temporal cluster, the caller’s Nexus Machinery keeps trying to start the Operation, with automatic retries and exponential backoff.
If a Nexus Operation returns a [retryable error](https://github.com/temporalio/temporal/blob/13d6cd8cf7a4ba0c4660cf98f672bbd645dca3e7/components/nexusoperations/executors.go#L659) when attempting to start, the Operation it will be retried up to the [default Retry Policy’s](https://github.com/temporalio/temporal/blob/de7c8879e103be666a7b067cc1b247f0ac63c25c/components/nexusoperations/config.go#L111) max attempts and expiration interval.

### Circuit Breaker {#circuit-breaker}

The circuit breaker kicks in when requests fail with a [retryable error](https://github.com/temporalio/temporal/blob/13d6cd8cf7a4ba0c4660cf98f672bbd645dca3e7/components/nexusoperations/executors.go#L659)
I would explicitly call out request timeouts and how that may relate to when a worker is down.
### Circuit Breaker {#circuit-breaker}

The circuit breaker kicks in when requests fail with a [retryable error](https://github.com/temporalio/temporal/blob/13d6cd8cf7a4ba0c4660cf98f672bbd645dca3e7/components/nexusoperations/executors.go#L659)
consecutively as it might indicate that the destination (eg: Nexus service to start operation, or
The endpoint would be down, not the service.
The circuit breaker kicks in when requests fail with a [retryable error](https://github.com/temporalio/temporal/blob/13d6cd8cf7a4ba0c4660cf98f672bbd645dca3e7/components/nexusoperations/executors.go#L659)
consecutively as it might indicate that the destination (eg: Nexus service to start operation, or
the caller for callback request) is down or unable to process the request. The default behavior of
the circuit breaker is to open after 5 consecutive failed requests. Once in open state, Nexus taskk
Not sure users will understand what Nexus tasks are in this context, and this applies to callback "tasks" as well. These are all internal details; you should use "the Nexus machinery" instead.
consecutively as it might indicate that the destination (eg: Nexus service to start operation, or
the caller for callback request) is down or unable to process the request. The default behavior of
the circuit breaker is to open after 5 consecutive failed requests. Once in open state, Nexus taskk
will fail early and requests won't be sent to destination. After a minute in open state, it will
Suggested change:
- will fail early and requests won't be sent to destination. After a minute in open state, it will
+ will fail early and requests won't be sent to that destination. After a minute in open state, it will
What does this PR do?
Add docs for Nexus circuit breaker
Notes to reviewers