scheduler: Shorten tolerations for node failure
Similar to what was done in #1055, we need to explicitly add tolerations
to the scheduler to get it to be recreated more quickly on node failure.

This is particularly necessary because we don't have #955. We could wait
for that, but it's a lot of work, and this is a small thing we can do in
the meantime.

Fixes neondatabase/cloud#17298.
sharnoff committed Nov 18, 2024
1 parent 59f6746 commit a94ff4d
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions autoscale-scheduler/deployment.yaml
@@ -63,3 +63,13 @@ spec:
         - name: plugin-config-volume
           configMap:
             name: scheduler-plugin-config
+
+      tolerations:
+        # Add explicit (short) tolerations for node failure, because otherwise the default of 5m
+        # will be used, which is unacceptably long for us.
+        - key: node.kubernetes.io/not-ready
+          tolerationSeconds: 30
+          effect: NoExecute
+        - key: node.kubernetes.io/unreachable
+          tolerationSeconds: 30
+          effect: NoExecute
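
For comparison, here is a sketch of the defaults that the explicit tolerations replace. When a pod spec carries no toleration for these two taints, upstream Kubernetes' DefaultTolerationSeconds admission plugin normally injects entries equivalent to the following, which is where the 5m figure in the comment comes from. This assumes the cluster runs that plugin with its stock settings; the commit does not say whether the defaults were changed.

```yaml
# Defaults injected by upstream Kubernetes (assumed unmodified here),
# shown only to contrast with the 30s values set in this commit.
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300   # 5 minutes before the pod is evicted
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
```

With `tolerationSeconds: 30`, the scheduler pod is evicted from a not-ready or unreachable node after roughly 30 seconds instead of 5 minutes, so its Deployment can recreate it elsewhere sooner.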
