[do not merge] Ensure client and scheduler are resilient to server autoscaling #2277

Open: wants to merge 6 commits into main

Commits on Oct 25, 2024

  1. 3c4547b
  2. ensure client tracks certs by server_id, and both the scheduler and client remove certs when a server's cert changes
    trxcllnt committed Oct 25, 2024
    df2e4a1
  3. rewrite scheduler handle_alloc_job to be resilient to server errors and try other candidates instead of failing
    trxcllnt committed Oct 25, 2024
    fe83892
  4. 4657454
  5. 6cd9ff3
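
The idea in commit fe83892 (try other candidates instead of failing on the first server error) can be illustrated with a minimal sketch. This is not the PR's code: `ServerId`, `AllocError`, and `alloc_with_fallback` are hypothetical names, and the caller-supplied `try_alloc` closure stands in for whatever request the scheduler actually sends to a server.

```rust
// Hypothetical types for illustration only; not sccache's actual API.
#[derive(Debug, Clone, PartialEq)]
pub struct ServerId(pub String);

#[derive(Debug, PartialEq)]
pub enum AllocError {
    /// There were no servers to try.
    NoCandidates,
    /// Every candidate was tried and each one returned an error.
    AllFailed(Vec<(ServerId, String)>),
}

/// Try each candidate in order. Return the first successful allocation,
/// or the collected per-server errors if every candidate fails.
pub fn alloc_with_fallback<T, F>(
    candidates: &[ServerId],
    mut try_alloc: F,
) -> Result<(ServerId, T), AllocError>
where
    F: FnMut(&ServerId) -> Result<T, String>,
{
    if candidates.is_empty() {
        return Err(AllocError::NoCandidates);
    }
    let mut failures = Vec::new();
    for server in candidates {
        match try_alloc(server) {
            // First success wins; remaining candidates are not contacted.
            Ok(job) => return Ok((server.clone(), job)),
            // Record the error and move on to the next candidate.
            Err(e) => failures.push((server.clone(), e)),
        }
    }
    Err(AllocError::AllFailed(failures))
}
```

Collecting the per-server errors, rather than returning only the last one, is a design choice of this sketch; it keeps the failure report useful when several servers disappear at once during autoscaling.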

Commits on Oct 28, 2024

  1. track started job mtimes and assume jobs that take longer than `SCCACHE_DIST_REQUEST_TIMEOUT` seconds are stale and should be removed
    trxcllnt committed Oct 28, 2024
    5f1d50e
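
The stale-job rule in commit 5f1d50e can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `StartedJobs` and its methods are hypothetical, job ids are plain `u64` values, start times are `Instant`s rather than the mtimes the commit mentions, and the `timeout` argument stands in for the value of `SCCACHE_DIST_REQUEST_TIMEOUT`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical table of started jobs, recording when each one began.
pub struct StartedJobs {
    started: HashMap<u64, Instant>,
    timeout: Duration,
}

impl StartedJobs {
    /// `timeout` plays the role of SCCACHE_DIST_REQUEST_TIMEOUT (an assumption).
    pub fn new(timeout: Duration) -> Self {
        StartedJobs { started: HashMap::new(), timeout }
    }

    /// Record that `job_id` started at `now`.
    pub fn start(&mut self, job_id: u64, now: Instant) {
        self.started.insert(job_id, now);
    }

    /// Remove every job that has been running longer than the timeout
    /// and return the removed ids in ascending order.
    pub fn prune_stale(&mut self, now: Instant) -> Vec<u64> {
        let timeout = self.timeout;
        let mut stale = Vec::new();
        self.started.retain(|id, started_at| {
            if now.saturating_duration_since(*started_at) > timeout {
                stale.push(*id);
                false // drop the stale entry
            } else {
                true // keep jobs still within the timeout
            }
        });
        stale.sort_unstable();
        stale
    }

    /// Number of jobs still tracked as running.
    pub fn running(&self) -> usize {
        self.started.len()
    }
}
```

Passing `now` in explicitly, instead of calling `Instant::now()` inside `prune_stale`, makes the rule easy to exercise in tests without sleeping.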