
pageserver: more elegant cancellation for remote operations #6096

Closed
Tracked by #5585
jcsp opened this issue Dec 11, 2023 · 5 comments · Fixed by #6697
Labels
a/tech_debt (Area: related to tech debt) · c/storage/pageserver (Component: storage: pageserver)

Comments

@jcsp
Collaborator

jcsp commented Dec 11, 2023

  • Timeouts should be applied after acquiring the semaphore that limits concurrent remote operations. Applying the timeout anywhere else loses fairness: an operation that times out while waiting for the semaphore loses its place in the queue (see the sketch after this list).
  • Remote operations should return an error type that distinguishes actual errors from shutdown, so that callers can avoid logging (and metric-counting) shutdown as if it were an error.
  • `download_retry` should understand cancellation errors from the inner function; then we can avoid passing another cancellation token explicitly to `download_retry`, since it can infer shutdown just from the result of the wrapped function.
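A minimal sketch of the first two points; `download_with_limit`, `RemoteOpError`, and `do_download` are hypothetical names for illustration, not the pageserver's actual API:

```rust
use std::time::Duration;

use tokio::sync::Semaphore;
use tokio_util::sync::CancellationToken;

// Hypothetical error type that lets callers tell shutdown apart from real
// errors, so shutdown is not logged or metric-counted as a failure.
#[derive(Debug)]
enum RemoteOpError {
    Timeout,
    Cancelled,
    Other(anyhow::Error),
}

async fn download_with_limit(
    limiter: &Semaphore,
    timeout: Duration,
    cancel: &CancellationToken,
) -> Result<Vec<u8>, RemoteOpError> {
    // Waiting for a permit is cancellable but not subject to the timeout,
    // so an operation cannot time out (and lose its place in the queue)
    // merely because the semaphore is busy.
    let _permit = tokio::select! {
        permit = limiter.acquire() => permit.expect("semaphore is never closed"),
        _ = cancel.cancelled() => return Err(RemoteOpError::Cancelled),
    };
    // The timeout clock starts only now, with the permit held.
    tokio::select! {
        res = tokio::time::timeout(timeout, do_download()) => match res {
            Ok(Ok(bytes)) => Ok(bytes),
            Ok(Err(e)) => Err(RemoteOpError::Other(e)),
            Err(_elapsed) => Err(RemoteOpError::Timeout),
        },
        _ = cancel.cancelled() => Err(RemoteOpError::Cancelled),
    }
}

// Stand-in for the actual remote request.
async fn do_download() -> anyhow::Result<Vec<u8>> {
    Ok(Vec::new())
}
```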
@jcsp
Collaborator Author

jcsp commented Dec 11, 2023

Related PR for adding cancellation tokens to the remote storage interface: #4781

jcsp added a commit that referenced this issue Dec 15, 2023
## Problem

Various places in remote storage were not subject to a timeout (so
stuck TCP connections could hold things up), and did not respect a
cancellation token (so things like timeline deletion or tenant detach
could have to wait arbitrarily long).



## Summary of changes

- Add `download_cancellable` and `upload_cancellable` helpers, and use them
in all the places we wait for remote storage operations (with the
exception of initdb downloads, where it would not have been safe); a
rough sketch follows at the end of this commit message.
- Add a cancellation token arg to `download_retry`.
- Use cancellation token args in various places that were missing one
per #5066

Closes: #5066 

Why is this only "basic" handling?
- It doesn't express the difference between shutdown and errors in return
types, to avoid refactoring all the places that use an `anyhow::Error`
(these should all eventually return a more structured error type).
- It implements timeouts on top of remote storage, rather than within it:
this means that operations hitting their timeout will lose their
semaphore permit and thereby go to the back of the queue for their
retry.
- Doing a nicer job is tracked in #6096.
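For illustration, a rough sketch of what a `download_cancellable`-style wrapper can look like (hedged: `DownloadError` is simplified here, and this is not the actual helper from the PR):

```rust
use std::future::Future;

use tokio_util::sync::CancellationToken;

// Simplified error type for the sketch; the real code distinguishes
// more cases.
#[derive(Debug)]
enum DownloadError {
    Cancelled,
    Other(anyhow::Error),
}

// Race the inner download future against the cancellation token, so a
// tenant detach or timeline deletion does not have to wait for a stuck
// remote operation to finish.
async fn download_cancellable<F, T>(
    cancel: &CancellationToken,
    fut: F,
) -> Result<T, DownloadError>
where
    F: Future<Output = Result<T, DownloadError>>,
{
    tokio::select! {
        res = fut => res,
        _ = cancel.cancelled() => Err(DownloadError::Cancelled),
    }
}
```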
@koivunej koivunej self-assigned this Feb 5, 2024
@koivunej
Member

koivunej commented Feb 5, 2024

#6618 takes care of extra CancellationToken cloning. I looked into implementing this and it would turn:

```rust
let remote_storage_operation = GenericRemoteStorage::something(million, args, cancel).await;
```

into

```rust
// this is called from within backoff::retry

let new_nested_token = cancel.child_token();
let remote_storage_operation = GenericRemoteStorage::something(million, args, &new_nested_token);
// pin the future so it can be polled in select! and resumed afterwards
let mut remote_storage_operation = std::pin::pin!(remote_storage_operation);

tokio::select! {
    res = &mut remote_storage_operation => res,
    _ = tokio::time::sleep(UPLOAD_TIMEOUT) => {
        // on timeout, cancel the child token and let the operation
        // wind down gracefully before returning
        new_nested_token.cancel();
        remote_storage_operation.await
    },
}
```

Leaving us at +1 cancellation token creation after #6618 removes 1-2 clones, so it nets out to zero. I looked at whether there is a way to solve this at the LocalFS level by spawning, so that we could get the gracefulness and have only LocalFS pay for it; there isn't, really, since it would run into very much the same problem it can currently have.

The best alternative is to just add warning logging to LocalFS: if a test failure is caused by a cancellation (and a too-fast retry), it would pop out right away as something to consider, instead of being a very hard to reproduce problem.

Or do nothing and hope no one ever has to debug a test failure like this. The drop-guard logging seems easy to add.
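A minimal sketch of that drop-guard logging, with hypothetical names (`CancelWarnGuard`, the `upload` wrapper): if the operation's future is dropped before completing, the guard's `Drop` impl warns, so a cancellation plus a too-fast retry shows up in the test log immediately:

```rust
struct CancelWarnGuard {
    op: &'static str,
    completed: bool,
}

impl CancelWarnGuard {
    fn new(op: &'static str) -> Self {
        Self { op, completed: false }
    }

    // Call once the operation has run to completion.
    fn disarm(&mut self) {
        self.completed = true;
    }
}

impl Drop for CancelWarnGuard {
    fn drop(&mut self) {
        if !self.completed {
            tracing::warn!(
                "LocalFs {} dropped before completion; a concurrent retry may observe partial state",
                self.op
            );
        }
    }
}

async fn upload(bytes: &[u8], path: &std::path::Path) -> std::io::Result<()> {
    let mut guard = CancelWarnGuard::new("upload");
    tokio::fs::write(path, bytes).await?;
    guard.disarm();
    Ok(())
}
```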

@jcsp
Collaborator Author

jcsp commented Feb 5, 2024

> The best alternative is to just add warning logging to LocalFS: if a test failure is caused by a cancellation (and a too-fast retry), it would pop out right away as something to consider, instead of being a very hard to reproduce problem.

Sounds good, I agree with not letting LocalFS guide our design. It just has to work, not be graceful.

> Leaving us at +1 cancellation token creation after #6618 removes 1-2 clones, so it nets out to zero.

Yep. I like getting rid of the spurious clone in #6618. If the net result is ~similar overhead then we're good; it's fine to pay some clone-level efficiency tax for cancellation when the operation is a big remote HTTP request.

@jcsp
Collaborator Author

jcsp commented Feb 5, 2024

This week:

  • John & Joonas to sync on the scope

koivunej added a commit that referenced this issue Feb 6, 2024
The solution we ended up with for `backoff::retry` required always cloning
the cancellation token even when it is only `.await`ed. Fix that, and
also turn the return type into `Option<Result<T, E>>`, avoiding the need
for the `E::cancelled()` fn to be passed in.

Cc: #6096
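A hedged sketch of the resulting shape (not the exact code in #6618): the token is borrowed rather than cloned per attempt, and `None` means "cancelled", so no `E::cancelled()` constructor is needed:

```rust
use std::future::Future;
use std::time::Duration;

use tokio_util::sync::CancellationToken;

async fn retry<F, Fut, T, E>(
    mut op: F,
    is_permanent: impl Fn(&E) -> bool,
    max_attempts: u32,
    cancel: &CancellationToken,
) -> Option<Result<T, E>>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    for attempt in 1..=max_attempts {
        if cancel.is_cancelled() {
            return None; // cancelled is not an error, just stop
        }
        match op().await {
            Ok(v) => return Some(Ok(v)),
            Err(e) if is_permanent(&e) || attempt == max_attempts => {
                return Some(Err(e));
            }
            Err(_) => {
                // Back off before the next attempt; the real helper uses
                // exponential backoff. Stay responsive to cancellation.
                tokio::select! {
                    _ = tokio::time::sleep(Duration::from_millis(100)) => {}
                    _ = cancel.cancelled() => return None,
                }
            }
        }
    }
    None
}
```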
koivunej added a commit that referenced this issue Feb 6, 2024
Fix cloning the serialized heatmap on every attempt by turning it
into `bytes::Bytes` before the clone, so each clone is a refcount bump
instead of a full `Vec` copy that only gets refcounted later on.

Also fixes one cancellation token cloning I had missed in #6618.
Cc: #6096
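To illustrate the `bytes::Bytes` point (a standalone example, not the heatmap code itself): cloning `Bytes` bumps a reference count, while cloning a `Vec<u8>` copies the whole buffer, so converting once up front makes the per-attempt clone cheap:

```rust
use bytes::Bytes;

fn main() {
    // Stand-in for a serialized heatmap.
    let serialized: Vec<u8> = vec![0u8; 1024 * 1024];
    // One-time conversion; `Bytes::from(Vec<u8>)` takes ownership, no copy.
    let shared = Bytes::from(serialized);

    for _attempt in 0..3 {
        // Refcount bump instead of a megabyte memcpy per retry.
        let body = shared.clone();
        assert_eq!(body.len(), 1024 * 1024);
    }
}
```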
koivunej added a commit that referenced this issue Feb 9, 2024
…6696)

This PR is preliminary cleanups and refactoring around `remote_storage`
for the next PR, which will move the timeouts and cancellation into
`remote_storage`.

Summary:
- smaller drive-by fixes
- code simplification
- refactor common parts like `DownloadError::is_permanent`
- align the error types of `RemoteStorage::list_*` to make more use of the
`download_retry` helper

Cc: #6096
@jcsp
Collaborator Author

jcsp commented Feb 12, 2024

PR in flight: #6697

koivunej added a commit that referenced this issue Feb 14, 2024
Cancellation and timeouts are handled at remote_storage call sites, if
they are handled at all. They should always be handled, because we've had
transient problems with remote storage connections.

- Add a cancellation token to the `trait RemoteStorage` methods
- For the `download*` and `list*` methods there is
`DownloadError::{Cancelled,Timeout}`
- The rest keep using `anyhow::Error`; it will have root cause
`remote_storage::TimeoutOrCancel::{Cancel,Timeout}`
- Both types have an `is_permanent` equivalent, which should be passed to
`backoff::retry`
- New generic `RemoteStorageConfig` option `timeout`, defaulting to 120s
- Start counting the timeout only after acquiring the concurrency limiter
permit
- Cancellable permit acquisition
- Download stream timeout or cancellation is communicated via a
`std::io::Error`
- Exit `backoff::retry` by marking cancellation errors permanent

Fixes: #6096
Closes: #4781

Co-authored-by: arpad-m <[email protected]>
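A sketch of the error surface described above, with simplified definitions (the real types live in the `remote_storage` crate and carry more variants):

```rust
use std::fmt;

#[derive(Debug)]
enum DownloadError {
    Cancelled,
    Timeout,
    Other(anyhow::Error),
}

impl fmt::Display for DownloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DownloadError::Cancelled => write!(f, "cancelled, shutting down"),
            DownloadError::Timeout => write!(f, "timeout"),
            DownloadError::Other(e) => write!(f, "{e:#}"),
        }
    }
}

impl std::error::Error for DownloadError {}

impl DownloadError {
    // Passed to backoff::retry: cancellation exits the retry loop.
    fn is_permanent(&self) -> bool {
        matches!(self, DownloadError::Cancelled)
    }
}

// Root cause embedded in anyhow::Error for the non-download methods.
#[derive(Debug)]
enum TimeoutOrCancel {
    Timeout,
    Cancel,
}

impl fmt::Display for TimeoutOrCancel {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TimeoutOrCancel::Timeout => write!(f, "timeout"),
            TimeoutOrCancel::Cancel => write!(f, "cancel"),
        }
    }
}

impl std::error::Error for TimeoutOrCancel {}

// The `is_permanent` equivalent for the anyhow-based methods.
fn is_permanent(err: &anyhow::Error) -> bool {
    err.root_cause()
        .downcast_ref::<TimeoutOrCancel>()
        .map_or(false, |c| matches!(c, TimeoutOrCancel::Cancel))
}
```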
koivunej added a commit that referenced this issue Feb 21, 2024
As noticed in #6836, some occurrences of error conversion were missed in
#6697:
- a `std::io::Error` produced by `tokio::io::copy_buf` and containing a
`DownloadError` was turned into `DownloadError::Other`
- similarly for secondary downloader errors

These changes come at the cost of losing pathname context.

Cc: #6096