proxy: Demote all cplane error replies to info log level (#9880)
## Problem

The vast majority of the error/warn logs from cplane are about exceeded
time or data transfer quotas or endpoint-not-found errors, not
operational errors in proxy or cplane.

## Summary of changes

* Demote cplane error replies to info level.
* Raise other errors from warn back to error.
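
The routing rule in the two bullets above can be sketched without the `tracing` dependency. This is a minimal illustration, not the proxy's code: the two error enums below are simplified, hypothetical stand-ins for the types in `crate::control_plane::errors` (the real ones have more variants and different payloads), and `Level` and `wake_compute_log_level` are names invented for this example.

```rust
// Hypothetical, simplified stand-ins for the proxy's error types.
// Only the shape needed to show the match is reproduced here.
#[allow(dead_code)]
#[derive(Debug)]
enum ControlPlaneError {
    /// An error reply sent by the control plane
    /// (for example: quota exceeded, endpoint not found).
    Message(String),
    /// A failure while talking to the control plane.
    Transport(String),
}

#[allow(dead_code)]
#[derive(Debug)]
enum WakeComputeError {
    ControlPlane(ControlPlaneError),
    TooManyConnections,
}

/// Log level chosen for a failed wake-compute attempt (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Info,
    Error,
}

/// Error replies from the control plane are demoted to `Info`;
/// every other failure is logged at `Error`, retriable or not.
fn wake_compute_log_level(err: &WakeComputeError) -> Level {
    match err {
        WakeComputeError::ControlPlane(ControlPlaneError::Message(_)) => Level::Info,
        _ => Level::Error,
    }
}
```

Note that the level depends only on the error variant. Whether the attempt is retriable is still recorded as a field on the event, but it no longer selects between `warn` and `error`.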
cloneable authored Nov 25, 2024
1 parent 7a2f0ed commit 87e4dd2
Showing 1 changed file with 16 additions and 4 deletions.
20 changes: 16 additions & 4 deletions proxy/src/proxy/wake_compute.rs
@@ -1,16 +1,28 @@
-use tracing::{error, info, warn};
+use tracing::{error, info};

 use super::connect_compute::ComputeConnectBackend;
 use crate::config::RetryConfig;
 use crate::context::RequestContext;
-use crate::control_plane::errors::WakeComputeError;
+use crate::control_plane::errors::{ControlPlaneError, WakeComputeError};
 use crate::control_plane::CachedNodeInfo;
 use crate::error::ReportableError;
 use crate::metrics::{
     ConnectOutcome, ConnectionFailuresBreakdownGroup, Metrics, RetriesMetricGroup, RetryType,
 };
 use crate::proxy::retry::{retry_after, should_retry};

+// Use macro to retain original callsite.
+macro_rules! log_wake_compute_error {
+    (error = ?$error:expr, $num_retries:expr, retriable = $retriable:literal) => {
+        match $error {
+            WakeComputeError::ControlPlane(ControlPlaneError::Message(_)) => {
+                info!(error = ?$error, num_retries = $num_retries, retriable = $retriable, "couldn't wake compute node")
+            }
+            _ => error!(error = ?$error, num_retries = $num_retries, retriable = $retriable, "couldn't wake compute node"),
+        }
+    };
+}
+
 pub(crate) async fn wake_compute<B: ComputeConnectBackend>(
     num_retries: &mut u32,
     ctx: &RequestContext,
@@ -20,7 +32,7 @@ pub(crate) async fn wake_compute<B: ComputeConnectBackend>(
     loop {
         match api.wake_compute(ctx).await {
             Err(e) if !should_retry(&e, *num_retries, config) => {
-                error!(error = ?e, num_retries, retriable = false, "couldn't wake compute node");
+                log_wake_compute_error!(error = ?e, num_retries, retriable = false);
                 report_error(&e, false);
                 Metrics::get().proxy.retries_metric.observe(
                     RetriesMetricGroup {
@@ -32,7 +44,7 @@ pub(crate) async fn wake_compute<B: ComputeConnectBackend>(
                 return Err(e);
             }
             Err(e) => {
-                warn!(error = ?e, num_retries, retriable = true, "couldn't wake compute node");
+                log_wake_compute_error!(error = ?e, num_retries, retriable = true);
                 report_error(&e, true);
             }
             Ok(n) => {
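
The comment in the diff, "Use macro to retain original callsite", can be illustrated with the standard `line!()` macro. This is a sketch under one stated assumption: that `tracing`'s event macros derive their source location from the same `file!()`/`line!()` mechanism, so that wrapping them in a function would attribute every event to the helper, while wrapping them in a `macro_rules!` macro keeps the location of each invocation. The names `line_via_function`, `line_via_macro`, and `demo` are invented for this example.

```rust
/// Function helper: `line!()` expands inside this body, so every caller
/// gets the same value (the line of the helper itself).
fn line_via_function() -> u32 {
    line!()
}

/// Macro helper: `line!()` resolves to the outermost macro invocation,
/// so each use of `line_via_macro!()` reports its own line.
macro_rules! line_via_macro {
    () => {
        line!()
    };
}

/// Calls each helper from two different source lines and returns
/// the four reported line numbers.
fn demo() -> (u32, u32, u32, u32) {
    let f1 = line_via_function();
    let f2 = line_via_function();
    let m1 = line_via_macro!();
    let m2 = line_via_macro!();
    (f1, f2, m1, m2)
}
```

Here `f1` and `f2` are equal, while `m1` and `m2` differ because the two macro invocations sit on different lines. Under the assumption above, the same reasoning explains why `log_wake_compute_error!` is a macro rather than a helper function.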

1 comment on commit 87e4dd2

@github-actions

6941 tests run: 6632 passed, 1 failed, 308 skipped (full report)


Failures on Postgres 14

# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_pull_timeline[release-pg14-True]"
Flaky tests (1)

Postgres 17

Test coverage report is not available

The comment gets automatically updated with the latest test results
87e4dd2 at 2024-11-25T20:21:48.759Z :recycle:
