Skip to content

Commit

Permalink
Fix panic in compute_ctl metrics collection (#9831)
Browse files Browse the repository at this point in the history
Calling unwrap on the encoder is a little overzealous. One of the errors
that can be returned by the encode function in particular is the
non-existence of metrics for a metric family, so we should prematurely
filter instances like that out. I believe that the cause of this panic
was caused by a race condition between the prometheus collector and the
compute collecting the installed extensions metric for the first time.
The HTTP server is spawned on a separate thread before we even start
bringing up Postgres.

Signed-off-by: Tristan Partin <[email protected]>
  • Loading branch information
tristan957 authored Nov 21, 2024
1 parent 190e8ce commit 37962e7
Showing 1 changed file with 16 additions and 3 deletions.
19 changes: 16 additions & 3 deletions compute_tools/src/http/api.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ use anyhow::Result;
use hyper::header::CONTENT_TYPE;
use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Method, Request, Response, Server, StatusCode};
use metrics::proto::MetricFamily;
use metrics::Encoder;
use metrics::TextEncoder;
use tokio::task;
Expand Down Expand Up @@ -72,10 +73,22 @@ async fn routes(req: Request<Body>, compute: &Arc<ComputeNode>) -> Response<Body
(&Method::GET, "/metrics") => {
debug!("serving /metrics GET request");

let mut buffer = vec![];
let metrics = installed_extensions::collect();
// When we call TextEncoder::encode() below, it will immediately
// return an error if a metric family has no metrics, so we need to
// preemptively filter out metric families with no metrics.
let metrics = installed_extensions::collect()
.into_iter()
.filter(|m| !m.get_metric().is_empty())
.collect::<Vec<MetricFamily>>();

let encoder = TextEncoder::new();
encoder.encode(&metrics, &mut buffer).unwrap();
let mut buffer = vec![];

if let Err(err) = encoder.encode(&metrics, &mut buffer) {
let msg = format!("error handling /metrics request: {err}");
error!(msg);
return render_json_error(&msg, StatusCode::INTERNAL_SERVER_ERROR);
}

match Response::builder()
.status(StatusCode::OK)
Expand Down

1 comment on commit 37962e7

@github-actions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5626 tests run: 5387 passed, 3 failed, 236 skipped (full report)


Failures on Postgres 16

# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_sharded_ingest[release-pg16-github-actions-selfhosted-1] or test_bulk_insert[neon-release-pg16-github-actions-selfhosted] or test_compaction_l0_memory[release-pg16-github-actions-selfhosted]"

Code coverage* (full report)

  • functions: 31.4% (7956 of 25343 functions)
  • lines: 49.3% (63112 of 127990 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
37962e7 at 2024-11-21T22:08:58.833Z :recycle:

Please sign in to comment.