ongoing: fix: consumption metrics on restart #5297

Closed
wants to merge 46 commits into from
Changes from 36 commits
Commits
0211061
refactor: introduce tenantsnapshot
koivunej Sep 12, 2023
7652148
refactor: introduce aliases
koivunej Sep 14, 2023
0e553b4
refactor: break dep to globals with collection
koivunej Sep 12, 2023
72ca2a2
refactor: simplify name
koivunej Sep 13, 2023
6ec5cfd
fixup? drive-by Event generification fix
koivunej Sep 13, 2023
a02a805
fix: proper exp backoff retry
koivunej Sep 13, 2023
870b0a1
refactor: idempotency key, make it write!-able
koivunej Sep 13, 2023
c0e6610
drive-by unrelated: refactor: avoid code in tokio::select
koivunej Sep 13, 2023
ff681b9
consumption_metrics: make Name generic
koivunej Sep 14, 2023
c515ed5
consumption_metric: add EventType::recorded_at
koivunej Sep 14, 2023
bfb84e0
remove ability to deduplicate events
koivunej Sep 13, 2023
6f0a9dd
test: serialized metric images
koivunej Sep 14, 2023
7651568
refactor: introduce enum Name for metric strings
koivunej Sep 13, 2023
fe4989a
refactor: cleanup timelinesnapshot
koivunej Sep 14, 2023
1e635fc
doc: metricskey
koivunej Sep 14, 2023
6f982e2
feat: file-backed cached metrics
koivunej Sep 13, 2023
d4de0f6
refactor: two line tenant snapshot usage
koivunej Sep 14, 2023
ec8d7e9
test: introduce tests for new metrics
koivunej Sep 14, 2023
faaef03
inplace updates for events
koivunej Sep 14, 2023
55977ba
test_metric_collection: calculate synthetic size faster
koivunej Sep 14, 2023
c7a8069
test_metric_collection: stateful object verifiers
koivunej Sep 14, 2023
48b10cd
test_metric_collection: spice up with a restart
koivunej Sep 14, 2023
9d044d0
test_metric_collection: assert synthetic_storage_size
koivunej Sep 14, 2023
91d0bd4
test_metric_collection: a bit more work
koivunej Sep 14, 2023
2dcc73c
chore: pyfmt
koivunej Sep 14, 2023
fef399e
test_metric_collection: another pass
koivunej Sep 14, 2023
485b91e
split test files
koivunej Sep 15, 2023
b90368f
chore: add minimal types
koivunej Sep 15, 2023
68c38da
refactor: split and split
koivunej Sep 15, 2023
26eace6
consumption_metrics: make Event deserializable
koivunej Sep 15, 2023
27d6cd3
test: chunked_serialization
koivunej Sep 15, 2023
7e8bbea
fix: cached_metric_collection_interval default
koivunej Sep 15, 2023
2c3cca1
test: fix test_threshold_based_eviction
koivunej Sep 15, 2023
d341f30
fix: UploadError debug == display for backoff::retry
koivunej Sep 15, 2023
98dd577
chore: one broken link
koivunej Sep 15, 2023
2d84a2b
test: remove extra lines
koivunej Sep 15, 2023
658956f
refactor: rename final_path => path
koivunej Sep 15, 2023
2fe8d44
test: file comment no longer needed
koivunej Sep 15, 2023
d35ded1
upload: log permanent errors
koivunej Sep 15, 2023
19867e4
refactor: clarify allocations being reused
koivunej Sep 15, 2023
00a9baa
refactor: needless cancellation token checking
koivunej Sep 15, 2023
0326bd4
fixup d35ded163
koivunej Sep 15, 2023
7b40d81
refactor: rewrite Iter as type to impl ExactSizeIterator
koivunej Sep 15, 2023
81ce97d
upload: send pageserver-metrics-last-upload-in-batch header
koivunej Sep 15, 2023
cbabcad
test: consume pageserver-metrics-last-upload-in-batch header
koivunej Sep 15, 2023
badfa15
chore: additional missed rustfmt..?
koivunej Sep 15, 2023
61 changes: 49 additions & 12 deletions libs/consumption_metrics/src/lib.rs
@@ -3,9 +3,9 @@
//!
use chrono::{DateTime, Utc};
use rand::Rng;
-use serde::Serialize;
+use serde::{Deserialize, Serialize};

-#[derive(Serialize, Debug, Clone, Copy, Eq, PartialEq, Ord, PartialOrd)]
+#[derive(Serialize, serde::Deserialize, Debug, Clone, Copy, Eq, PartialEq, Ord, PartialOrd)]
#[serde(tag = "type")]
pub enum EventType {
#[serde(rename = "absolute")]
@@ -27,7 +27,8 @@ impl EventType {
}

pub fn incremental_timerange(&self) -> Option<std::ops::Range<&DateTime<Utc>>> {
-// these can most likely be thought of as Range or RangeFull
+// these can most likely be thought of as Range or RangeFull, at least pageserver creates
+// incremental ranges where the stop and next start are equal.
use EventType::*;
match self {
Incremental {
@@ -41,15 +42,25 @@ impl EventType {
pub fn is_incremental(&self) -> bool {
matches!(self, EventType::Incremental { .. })
}

+/// Returns the absolute time, or for incremental ranges, the stop time.
+pub fn recorded_at(&self) -> &DateTime<Utc> {
+use EventType::*;
+
+match self {
+Absolute { time } => time,
+Incremental { stop_time, .. } => stop_time,
+}
+}
}
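
The `recorded_at` accessor above gives callers a single timestamp per event, whichever variant it is. A minimal sketch of how it might be used, under these assumptions: a plain `u64` stands in for `chrono::DateTime<Utc>`, the `start_time` field name is assumed, and `latest_recorded_at` is a hypothetical helper that is not part of this PR.

```rust
// Stand-in for chrono::DateTime<Utc>: seconds since the Unix epoch.
// This is an assumption made to keep the sketch std-only.
type Timestamp = u64;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventType {
    Absolute { time: Timestamp },
    // `start_time` is an assumed field name; the diff only shows `stop_time`.
    Incremental { start_time: Timestamp, stop_time: Timestamp },
}

impl EventType {
    /// Returns the absolute time, or for incremental ranges, the stop time.
    fn recorded_at(&self) -> &Timestamp {
        match self {
            EventType::Absolute { time } => time,
            EventType::Incremental { stop_time, .. } => stop_time,
        }
    }
}

/// Hypothetical helper: the newest `recorded_at` across a batch of events,
/// or `None` when the batch is empty.
fn latest_recorded_at(events: &[EventType]) -> Option<Timestamp> {
    events.iter().map(|event| *event.recorded_at()).max()
}
```

Calling `latest_recorded_at` on a mix of absolute and incremental events compares absolute times against incremental stop times, which is the uniform view the accessor is meant to provide.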

-#[derive(Serialize, Debug, Clone, Eq, PartialEq, Ord, PartialOrd)]
-pub struct Event<Extra> {
+#[derive(Serialize, Deserialize, Debug, Clone, Eq, PartialEq, Ord, PartialOrd)]
+pub struct Event<Extra, Metric> {
#[serde(flatten)]
#[serde(rename = "type")]
pub kind: EventType,

-pub metric: &'static str,
+pub metric: Metric,
pub idempotency_key: String,
pub value: u64,

@@ -58,12 +69,38 @@ pub struct Event<Extra> {
}

pub fn idempotency_key(node_id: &str) -> String {
-format!(
-"{}-{}-{:04}",
-Utc::now(),
-node_id,
-rand::thread_rng().gen_range(0..=9999)
-)
+IdempotencyKey::generate(node_id).to_string()
}

+/// Downstream users will use these to detect upload retries.
+pub struct IdempotencyKey<'a> {
+now: chrono::DateTime<Utc>,
+node_id: &'a str,
+nonce: u16,
+}
+
+impl std::fmt::Display for IdempotencyKey<'_> {
+fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+write!(f, "{}-{}-{:04}", self.now, self.node_id, self.nonce)
+}
+}
+
+impl<'a> IdempotencyKey<'a> {
+pub fn generate(node_id: &'a str) -> Self {
+IdempotencyKey {
+now: Utc::now(),
+node_id,
+nonce: rand::thread_rng().gen_range(0..=9999),
+}
+}
+
+pub fn for_tests(now: DateTime<Utc>, node_id: &'a str, nonce: u16) -> Self {
+IdempotencyKey {
+now,
+node_id,
+nonce,
+}
+}
+}
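
Because `IdempotencyKey` implements `Display`, it can be written into an existing buffer with `write!` rather than allocating a fresh `String` per key, which appears to be what the commit "refactor: idempotency key, make it write!-able" is after. A minimal std-only sketch, under these assumptions: `now` is a pre-formatted `&str` rather than `chrono::DateTime<Utc>`, and `append_key` is a hypothetical helper that is not in the PR.

```rust
use std::fmt::{self, Write};

/// Simplified stand-in for the PR's `IdempotencyKey`. `now` is a
/// pre-formatted timestamp string here (assumption) so that the sketch
/// does not depend on chrono.
struct IdempotencyKey<'a> {
    now: &'a str,
    node_id: &'a str,
    nonce: u16,
}

impl fmt::Display for IdempotencyKey<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Same layout as the diff: timestamp, node id, zero-padded nonce.
        write!(f, "{}-{}-{:04}", self.now, self.node_id, self.nonce)
    }
}

/// Hypothetical helper: appends the key to a caller-owned buffer, so the
/// buffer's allocation can be reused across many keys.
fn append_key(buf: &mut String, key: &IdempotencyKey<'_>) {
    write!(buf, "{}", key).expect("writing to a String cannot fail");
}
```

The `for_tests` constructor in the diff serves a similar purpose for determinism: with a fixed time and nonce, the formatted key is stable and can be compared against a stored string.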

pub const CHUNK_SIZE: usize = 1000;
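
How `CHUNK_SIZE` is consumed is not visible in this part of the diff. The commit list mentions chunked serialization and a `pageserver-metrics-last-upload-in-batch` header, so one plausible shape is to split the collected events into slices of at most `CHUNK_SIZE` and flag the final one. The sketch below is only that, a guess at the shape: `batches` and its `is_last` flag are hypothetical names and not the PR's API.

```rust
const CHUNK_SIZE: usize = 1000;

/// Hypothetical helper: yields slices of at most `CHUNK_SIZE` events,
/// each paired with a flag that is true only for the final slice.
fn batches<'a, T>(events: &'a [T]) -> impl Iterator<Item = (&'a [T], bool)> + 'a {
    // `chunks` is an ExactSizeIterator, so the chunk count is known up front.
    let total = events.chunks(CHUNK_SIZE).len();
    events
        .chunks(CHUNK_SIZE)
        .enumerate()
        .map(move |(index, chunk)| (chunk, index + 1 == total))
}
```

With 2500 events this yields chunks of 1000, 1000 and 500 events, and only the last one carries `true`. An empty input yields no chunks at all.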
2 changes: 1 addition & 1 deletion pageserver/Cargo.toml
@@ -80,11 +80,11 @@ enum-map.workspace = true
enumset.workspace = true
strum.workspace = true
strum_macros.workspace = true
+tempfile.workspace = true

[dev-dependencies]
criterion.workspace = true
hex-literal.workspace = true
-tempfile.workspace = true
tokio = { workspace = true, features = ["process", "sync", "fs", "rt", "io-util", "time", "test-util"] }

[[bench]]
4 changes: 4 additions & 0 deletions pageserver/src/bin/pageserver.rs
@@ -518,6 +518,9 @@ fn start_pageserver(
// creates a child context with the right DownloadBehavior.
DownloadBehavior::Error,
);

+let local_disk_storage = conf.workdir.join("last_consumption_metrics.json");

task_mgr::spawn(
crate::BACKGROUND_RUNTIME.handle(),
TaskKind::MetricsCollection,
@@ -544,6 +547,7 @@ fn start_pageserver(
conf.cached_metric_collection_interval,
conf.synthetic_size_calculation_interval,
conf.id,
+local_disk_storage,
metrics_ctx,
)
.instrument(info_span!("metrics_collection"))
2 changes: 1 addition & 1 deletion pageserver/src/config.rs
@@ -64,7 +64,7 @@ pub mod defaults {
super::ConfigurableSemaphore::DEFAULT_INITIAL.get();

pub const DEFAULT_METRIC_COLLECTION_INTERVAL: &str = "10 min";
-pub const DEFAULT_CACHED_METRIC_COLLECTION_INTERVAL: &str = "1 hour";
+pub const DEFAULT_CACHED_METRIC_COLLECTION_INTERVAL: &str = "0s";
pub const DEFAULT_METRIC_COLLECTION_ENDPOINT: Option<reqwest::Url> = None;
pub const DEFAULT_SYNTHETIC_SIZE_CALCULATION_INTERVAL: &str = "10 min";
pub const DEFAULT_BACKGROUND_TASK_MAXIMUM_DELAY: &str = "10s";