per-TenantShard read throttling #6706

Merged: 28 commits, Feb 16, 2024
Commits
06a17a1
WIP
problame Feb 5, 2024
c9f1bab
WIP
problame Feb 5, 2024
26ea5f3
few todo's left
problame Feb 5, 2024
4ee5489
finish impl & fmt
problame Feb 5, 2024
1e7b9ce
TenantConfOpt <=> models::TenantConfig conversion prevents us from ha…
problame Feb 5, 2024
f3e000e
pagebench: support num_clients
problame Feb 5, 2024
a516735
make it work
problame Feb 5, 2024
c9eb808
make tests compile
problame Feb 5, 2024
44060f4
Merge remote-tracking branch 'origin/main' into problame/get-page-thr…
problame Feb 7, 2024
57a8ec3
Merge branch 'main' into problame/get-page-throttling/wip
problame Feb 9, 2024
3abbd69
metric
problame Feb 13, 2024
de1db01
Merge remote-tracking branch 'origin/main' into problame/get-page-thr…
problame Feb 13, 2024
d7515c8
pagebench: rate limiter fixes part 1
problame Feb 13, 2024
de3722a
pagebench: working limit
problame Feb 13, 2024
57a5640
finish cleaning up pagebench; cargo fmt
problame Feb 13, 2024
0b4df46
refine typing & names of per-tenant metric
problame Feb 13, 2024
9f46fa8
global metric; at 2x rate (60k RPS total), 1.6% of CPU time of PS is …
problame Feb 13, 2024
3ccad2b
WIP: store last_throttled_at behind Mutex; too expensive (2.7% total …
problame Feb 13, 2024
f44d681
report throttling of individual tenants in the log; at 2x rate, throt…
problame Feb 14, 2024
451eefe
drop the per-tenant metric, add additional global metrics
problame Feb 14, 2024
ab88016
manual benchmarking: 2.2
problame Feb 14, 2024
aa8dcaa
fix test
problame Feb 14, 2024
d60a661
clippy
problame Feb 14, 2024
ddbc40b
address most review comments & suggestions
problame Feb 16, 2024
a3230cd
fixups & rename to timeline_get_throttle
problame Feb 16, 2024
030e861
address more tiny review comments
problame Feb 16, 2024
4c189a8
apply the suggestion about using unconstrained
problame Feb 16, 2024
23baada
Merge branch 'main' into problame/get-page-throttling/wip
problame Feb 16, 2024
15 changes: 15 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -97,6 +97,7 @@ ipnet = "2.9.0"
itertools = "0.10"
jsonwebtoken = "9"
lasso = "0.7"
leaky-bucket = "1.0.1"
libc = "0.2"
md5 = "0.7.0"
memoffset = "0.8"
10 changes: 10 additions & 0 deletions control_plane/src/pageserver.rs
@@ -400,6 +400,11 @@ impl PageServerNode {
.map(|x| x.parse::<bool>())
.transpose()
.context("Failed to parse 'lazy_slru_download' as bool")?,
timeline_get_throttle: settings
.remove("timeline_get_throttle")
.map(serde_json::from_str)
.transpose()
.context("parse `timeline_get_throttle` from json")?,
};
if !settings.is_empty() {
bail!("Unrecognized tenant settings: {settings:?}")
@@ -505,6 +510,11 @@ impl PageServerNode {
.map(|x| x.parse::<bool>())
.transpose()
.context("Failed to parse 'lazy_slru_download' as bool")?,
timeline_get_throttle: settings
.remove("timeline_get_throttle")
.map(serde_json::from_str)
.transpose()
.context("parse `timeline_get_throttle` from json")?,
}
};
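
Both hunks above use the same `remove(..).map(parse).transpose()` shape, followed by a check that no unknown keys remain. A minimal, std-only sketch of that pattern follows. The `parse_settings` function and the `limit` key are hypothetical names chosen for illustration; the actual code parses `timeline_get_throttle` with `serde_json::from_str` and reports errors through `anyhow`.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of the PR): takes an optional `limit`
/// setting out of the map, parses it, and rejects unknown leftover keys.
fn parse_settings(mut settings: HashMap<&str, &str>) -> Result<Option<u64>, String> {
    let limit = settings
        .remove("limit") // Option<&str>: None if the key was not given
        .map(|raw| raw.parse::<u64>()) // Option<Result<u64, ParseIntError>>
        .transpose() // Result<Option<u64>, ParseIntError>
        .map_err(|e| format!("parse `limit`: {e}"))?;

    // Every recognized key has been removed by now, so anything
    // still in the map is a setting this function does not know.
    if !settings.is_empty() {
        return Err(format!("Unrecognized settings: {settings:?}"));
    }
    Ok(limit)
}

fn main() {
    // Key absent: the setting stays `None`, which is not an error.
    println!("{:?}", parse_settings(HashMap::new()));
    // Key present and well-formed.
    println!("{:?}", parse_settings(HashMap::from([("limit", "42")])));
    // Key present but malformed: the parse error is surfaced.
    println!("{:?}", parse_settings(HashMap::from([("limit", "abc")])));
    // Unknown key: rejected after all known keys were consumed.
    println!("{:?}", parse_settings(HashMap::from([("typo", "1")])));
}
```

`transpose` is what lets an absent key pass through as `Ok(None)` while a present but malformed value still fails the whole call.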

30 changes: 30 additions & 0 deletions libs/pageserver_api/src/models.rs
@@ -283,6 +283,7 @@ pub struct TenantConfig {
pub gc_feedback: Option<bool>,
pub heatmap_period: Option<String>,
pub lazy_slru_download: Option<bool>,
pub timeline_get_throttle: Option<ThrottleConfig>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
@@ -309,6 +310,35 @@ pub struct EvictionPolicyLayerAccessThreshold {
pub threshold: Duration,
}

#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)]
pub struct ThrottleConfig {
pub task_kinds: Vec<String>, // TaskKind
pub initial: usize,
#[serde(with = "humantime_serde")]
pub refill_interval: Duration,
pub refill_amount: NonZeroUsize,
pub max: usize,
pub fair: bool,
}

impl ThrottleConfig {
pub fn disabled() -> Self {
Self {
task_kinds: vec![], // effectively disables the throttle
// other values don't matter with empty `task_kinds`.
initial: 0,
refill_interval: Duration::from_millis(1),
refill_amount: NonZeroUsize::new(1).unwrap(),
max: 1,
fair: true,
}
}
/// The requests per second allowed by the given config.
pub fn steady_rps(&self) -> f64 {
(self.refill_amount.get() as f64) / (self.refill_interval.as_secs_f64())
}
}
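
To make the roles of `initial`, `refill_interval`, `refill_amount` and `max` concrete, here is a simplified token-bucket sketch driven by a simulated clock. It is not the `leaky-bucket` crate's implementation and not the pageserver's throttle: `BucketParams`, `Bucket`, `advance` and `try_acquire` are hypothetical names, `task_kinds` and `fair` are left out, and clamping `initial` to `max` is a choice made for this sketch. Its `steady_rps` is `refill_amount` divided by `refill_interval` in seconds, which is the rate a refill-based bucket sustains once the initial tokens are used up.

```rust
use std::num::NonZeroUsize;
use std::time::Duration;

/// Hypothetical stand-in for the rate-shaping fields of `ThrottleConfig`.
#[derive(Debug, Clone)]
struct BucketParams {
    initial: usize,
    refill_interval: Duration,
    refill_amount: NonZeroUsize,
    max: usize,
}

impl BucketParams {
    /// Tokens credited per second in steady state.
    fn steady_rps(&self) -> f64 {
        self.refill_amount.get() as f64 / self.refill_interval.as_secs_f64()
    }
}

/// Simplified token bucket; time only moves when `advance` is called.
struct Bucket {
    params: BucketParams,
    tokens: usize,
    since_refill: Duration,
}

impl Bucket {
    fn new(params: BucketParams) -> Self {
        // Sketch-only choice: never start above the cap.
        let tokens = params.initial.min(params.max);
        Bucket { params, tokens, since_refill: Duration::ZERO }
    }

    /// Advance the simulated clock, crediting one refill per full interval.
    fn advance(&mut self, elapsed: Duration) {
        if self.params.refill_interval.is_zero() {
            return; // a zero interval would never terminate the loop below
        }
        self.since_refill += elapsed;
        while self.since_refill >= self.params.refill_interval {
            self.since_refill -= self.params.refill_interval;
            self.tokens = self
                .tokens
                .saturating_add(self.params.refill_amount.get())
                .min(self.params.max);
        }
    }

    /// Take one token if available; `false` means the caller would have to wait.
    fn try_acquire(&mut self) -> bool {
        if self.tokens > 0 {
            self.tokens -= 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let params = BucketParams {
        initial: 2,
        refill_interval: Duration::from_millis(500),
        refill_amount: NonZeroUsize::new(1).unwrap(),
        max: 3,
    };
    println!("steady rate: {} requests/s", params.steady_rps());

    let mut bucket = Bucket::new(params);
    // The first two requests use the initial tokens; the third finds none.
    for i in 0..3 {
        println!("request {i}: admitted = {}", bucket.try_acquire());
    }
    // One full interval later a single token is available again.
    bucket.advance(Duration::from_millis(500));
    println!("after 500ms: admitted = {}", bucket.try_acquire());
}
```

In this model `initial` and `max` only bound the size of a burst; the long-run rate depends solely on `refill_amount` and `refill_interval`.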

/// A flattened analog of a `pageserver::tenant::LocationMode`, which
/// lists out all possible states (and the virtual "Detached" state)
/// in a flat form rather than using rust-style enums.
1 change: 1 addition & 0 deletions libs/utils/Cargo.toml
@@ -25,6 +25,7 @@ hyper = { workspace = true, features = ["full"] }
fail.workspace = true
futures = { workspace = true}
jsonwebtoken.workspace = true
leaky-bucket.workspace = true
nix.workspace = true
once_cell.workspace = true
pin-project-lite.workspace = true
4 changes: 3 additions & 1 deletion pageserver/Cargo.toml
@@ -12,6 +12,7 @@ testing = ["fail/failpoints"]

[dependencies]
anyhow.workspace = true
arc-swap.workspace = true
async-compression.workspace = true
async-stream.workspace = true
async-trait.workspace = true
@@ -35,6 +36,7 @@ humantime.workspace = true
humantime-serde.workspace = true
hyper.workspace = true
itertools.workspace = true
leaky-bucket.workspace = true
md5.workspace = true
nix.workspace = true
# hack to get the number of worker threads tokio uses
@@ -82,7 +84,7 @@ workspace_hack.workspace = true
reqwest.workspace = true
rpds.workspace = true
enum-map.workspace = true
enumset.workspace = true
enumset = { workspace = true, features = ["serde"]}
strum.workspace = true
strum_macros.workspace = true
