page_service: add benchmark for batching (#9820)
This PR adds two benchmarks to demonstrate the effect of the server-side getpage request batching added in #9321.
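
For orientation, the benchmarks below toggle the batching knob by patching the pageserver config between runs. A minimal sketch of that pattern, assuming the `NeonEnvBuilder` fixture and the `server_side_batch_timeout` setting used in the test file below:

```python
# Sketch only: mirrors how the benchmark switches batching on and off
# (fixture and config names taken from the test file in this PR).
def run_with_batch_timeout(neon_env_builder, batch_timeout: str | None):
    env = neon_env_builder.init_start()
    # None disables batching; humantime strings like "10us" or "1ms" enable it.
    env.pageserver.patch_config_toml_nonrecursive(
        {"server_side_batch_timeout": batch_timeout}
    )
    env.pageserver.restart()  # the config change takes effect after a restart
    return env
```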

For the CPU usage, I found that the `prometheus` crate's built-in CPU usage metric accounts CPU seconds at integer granularity. That's not precise enough once you reduce the target benchmark runtime for local iteration. So, add a new `libmetrics` metric and report that.
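
For reference, the computation behind the new metric is just `(utime + stime) / CLK_TCK`. A rough Python equivalent of what the collector does (a sketch for illustration, not the implementation; the Rust code is in the diff below, and the field indices assume the standard `/proc/self/stat` layout):

```python
import os

def process_cpu_seconds_highres() -> float:
    """Sub-second total (user + system) CPU time of this process."""
    with open("/proc/self/stat") as f:
        stat = f.read()
    # Skip pid and comm (comm may contain spaces, so split at the last ')').
    # In the remaining fields, utime and stime are indices 11 and 12.
    fields = stat.rsplit(")", 1)[1].split()
    utime_ticks, stime_ticks = int(fields[11]), int(fields[12])
    return (utime_ticks + stime_ticks) / os.sysconf("SC_CLK_TCK")
```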

The benchmarks are disabled because [on our benchmark nodes, timer
resolution isn't high
enough](https://neondb.slack.com/archives/C059ZC138NR/p1732264223207449).
They work (no statement about quality) on my bare-metal devbox.
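
To check whether a given host is affected, one quick-and-dirty approach (my own illustration, not part of this PR) is to time a batch of very short sleeps and look at the per-iteration overhead; on a coarse-timer machine the overhead dwarfs a 10us batching timeout:

```python
import time

def estimate_sleep_overhead_us(requested_us: float = 10.0, iters: int = 1000) -> float:
    """Average extra microseconds a short sleep takes beyond what was requested."""
    start = time.perf_counter()
    for _ in range(iters):
        time.sleep(requested_us / 1e6)
    elapsed_us = (time.perf_counter() - start) / iters * 1e6
    return elapsed_us - requested_us
```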

They will be refined and enabled once we find a fix. Candidates at the time of writing are:
- #9822
- #9851


Refs:

- Epic: #9376
- Extracted from #9792
problame authored Nov 25, 2024
1 parent 441612c commit 5c23569
Showing 3 changed files with 347 additions and 3 deletions.
40 changes: 38 additions & 2 deletions libs/metrics/src/more_process_metrics.rs
@@ -2,14 +2,28 @@
// This module has heavy inspiration from the prometheus crate's `process_collector.rs`.

use once_cell::sync::Lazy;
use prometheus::Gauge;

use crate::UIntGauge;

pub struct Collector {
    descs: Vec<prometheus::core::Desc>,
    vmlck: crate::UIntGauge,
    cpu_seconds_highres: Gauge,
}

const NMETRICS: usize = 1;
const NMETRICS: usize = 2;

static CLK_TCK_F64: Lazy<f64> = Lazy::new(|| {
    let long = unsafe { libc::sysconf(libc::_SC_CLK_TCK) };
    if long == -1 {
        panic!("sysconf(_SC_CLK_TCK) failed");
    }
    let convertible_to_f64: i32 =
        i32::try_from(long).expect("sysconf(_SC_CLK_TCK) is larger than i32");
    convertible_to_f64 as f64
});

impl prometheus::core::Collector for Collector {
    fn desc(&self) -> Vec<&prometheus::core::Desc> {
@@ -27,6 +41,12 @@ impl prometheus::core::Collector for Collector {
                mfs.extend(self.vmlck.collect())
            }
        }
        if let Ok(stat) = myself.stat() {
            let cpu_seconds = stat.utime + stat.stime;
            self.cpu_seconds_highres
                .set(cpu_seconds as f64 / *CLK_TCK_F64);
            mfs.extend(self.cpu_seconds_highres.collect());
        }
        mfs
    }
}
@@ -43,7 +63,23 @@ impl Collector {
                .cloned(),
        );

        Self { descs, vmlck }
        let cpu_seconds_highres = Gauge::new(
            "libmetrics_process_cpu_seconds_highres",
            "Total user and system CPU time spent in seconds.\
             Sub-second resolution, hence better than `process_cpu_seconds_total`.",
        )
        .unwrap();
        descs.extend(
            prometheus::core::Collector::desc(&cpu_seconds_highres)
                .into_iter()
                .cloned(),
        );

        Self {
            descs,
            vmlck,
            cpu_seconds_highres,
        }
    }
}
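
The benchmark below reads this gauge through the pageserver's metrics endpoint. A small sketch of deriving a CPU-utilization figure from two scrapes, reusing the `ps_http` fixture and `query_one` helper that appear in the test (the helper function itself is hypothetical):

```python
import time

def cpu_utilization_during(ps_http, workload) -> float:
    """CPU-seconds burned per wall-clock second while `workload()` runs."""

    def cpu_seconds() -> float:
        return ps_http.get_metrics().query_one(
            "libmetrics_process_cpu_seconds_highres"
        ).value

    t0, c0 = time.time(), cpu_seconds()
    workload()
    t1, c1 = time.time(), cpu_seconds()
    return (c1 - c0) / (t1 - t0)
```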

3 changes: 2 additions & 1 deletion test_runner/performance/README.md
@@ -15,6 +15,7 @@ Some handy pytest flags for local development:
- `-k` selects a test to run
- `--timeout=0` disables our default timeout of 300s (see `setup.cfg`)
- `--preserve-database-files` to skip cleanup
- `--out-dir` to produce a JSON with the recorded test metrics

# What performance tests do we have and how we run them

@@ -36,6 +37,6 @@ All tests run only once. Usually to obtain more consistent performance numbers,

## Results collection

Local test results for main branch, and results of daily performance tests, are stored in a neon project deployed in the production environment. There is a Grafana dashboard that visualizes the results. Here is the [dashboard](https://observer.zenith.tech/d/DGKBm9Jnz/perf-test-results?orgId=1). The main problem with it is that it cannot be pointed at a particular commit, though the data for that is available in the database. It needs some tweaking from someone who knows Grafana tricks.
Local test results for main branch, and results of daily performance tests, are stored in a [neon project](https://console.neon.tech/app/projects/withered-sky-69117821) deployed in the production environment. There is a Grafana dashboard that visualizes the results. Here is the [dashboard](https://observer.zenith.tech/d/DGKBm9Jnz/perf-test-results?orgId=1). The main problem with it is that it cannot be pointed at a particular commit, though the data for that is available in the database. It needs some tweaking from someone who knows Grafana tricks.

There is also an inconsistency in test naming. The test name should be the same across platforms, with results differentiated by the platform field. But currently, the platform is sometimes included in the test name because of the way parametrization works in pytest. For example, there is a platform switch in the dashboard with neon-local-ci and neon-staging variants, and some tests under the neon-local-ci platform value are displayed as `Test test_runner/performance/test_bulk_insert.py::test_bulk_insert[vanilla]` and `Test test_runner/performance/test_bulk_insert.py::test_bulk_insert[neon]`, which is highly confusing.
307 changes: 307 additions & 0 deletions test_runner/performance/pageserver/test_pageserver_getpage_merge.py
@@ -0,0 +1,307 @@
import dataclasses
import json
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Any

import pytest
from fixtures.benchmark_fixture import MetricReport, NeonBenchmarker
from fixtures.log_helper import log
from fixtures.neon_fixtures import NeonEnvBuilder, PgBin, wait_for_last_flush_lsn
from fixtures.utils import humantime_to_ms

TARGET_RUNTIME = 60


@pytest.mark.skip("See https://github.com/neondatabase/neon/pull/9820#issue-2675856095")
@pytest.mark.parametrize(
"tablesize_mib, batch_timeout, target_runtime, effective_io_concurrency, readhead_buffer_size, name",
[
# the next 4 cases demonstrate how not-batchable workloads suffer from batching timeout
(50, None, TARGET_RUNTIME, 1, 128, "not batchable no batching"),
(50, "10us", TARGET_RUNTIME, 1, 128, "not batchable 10us timeout"),
(50, "1ms", TARGET_RUNTIME, 1, 128, "not batchable 1ms timeout"),
# the next 4 cases demonstrate how batchable workloads benefit from batching
(50, None, TARGET_RUNTIME, 100, 128, "batchable no batching"),
(50, "10us", TARGET_RUNTIME, 100, 128, "batchable 10us timeout"),
(50, "100us", TARGET_RUNTIME, 100, 128, "batchable 100us timeout"),
(50, "1ms", TARGET_RUNTIME, 100, 128, "batchable 1ms timeout"),
],
)
def test_getpage_merge_smoke(
    neon_env_builder: NeonEnvBuilder,
    zenbenchmark: NeonBenchmarker,
    tablesize_mib: int,
    batch_timeout: str | None,
    target_runtime: int,
    effective_io_concurrency: int,
    readhead_buffer_size: int,
    name: str,
):
"""
Do a bunch of sequential scans and ensure that the pageserver does some merging.
"""

#
# record perf-related parameters as metrics to simplify processing of results
#
params: dict[str, tuple[float | int, dict[str, Any]]] = {}

params.update(
{
"tablesize_mib": (tablesize_mib, {"unit": "MiB"}),
"batch_timeout": (
-1 if batch_timeout is None else 1e3 * humantime_to_ms(batch_timeout),
{"unit": "us"},
),
# target_runtime is just a polite ask to the workload to run for this long
"effective_io_concurrency": (effective_io_concurrency, {}),
"readhead_buffer_size": (readhead_buffer_size, {}),
# name is not a metric
}
)

log.info("params: %s", params)

for param, (value, kwargs) in params.items():
zenbenchmark.record(
param,
metric_value=value,
unit=kwargs.pop("unit", ""),
report=MetricReport.TEST_PARAM,
**kwargs,
)

    #
    # Setup
    #

    env = neon_env_builder.init_start()
    ps_http = env.pageserver.http_client()
    endpoint = env.endpoints.create_start("main")
    conn = endpoint.connect()
    cur = conn.cursor()

    cur.execute("SET max_parallel_workers_per_gather=0")  # disable parallel backends
    cur.execute(f"SET effective_io_concurrency={effective_io_concurrency}")
    cur.execute(
        f"SET neon.readahead_buffer_size={readhead_buffer_size}"
    )  # this is the current default value, but let's hard-code that

    cur.execute("CREATE EXTENSION IF NOT EXISTS neon;")
    cur.execute("CREATE EXTENSION IF NOT EXISTS neon_test_utils;")

    log.info("Filling the table")
    cur.execute("CREATE TABLE t (data char(1000)) with (fillfactor=10)")
    tablesize = tablesize_mib * 1024 * 1024
    npages = tablesize // (8 * 1024)
    cur.execute("INSERT INTO t SELECT generate_series(1, %s)", (npages,))
    # TODO: can we force postgres to do sequential scans?

    #
    # Run the workload, collect `Metrics` before and after, calculate difference, normalize.
    #

    @dataclass
    class Metrics:
        time: float
        pageserver_getpage_count: float
        pageserver_vectored_get_count: float
        compute_getpage_count: float
        pageserver_cpu_seconds_total: float

        def __sub__(self, other: "Metrics") -> "Metrics":
            return Metrics(
                time=self.time - other.time,
                pageserver_getpage_count=self.pageserver_getpage_count
                - other.pageserver_getpage_count,
                pageserver_vectored_get_count=self.pageserver_vectored_get_count
                - other.pageserver_vectored_get_count,
                compute_getpage_count=self.compute_getpage_count - other.compute_getpage_count,
                pageserver_cpu_seconds_total=self.pageserver_cpu_seconds_total
                - other.pageserver_cpu_seconds_total,
            )

        def normalize(self, by) -> "Metrics":
            return Metrics(
                time=self.time / by,
                pageserver_getpage_count=self.pageserver_getpage_count / by,
                pageserver_vectored_get_count=self.pageserver_vectored_get_count / by,
                compute_getpage_count=self.compute_getpage_count / by,
                pageserver_cpu_seconds_total=self.pageserver_cpu_seconds_total / by,
            )

    def get_metrics() -> Metrics:
        with conn.cursor() as cur:
            cur.execute(
                "select value from neon_perf_counters where metric='getpage_wait_seconds_count';"
            )
            compute_getpage_count = cur.fetchall()[0][0]
            pageserver_metrics = ps_http.get_metrics()
            return Metrics(
                time=time.time(),
                pageserver_getpage_count=pageserver_metrics.query_one(
                    "pageserver_smgr_query_seconds_count", {"smgr_query_type": "get_page_at_lsn"}
                ).value,
                pageserver_vectored_get_count=pageserver_metrics.query_one(
                    "pageserver_get_vectored_seconds_count", {"task_kind": "PageRequestHandler"}
                ).value,
                compute_getpage_count=compute_getpage_count,
                pageserver_cpu_seconds_total=pageserver_metrics.query_one(
                    "libmetrics_process_cpu_seconds_highres"
                ).value,
            )

    def workload() -> Metrics:
        start = time.time()
        iters = 0
        while time.time() - start < target_runtime or iters < 2:
            log.info("Seqscan %d", iters)
            if iters == 1:
                # round zero for warming up
                before = get_metrics()
            cur.execute(
                "select clear_buffer_cache()"
            )  # TODO: what about LFC? doesn't matter right now because LFC isn't enabled by default in tests
            cur.execute("select sum(data::bigint) from t")
            assert cur.fetchall()[0][0] == npages * (npages + 1) // 2
            iters += 1
        after = get_metrics()
        return (after - before).normalize(iters - 1)

    env.pageserver.patch_config_toml_nonrecursive({"server_side_batch_timeout": batch_timeout})
    env.pageserver.restart()
    metrics = workload()

    log.info("Results: %s", metrics)

    #
    # Sanity-checks on the collected data
    #
    # assert that getpage counts roughly match between compute and ps
    assert metrics.pageserver_getpage_count == pytest.approx(
        metrics.compute_getpage_count, rel=0.01
    )

    #
    # Record the results
    #

    for metric, value in dataclasses.asdict(metrics).items():
        zenbenchmark.record(f"counters.{metric}", value, unit="", report=MetricReport.TEST_PARAM)

    zenbenchmark.record(
        "perfmetric.batching_factor",
        metrics.pageserver_getpage_count / metrics.pageserver_vectored_get_count,
        unit="",
        report=MetricReport.HIGHER_IS_BETTER,
    )


@pytest.mark.skip("See https://github.com/neondatabase/neon/pull/9820#issue-2675856095")
@pytest.mark.parametrize(
"batch_timeout", [None, "10us", "20us", "50us", "100us", "200us", "500us", "1ms"]
)
def test_timer_precision(
neon_env_builder: NeonEnvBuilder,
zenbenchmark: NeonBenchmarker,
pg_bin: PgBin,
batch_timeout: str | None,
):
"""
Determine the batching timeout precision (mean latency) and tail latency impact.
The baseline is `None`; an ideal batching timeout implementation would increase
the mean latency by exactly `batch_timeout`.
That is not the case with the current implementation, will be addressed in future changes.
"""

    #
    # Setup
    #

    def patch_ps_config(ps_config):
        ps_config["server_side_batch_timeout"] = batch_timeout

    neon_env_builder.pageserver_config_override = patch_ps_config

    env = neon_env_builder.init_start()
    endpoint = env.endpoints.create_start("main")
    conn = endpoint.connect()
    cur = conn.cursor()

    cur.execute("SET max_parallel_workers_per_gather=0")  # disable parallel backends
    cur.execute("SET effective_io_concurrency=1")

    cur.execute("CREATE EXTENSION IF NOT EXISTS neon;")
    cur.execute("CREATE EXTENSION IF NOT EXISTS neon_test_utils;")

    log.info("Filling the table")
    cur.execute("CREATE TABLE t (data char(1000)) with (fillfactor=10)")
    tablesize = 50 * 1024 * 1024
    npages = tablesize // (8 * 1024)
    cur.execute("INSERT INTO t SELECT generate_series(1, %s)", (npages,))
    # TODO: can we force postgres to do sequential scans?

    cur.close()
    conn.close()

    wait_for_last_flush_lsn(env, endpoint, env.initial_tenant, env.initial_timeline)

    endpoint.stop()

    for sk in env.safekeepers:
        sk.stop()

    #
    # Run single-threaded pagebench (TODO: dedup with other benchmark code)
    #

    env.pageserver.allowed_errors.append(
        # https://github.com/neondatabase/neon/issues/6925
        r".*query handler for.*pagestream.*failed: unexpected message: CopyFail during COPY.*"
    )

    ps_http = env.pageserver.http_client()

    cmd = [
        str(env.neon_binpath / "pagebench"),
        "get-page-latest-lsn",
        "--mgmt-api-endpoint",
        ps_http.base_url,
        "--page-service-connstring",
        env.pageserver.connstr(password=None),
        "--num-clients",
        "1",
        "--runtime",
        "10s",
    ]
    log.info(f"command: {' '.join(cmd)}")
    basepath = pg_bin.run_capture(cmd, with_command_header=False)
    results_path = Path(basepath + ".stdout")
    log.info(f"Benchmark results at: {results_path}")

    with open(results_path) as f:
        results = json.load(f)
    log.info(f"Results:\n{json.dumps(results, sort_keys=True, indent=2)}")

    total = results["total"]

    metric = "latency_mean"
    zenbenchmark.record(
        metric,
        metric_value=humantime_to_ms(total[metric]),
        unit="ms",
        report=MetricReport.LOWER_IS_BETTER,
    )

    metric = "latency_percentiles"
    for k, v in total[metric].items():
        zenbenchmark.record(
            f"{metric}.{k}",
            metric_value=humantime_to_ms(v),
            unit="ms",
            report=MetricReport.LOWER_IS_BETTER,
        )
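
As a reading aid for the recorded numbers: per the docstring above, an ideal implementation would satisfy `latency_mean(batch_timeout) == latency_mean(None) + batch_timeout`. A hypothetical helper (not part of the test) expressing the excess overhead:

```python
def timeout_overhead_ms(baseline_mean_ms: float, mean_ms: float, batch_timeout_ms: float) -> float:
    """Extra mean latency beyond the configured batching timeout itself; 0 would be ideal."""
    return mean_ms - baseline_mean_ms - batch_timeout_ms
```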

1 comment on commit 5c23569

@github-actions

6959 tests run: 6627 passed, 0 failed, 332 skipped (full report)


Flaky tests (3)


Code coverage* (full report)

  • functions: 31.0% (7974 of 25721 functions)
  • lines: 48.8% (63292 of 129732 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
5c23569 at 2024-11-25T19:21:24.918Z :recycle:
