safekeeper,pageserver: add CPU profiling (#9764)
## Problem

We don't have a convenient way to gather CPU profiles from a running binary, e.g. during production incidents or end-to-end benchmarks, nor during microbenchmarks (particularly on macOS).

We would also like to have continuous profiling in production, likely using [Grafana Cloud Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/). We may choose to use either eBPF profiles or pprof profiles for this (pending testing and discussion with SREs), but pprof profiles appear useful regardless for the reasons listed above. See neondatabase/cloud#14888.

This PR is intended as a proof of concept, to try it out in staging and drive further discussions about profiling more broadly.

Touches #9534. Touches neondatabase/cloud#14888.

## Summary of changes

Adds an HTTP route `/profile/cpu` that takes a CPU profile and returns it. It defaults to a 5-second pprof Protobuf profile for use with e.g. `pprof` or Grafana Alloy, but can also emit an SVG flamegraph. Query parameters:

* `format`: output format (`pprof` or `svg`)
* `frequency`: sampling frequency in Hz, i.e. samples per second (default 100)
* `seconds`: number of seconds to profile (default 5)

Also integrates pprof profiles into Criterion benchmarks, such that flamegraph reports can be taken with `cargo bench ... --profile-duration <seconds>`. Output is written under `target/criterion/*/profile/flamegraph.svg`.

Example profiles:

* pprof profile (use [`pprof`](https://github.com/google/pprof)): [profile.pb.gz](https://github.com/user-attachments/files/17756788/profile.pb.gz)
  * Web interface: `pprof -http :6060 profile.pb.gz`
* Interactive flamegraph: [profile.svg.gz](https://github.com/user-attachments/files/17756782/profile.svg.gz)
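The `/profile/cpu` query parameters and their defaults can be illustrated with a small, standard-library-only Rust sketch. `parse_profile_query`, `ProfileParams` and `Format` are hypothetical names for illustration, not the actual safekeeper/pageserver handler. The sketch treats `frequency` as a sampling rate in samples per second (Hz), so the defaults work out to roughly 100 × 5 = 500 samples per busy thread; it also skips percent-decoding of the query string, which a real HTTP handler would need.

```rust
use std::collections::HashMap;

/// Output format selected by the `format` query parameter (hypothetical mirror).
#[derive(Debug, PartialEq)]
enum Format {
    Pprof,
    Svg,
}

/// Parsed `/profile/cpu` options with the documented defaults applied.
#[derive(Debug, PartialEq)]
struct ProfileParams {
    format: Format,
    frequency: u32, // assumed unit: samples per second (Hz)
    seconds: u64,   // profiling duration in seconds
}

impl ProfileParams {
    /// Approximate samples collected per busy thread: frequency (Hz) × duration (s).
    fn expected_samples(&self) -> u64 {
        u64::from(self.frequency) * self.seconds
    }
}

/// Hypothetical parser for a raw query string such as "format=svg&seconds=10".
/// Does not percent-decode; unknown keys are ignored.
fn parse_profile_query(query: &str) -> Result<ProfileParams, String> {
    // Split "k1=v1&k2=v2" into a map; pieces without '=' are dropped.
    let pairs: HashMap<&str, &str> = query
        .split('&')
        .filter_map(|kv| kv.split_once('='))
        .collect();

    let format = match pairs.get("format").copied() {
        None | Some("pprof") => Format::Pprof, // pprof Protobuf is the default
        Some("svg") => Format::Svg,
        Some(other) => return Err(format!("invalid format: {other}")),
    };
    let frequency = match pairs.get("frequency") {
        None => 100,
        Some(v) => v
            .parse::<u32>()
            .map_err(|_| format!("invalid frequency: {v}"))?,
    };
    let seconds = match pairs.get("seconds") {
        None => 5,
        Some(v) => v
            .parse::<u64>()
            .map_err(|_| format!("invalid seconds: {v}"))?,
    };
    if frequency == 0 || seconds == 0 {
        return Err("frequency and seconds must be positive".to_string());
    }
    Ok(ProfileParams { format, frequency, seconds })
}

fn main() {
    let defaults = parse_profile_query("").unwrap();
    println!("{:?}, ~{} samples per thread", defaults, defaults.expected_samples());

    let svg = parse_profile_query("format=svg&seconds=10").unwrap();
    println!("{:?}", svg);

    // Unsupported formats are rejected rather than silently defaulted.
    assert!(parse_profile_query("format=json").is_err());
}
```

Rejecting an unknown `format` instead of falling back to pprof is a design choice in this sketch: a client asking for an unsupported format gets an explicit error rather than a payload it cannot read.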
190e8ce
5626 tests run: 5387 passed, 3 failed, 236 skipped (full report)
Failures on Postgres 16:

* `test_sharded_ingest[github-actions-selfhosted-1]`: release-x86-64
* `test_bulk_insert[neon-github-actions-selfhosted]`: release-x86-64
* `test_compaction_l0_memory[github-actions-selfhosted]`: release-x86-64

Flaky tests (1), Postgres 17:

* `test_lr_with_slow_safekeeper`: release-arm64

Code coverage* (full report):

* functions: 31.4% (7954 of 25342 functions)
* lines: 49.3% (63103 of 127982 lines)

\* collected from Rust tests only

190e8ce at 2024-11-21T20:40:49.244Z