Instrument RustyVault with Prometheus #76

cybershang · 2024-09-17T00:38:37Z

Instrument RustyVault with Prometheus

Design

To monitor the system's performance effectively, I applied both the USE and RED methods for metrics collection in RustyVault.

USE Method (Utilization, Saturation, Errors):

Track resource utilization and detect bottlenecks. Metrics related to system resources have been added to ensure the system's health is continuously monitored:
- CPU Utilization: Measures the percentage of CPU usage by the RustyVault service.
- Memory Utilization: Tracks memory usage, including total, used, and free memory.
- Disk I/O Saturation: Monitors disk read/write speed and detects potential bottlenecks.
- Network I/O Saturation: Tracks the amount of data sent and received.
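
To make the USE mapping concrete, the sketch below turns one snapshot of system gauges into a utilization figure (percent of memory in use) and a saturation signal (1-minute load average per CPU core, where values above 1.0 mean runnable work is queuing). SystemSnapshot and its fields are hypothetical names chosen for illustration, not RustyVault's SystemMetrics struct. Treating load average as a saturation signal is a common USE convention, not something this PR states.

```rust
/// Hypothetical snapshot of a few system gauges (illustrative field names,
/// not RustyVault's SystemMetrics struct).
pub struct SystemSnapshot {
    pub total_memory: u64,    // bytes
    pub used_memory: u64,     // bytes
    pub load_average_1m: f64, // 1-minute load average
    pub cpu_count: usize,     // logical CPUs
}

impl SystemSnapshot {
    /// Utilization: share of memory in use, as a percentage.
    pub fn memory_utilization_percent(&self) -> f64 {
        if self.total_memory == 0 {
            return 0.0;
        }
        self.used_memory as f64 / self.total_memory as f64 * 100.0
    }

    /// Saturation: load average per core; above 1.0 means work is queuing.
    pub fn load_per_core(&self) -> f64 {
        if self.cpu_count == 0 {
            return 0.0;
        }
        self.load_average_1m / self.cpu_count as f64
    }
}
```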

RED Method (Rate, Errors, Duration):

Track the behavior of requests within the application:
- Rate: We implemented requests_total to track the rate of requests coming into the system. This allows us to monitor the overall throughput.
- Errors: The errors_total counter tracks the number of failed requests and helps monitor the system's error rate.
- Duration: Using request_duration_seconds, we measure the time taken to process each request, enabling us to analyze latency and potential performance issues.
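
The RED figures are normally derived at query time from monotonically increasing counters. The std-only sketch below assumes two scrapes taken elapsed_secs apart and computes rate, error ratio, and mean duration from the counter deltas. This is roughly what rate() and a sum/count division do in PromQL, minus PromQL's counter-reset handling and extrapolation. RedSample, RedSummary, and summarize are hypothetical names.

```rust
/// One hypothetical scrape of the RED counters. Field names mirror
/// requests_total, errors_total and the request_duration_seconds sum/count.
pub struct RedSample {
    pub requests_total: u64,
    pub errors_total: u64,
    pub duration_seconds_sum: f64,
    pub duration_seconds_count: u64,
}

pub struct RedSummary {
    pub rate_per_sec: f64,  // Rate: requests per second over the window
    pub error_ratio: f64,   // Errors: failed requests as a fraction of all requests
    pub mean_duration: f64, // Duration: average seconds per request
}

/// Simplified rate(): divide counter deltas by elapsed time.
/// saturating_sub clamps to 0 if a counter was reset between scrapes.
pub fn summarize(prev: &RedSample, curr: &RedSample, elapsed_secs: f64) -> RedSummary {
    let requests = curr.requests_total.saturating_sub(prev.requests_total) as f64;
    let errors = curr.errors_total.saturating_sub(prev.errors_total) as f64;
    let count = curr
        .duration_seconds_count
        .saturating_sub(prev.duration_seconds_count) as f64;
    let sum = curr.duration_seconds_sum - prev.duration_seconds_sum;
    RedSummary {
        rate_per_sec: if elapsed_secs > 0.0 { requests / elapsed_secs } else { 0.0 },
        error_ratio: if requests > 0.0 { errors / requests } else { 0.0 },
        mean_duration: if count > 0.0 { sum / count } else { 0.0 },
    }
}
```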

Implemented Metrics

System Metrics
- CPU
  - cpu_usage_percent: <Gauge, AtomicU64>
- Memory
  - total_memory: <Gauge, AtomicU64>
  - used_memory: <Gauge, AtomicU64>
  - free_memory: <Gauge, AtomicU64>
- Disk
  - total_disk_space: <Gauge, AtomicU64>
  - total_disk_available: <Gauge, AtomicU64>
- Network
  - network_in_bytes: <Gauge, AtomicU64>
  - network_out_bytes: <Gauge, AtomicU64>
- Load
  - load_average:

HTTP Request Metrics
- struct HttpLabel { path: String, method: MetricsMethod, status: u16 }
- http_request_count: Family<HttpLabel, Counter>
- http_request_duration_seconds: Family<HttpLabel, Histogram>
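
A Family keeps one independent time series per distinct label set. The sketch below is a std-only stand-in (a HashMap keyed by HttpLabel), not prometheus-client itself. In the crate, the equivalent call is along the lines of family.get_or_create(&label).inc(), with the label struct deriving EncodeLabelSet. The trimmed MetricsMethod variants, CounterFamily, and the example path are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed copy of MetricsMethod for this sketch only.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum MetricsMethod { Get, Post, Put, Delete, Other }

/// Same shape as the HttpLabel struct listed above.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct HttpLabel {
    pub path: String,
    pub method: MetricsMethod,
    pub status: u16,
}

/// std-only stand-in for Family<HttpLabel, Counter>: one counter per
/// distinct label set, created lazily on first increment.
#[derive(Default)]
pub struct CounterFamily {
    counters: HashMap<HttpLabel, u64>,
}

impl CounterFamily {
    pub fn inc(&mut self, label: HttpLabel) {
        *self.counters.entry(label).or_insert(0) += 1;
    }

    pub fn get(&self, label: &HttpLabel) -> u64 {
        self.counters.get(label).copied().unwrap_or(0)
    }

    /// Number of distinct series (label combinations) seen so far.
    pub fn series(&self) -> usize {
        self.counters.len()
    }
}
```

Because every distinct (path, method, status) combination becomes its own series, paths that embed IDs or secret names can grow cardinality without bound, so normalizing or bounding the path label may be worth considering.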

Changes

Dependencies Added

prometheus-client = "0.22.3"
tokio = "1.40.0"
sysinfo = "0.31.4"

MetricsManager Implementation:

Implemented MetricsManager in manager.rs to hold the Prometheus Registry, the system metrics (system_metrics), and the HTTP API metrics (http_metrics).
Integrated metrics_manager into the server in src/cli/command/server.rs by inserting it into app_data.
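
If the manager is wrapped in actix-web's web::Data (a thin wrapper around an Arc), whatever is inserted through app_data is shared by all worker threads rather than copied per worker. The sketch below reduces MetricsManager to a single atomic counter (hypothetical; the real struct holds the Registry, system_metrics, and http_metrics) and uses plain threads in place of actix-web workers to show concurrent increments landing on one shared value.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical, heavily reduced MetricsManager.
#[derive(Default)]
pub struct MetricsManager {
    pub http_request_count: AtomicU64,
}

/// Simulates `workers` threads sharing one manager through an Arc,
/// each recording `requests_per_worker` requests.
pub fn simulate_workers(workers: usize, requests_per_worker: u64) -> u64 {
    let manager = Arc::new(MetricsManager::default());
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let m = Arc::clone(&manager);
            thread::spawn(move || {
                for _ in 0..requests_per_worker {
                    m.http_request_count.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    // join() orders every worker's increments before the final load.
    for h in handles {
        h.join().expect("worker panicked");
    }
    manager.http_request_count.load(Ordering::Relaxed)
}
```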

Implemented metrics_handler:

Implemented init_metrics_service in metrics.rs, which sets up the /metrics service by configuring a route in the ServiceConfig.
This associates the /metrics route with metrics_handler, which handles GET requests and responds with Prometheus metrics in text format.
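
For reference, the text served at /metrics follows the layout produced by the hand-written helpers below: # HELP and # TYPE comment lines, then one name{labels} value line per series. In the PR the encoding is done by prometheus-client for the whole Registry, so render_gauge and render_counter_sample are hypothetical and only illustrate the line layout. They also skip label-value escaping (backslash, double quote, newline), which a real encoder must handle.

```rust
use std::fmt::Write;

/// Renders a single unlabeled gauge in text exposition layout.
pub fn render_gauge(name: &str, help: &str, value: f64) -> String {
    let mut out = String::new();
    writeln!(out, "# HELP {name} {help}").unwrap();
    writeln!(out, "# TYPE {name} gauge").unwrap();
    writeln!(out, "{name} {value}").unwrap();
    out
}

/// Renders one labeled sample line, e.g. for a single HttpLabel set.
/// NOTE: label values are not escaped in this sketch.
pub fn render_counter_sample(name: &str, labels: &[(&str, &str)], value: u64) -> String {
    let rendered: Vec<String> = labels
        .iter()
        .map(|(k, v)| format!("{k}=\"{v}\""))
        .collect();
    format!("{name}{{{}}} {value}\n", rendered.join(","))
}
```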

System Metrics Collection:
- Implemented SystemMetrics struct in system_metrics.rs to gather CPU, memory, load, and disk metrics using the sysinfo crate.
- Added collect_metrics function to collect and store system information.
- Launched the start_collecting method in server.block_on to periodically collect system metrics.
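
The collection loop can be pictured as: call the collector, wait one interval, repeat. The sketch below is synchronous and hypothetical (collect_periodically is not a function from the PR). The PR runs start_collecting inside the server's async runtime, where tokio::time::interval is the usual building block, and later commits make the interval configurable through collection_interval. The ticks parameter exists only so the sketch terminates.

```rust
use std::thread;
use std::time::Duration;

/// Calls `collect` `ticks` times, sleeping `interval` after each call.
/// Blocks the calling thread; a real collector would loop until shutdown.
pub fn collect_periodically<F: FnMut()>(interval: Duration, ticks: u32, mut collect: F) {
    for _ in 0..ticks {
        collect(); // e.g. refresh CPU, memory, disk, and load gauges
        thread::sleep(interval);
    }
}
```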

HTTP Middleware:
- Implemented metrics_middleware in middleware.rs as a function middleware to capture HTTP request metrics.
- Configured the HTTP server in src/cli/command/server.rs to apply the middleware using .wrap(from_fn(metrics_middleware)).
- Mapped actix-web's HTTP methods onto a custom MetricsMethod enum that tracks GET, POST, PUT, and DELETE and categorizes all other methods as OTHER.
- Measured request duration by recording a timestamp when each request starts and computing the elapsed time when it completes.
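
The method mapping and timing described above can be sketched with std types only. The real middleware receives actix-web's http::Method rather than a string, and later commits in this PR add a LIST variant; to_metrics_method and timed are hypothetical helpers. Instant is monotonic, so the measured duration is unaffected by wall-clock adjustments.

```rust
use std::time::{Duration, Instant};

/// Hypothetical mirror of MetricsMethod: four tracked methods plus OTHER.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetricsMethod { Get, Post, Put, Delete, Other }

pub fn to_metrics_method(method: &str) -> MetricsMethod {
    match method {
        "GET" => MetricsMethod::Get,
        "POST" => MetricsMethod::Post,
        "PUT" => MetricsMethod::Put,
        "DELETE" => MetricsMethod::Delete,
        _ => MetricsMethod::Other, // everything else is folded into OTHER
    }
}

/// Runs `handler`, returning its result and how long it took:
/// a start timestamp before the call, elapsed time after it returns.
pub fn timed<T>(handler: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = handler();
    (result, start.elapsed())
}
```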

HTTP Metrics:
- Created the HttpMetrics struct in http_metrics.rs to handle HTTP request counting and duration observation.
- Registered two Prometheus metrics: a request counter and a histogram of request durations.
- Added the methods increment_request_count and observe_duration to track requests and their durations, labeled by HTTP method, path, and status.
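
A Prometheus histogram records each observation into cumulative buckets (every bucket whose upper bound le is at least the value) plus a running sum and count, which is what makes latency quantiles computable later. The std-only sketch below shows that bookkeeping. DurationHistogram and the bucket bounds are assumptions: prometheus-client's Histogram is constructed from an iterator of bucket bounds, and the bounds used in the PR are not listed in this description.

```rust
/// Minimal cumulative histogram in the Prometheus style (std-only stand-in).
pub struct DurationHistogram {
    bounds: Vec<f64>, // bucket upper bounds ("le"), ascending
    counts: Vec<u64>, // cumulative count per bound
    sum: f64,
    count: u64,
}

impl DurationHistogram {
    pub fn new(bounds: Vec<f64>) -> Self {
        let counts = vec![0; bounds.len()];
        Self { bounds, counts, sum: 0.0, count: 0 }
    }

    /// Records one request duration in seconds.
    pub fn observe(&mut self, seconds: f64) {
        for (bound, c) in self.bounds.iter().zip(self.counts.iter_mut()) {
            if seconds <= *bound {
                *c += 1;
            }
        }
        self.sum += seconds;
        self.count += 1; // doubles as the implicit le="+Inf" bucket
    }

    pub fn bucket(&self, i: usize) -> u64 { self.counts[i] }
    pub fn count(&self) -> u64 { self.count }
    pub fn sum(&self) -> f64 { self.sum }
}
```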

Testing Steps

Start RustyVault Service:
- Ensure that Prometheus integration is enabled in the configuration.

Access Metrics Endpoint:
- Open a browser or use curl to visit http://localhost:<PORT>/metrics.
- Verify that the Prometheus metrics are displayed correctly.

Trigger Various Requests:
- Successful Requests:
  - Send valid requests to endpoints like /login and /register.
  - Confirm that requests_total and request_duration_seconds increment appropriately.
- Failed Requests:
  - Send invalid or malformed requests to induce errors.
  - Check that errors_total increments accordingly.
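
One way to check the increments is to scrape /metrics before and after a request and compare a single series. The helper below (sample_value, hypothetical) extracts one sample's value from the text output by exact series match. It assumes samples carry no trailing timestamp, and the exact series names to look for depend on how the metrics are registered.

```rust
/// Returns the value of `series` (metric name plus rendered labels, if any)
/// from Prometheus text output. Comment lines (# HELP, # TYPE, # EOF) are skipped.
pub fn sample_value(metrics_text: &str, series: &str) -> Option<f64> {
    metrics_text
        .lines()
        .filter(|line| !line.starts_with('#'))
        .find_map(|line| {
            // The value is the last space-separated field on the line.
            let (key, value) = line.rsplit_once(' ')?;
            if key == series { value.parse().ok() } else { None }
        })
}
```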

Integrate with Prometheus Server:
- Add RustyVault's /metrics endpoint to the Prometheus scrape configuration.

Visualize with Grafana:
- Use a Grafana dashboard to visualize the collected metrics.

CLAassistant · 2024-09-17T00:38:42Z

All committers have signed the CLA.

src/cli/command/server.rs

src/http/metrics.rs

src/metrics/http_metrics.rs

src/metrics/system_metrics.rs

src/metrics/middleware.rs

src/cli/config.rs

wa5i · 2024-09-20T07:33:13Z

There are conflicts in three files in the pull request; they need to be resolved.
Test cases are missing.

cybershang · 2024-09-23T08:32:56Z

> There are conflicts in three files in the pull request; they need to be resolved.
>
> Test cases are missing.

Conflicts resolved.


wa5i · 2024-10-09T02:53:59Z

The Windows test case has failed.

cybershang · 2024-10-09T06:24:54Z

> The Windows test case has failed.

The load average metric captured by sysinfo is not available on the Windows platform.

In the system_metric.rs test case, the load average assertion is now skipped on Windows:

        // load average is not available on Windows
        if cfg!(target_os = "windows") {
            gauge_map.remove("load_average");
        }


Instrument RustyVault with Prometheus

8b0afd9

wa5i requested changes Sep 18, 2024

cybershang added 4 commits September 18, 2024 12:13

Format server.rs and http_metrics.rs

0650b99

Replace eprintln! with log::error!

ed1043c

Add support for setting data collection interval from config file

c44e119

Add LIST operation to http metrics.

0e64a28

wa5i requested changes Sep 20, 2024

src/metrics/middleware.rs (resolved)

src/cli/config.rs (resolved)

cybershang added 3 commits September 23, 2024 15:06

Add default value to collection_interval.

fd29cf3

Add LIST operation in metrics middleware.

7fcceb8

Merge branch 'main' into ospp-yingjie

4562a3b

cybershang added 8 commits September 25, 2024 09:26

Merge branch 'main' into ospp-yingjie

82d4b55

Using constant to hold string content.

33a4b4e

Add pub modifier to consts

ce27ebe

Network data stay zero all the time. Comment out temporarily, fix it later.

5e60fec

Add test cases to verify Prometheus instrumentation.

b748368

Move testcases into separate file.

8589fdf

Add docs for files.

bccb31d

Add 'text' modifier in code part of doc.

8190c22

Since load_avg captured by sysinfo is not available Windows, skip it on Windows platform.

b1ae7b6

wa5i merged commit e22637a into Tongsuo-Project:main Oct 9, 2024
9 checks passed
