Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds new statistics for tracking connections created and closed #202

Closed
wants to merge 2 commits into from

Conversation

pfreixes
Copy link
Contributor

@pfreixes pfreixes commented Jun 5, 2024

We add several new statistics fields for tracking the following:

  • connections_created: How many connections have been created
  • connections_broken_closed: How many connections were closed due to a broken state
  • connections_invalid_closed: How many connections were closed due to be considered invalid.
  • connections_max_lifetime_closed: How many connections were closed due to reaching the max lifetime.
  • connections_max_idle_timeout_closed: How many connections were closed due to reaching the idle timeout.

We have added also a new statistic called get_waited_time_micro which provides the accumulated time in micros that all calls to the get method had to wait for a new connection.

We add several fiels for tracking the following:
- connections_created: How many connections have been created
- connections_broken_closed: How many connections were closed due to a
  broken state
- connections_invalid_closed: How many connections were closed due to be
  considered invalid.
- connections_max_lifetime_closed: How many connections were closed due
  to reaching the max lifetime.
- connections_max_idle_timeout_closed: How many connections were closed
  due to reaching the idle timeout.

We have added also a new statistic called `get_waited_time_micro` which
provides the accumulated time in micros that all calls to the `get`
method had to wait for a new connection.
@pfreixes pfreixes changed the title Adds new statistics for tracking connectoins created and closed Adds new statistics for tracking connections created and closed Jun 5, 2024
Copy link

codecov bot commented Jun 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.15%. Comparing base (276235a) to head (58d5a7a).

Current head 58d5a7a differs from pull request most recent head b321557

Please upload reports for the commit b321557 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #202      +/-   ##
==========================================
+ Coverage   76.05%   79.15%   +3.10%     
==========================================
  Files           6        6              
  Lines         543      595      +52     
==========================================
+ Hits          413      471      +58     
+ Misses        130      124       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

bb8/src/inner.rs Outdated
Comment on lines 117 to 120
self.inner
.statistics
.connections_invalid_closed
.fetch_add(1, Ordering::SeqCst);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just make this another StatsKind, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jaja, good point. Ill try to do that!

bb8/src/inner.rs Outdated
self.inner
.statistics
.get_waited_time_micro
.fetch_add(wait_time.as_micros() as u64, Ordering::SeqCst);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why micros vs millis or nanos?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that micros would provide enough detail (nanos might be just noise) while we would keep the also the u64 large enough for storing many many many micros.

In any case since we move this to a Duration we will be storing behind the scenes IIRC micros using u128.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oks now I remembered, everything I wanted to use an Atomic Type. I guess that we can try to use an u128 internally and return a Duration (which IIRC is just a u128). If this works we could just forget about the discussion between nanos and micros.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally I went for using micros with u64 for converting later to Duration. If you want to use nanos, i can do that. But got the feeling that nanosecond resolution for this specific use case would not provide a lot of value (indeed micro most likely will not neither)

Ive just kept the type to u64 just for having all atomics using the same type and helping me to keep the record interface simple.

If you want to to go for nanos and/or u128 let me know.

@@ -139,22 +147,34 @@ where
ApprovalIter { num: num as usize }
}

pub(crate) fn reap(&mut self, config: &Builder<M>) -> ApprovalIter {
pub(crate) fn reap(&mut self, config: &Builder<M>) -> (ApprovalIter, u64, u64) {
let mut max_lifetime_closed: u64 = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid type annotations where possible.

Comment on lines +279 to +283
pub(crate) connections_created: AtomicU64,
pub(crate) connections_broken_closed: AtomicU64,
pub(crate) connections_invalid_closed: AtomicU64,
pub(crate) connections_max_lifetime_closed: AtomicU64,
pub(crate) connections_max_idle_timeout_closed: AtomicU64,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's skip the connections_ prefixes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all of them? at least at first sight the connections_created, connections_broken_closed, connections_broken_closed and connections_invalid_closed without the connections_ prefix will seem a bit off.

Also it follows the same practice for the get_ Kind, where we prefixed it and IMO helps to the user to understand the scope of them, same would happen for connections_

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's keep connections_ but I'd like closed moved directly after connections_ and rename max_idle_timeout to idle_timeout.

}

impl AtomicStatistics {
pub(crate) fn record(&self, kind: StatsKind) {
pub(crate) fn record_get(&self, kind: StatsKind) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we can use it for incrementing other (non-get) counters, prefer to revert this.

bb8/src/api.rs Outdated
@@ -99,6 +99,20 @@ pub struct Statistics {
pub get_waited: u64,
/// Total gets performed that timed out while waiting for a connection.
pub get_timed_out: u64,
/// Total time accumulated waiting for a connection.
pub get_waited_time_micro: u64,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this a std::time::Duration, and lose the _micro suffix.

@pfreixes
Copy link
Contributor Author

pfreixes commented Jun 7, 2024

@djc this one is ready for a second review.

Copy link
Owner

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I want a clean commit history with separate logical changes, so this should end up being in a different commit. I usually handle this by using interactive rebases but feel free to use separate PRs if that's simpler for you. I think we need the following changes:

  • Rename StatsKind to StatsGet
  • Rename record() to record_get()
  • Track wait_time: add wait_time variable in get() and change record_get() to accept the Option<Duration>, should end up in get_wait_time_micros in AtomicStatistics and in get_wait_time in Statistics
  • Track closed_broken and closed_created: recreate AtomicStatistics::record() and StatsKind for these
  • Track closed_max_lifetime and closed_idle_timeout: create a separate method AtomicStatistics::reaped() for these

@@ -85,7 +85,8 @@ where
}

pub(crate) async fn get(&self) -> Result<PooledConnection<'_, M>, RunError<M::Error>> {
let mut kind = StatsKind::Direct;
let mut get_kind = StatsKind::GetDirect;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't rename this get_kind, please.

Comment on lines +135 to +142
self.inner.statistics.record(get_kind, 1);

if let Some(wait_time_start) = wait_time_start {
let wait_time = Instant::now() - wait_time_start;
self.inner
.statistics
.record(StatsKind::GetWaitedTime, wait_time.as_micros() as u64);
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wrap this up in a method on AtomicStatistics called record_get().

Comment on lines +55 to +60
self.statistics
.record(StatsKind::ConnectionsMaxLifeTimeClosed, max_lifetime_closed);
self.statistics.record(
StatsKind::ConnectionsMaxIdleTimeoutClosed,
max_idle_timeout_closed,
);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wrap this in a separate method on AtomicStatistics, too, and keep all of the reaping-related changes in a separate commit or PR.

Comment on lines +153 to +154
let mut max_lifetime_closed = 0;
let mut max_idle_timeout_closed = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call these closed_max_lifetime and closed_idle_timeout.

Comment on lines +279 to +283
pub(crate) connections_created: AtomicU64,
pub(crate) connections_broken_closed: AtomicU64,
pub(crate) connections_invalid_closed: AtomicU64,
pub(crate) connections_max_lifetime_closed: AtomicU64,
pub(crate) connections_max_idle_timeout_closed: AtomicU64,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's keep connections_ but I'd like closed moved directly after connections_ and rename max_idle_timeout to idle_timeout.

Direct,
Waited,
TimedOut,
GetDirect,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll want to split this into StatsGet { Waited, Direct, TimedOut }.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djc just for having an agreement on the new kinds and record operations that we would have within AtomicStatistics, we can go for this

pub(crate) enum StatsGet {
    Waited,
    Direct,
    TimedOut,
}

pub(crate) enum StatsConnectionClosed {
    Broken,
    Invalid,
}

Note: we should not need more stats kind since the used record_* methods for stats like the ones coming from the reap function will just call an ad-hoc function like record_connections_reaped

And then we would have the following record_* methods within the AtomicStatistics:

  • record_get (takes the StatsGet parameter)
  • record_connection_closed (takes the StatsConnectionClosed parameter)
  • record_connections_reaped
  • record_get_wait_time
  • record_connection_created

Let me know if this works for you and i can create a new PR with separated commits for each group, starting with the rename of the record to record_get

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, as suggested in my previous message we should reuse a simple record() with StatsKind for both connection_closed and connection_created, and record_get() should also handle get_wait_time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oks, let me provide a new draft - sorry for the back and forth here :

pub(crate) enum StatsGet {
    Waited,
    Direct,
    TimedOut,
    WaitTime
}

pub(crate) enum StatsConnectionClosed {
    Broken,
    Invalid,
}

And following record methods

  • record which would take a StatsConnectionClosed kind type
  • record_get for recording any stats related to get including wait time
  • record_connections_reaped for recording the reaped connections either idle timeout or max lifetime

How does it sound? is this aligned with your expectations?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Let's do one PR at a time, please implement the first chunk and I'll give feedback. Please closely read my feedback from this PR because you've still missed some of it.

@djc djc closed this Jun 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants