
Add query percentile support to stats #1112

Merged
4 commits merged on Mar 1, 2024

athoscouto

This uses hdrhistogram to record p50, p75, p90, p95, p99, and p99.9 of query latency.

If it works well, we may decide to add it per query too.


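hdrhistogram's memory use is fixed by its value range and its sigfig (number of significant figures) parameter, not by the number of recorded values, so usage does not go up as we add more records. It only grows when a larger value extends the range the histogram has to track, as the benchmarks below show.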

I decided not to use the bounded version of the histogram: even though our latencies should stay below a few hundred thousand milliseconds, I didn't want to risk an error or a panic if a value ever fell outside the bounds.
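To make the "memory depends on range and precision, not record count" point concrete, here is a minimal Rust sketch of an HDR-style histogram. It is a hypothetical, simplified stand-in, not the hdrhistogram crate's API: `SketchHistogram`, `record`, `value_at_percentile`, and `slots` are illustrative names, and a `bits` precision knob plays the role of sigfig. Values are grouped into power-of-two ranges, each split into linear slots, and like the unbounded histogram chosen here, it grows its counter array instead of failing when a larger value arrives.

```rust
/// Hypothetical, simplified HDR-style histogram (illustration only; not the
/// hdrhistogram crate's API). Values below 2^bits are counted exactly; larger
/// values share power-of-two ranges split into 2^(bits-1) linear slots, so the
/// relative error stays at or below 1 / 2^(bits-1).
pub struct SketchHistogram {
    bits: u32,        // precision knob, playing the role of sigfig
    counts: Vec<u64>, // one counter per slot; grows only for larger values
    total: u64,       // number of recorded values
}

impl SketchHistogram {
    pub fn new(bits: u32) -> Self {
        assert!((1..=16).contains(&bits), "bits must be in 1..=16");
        SketchHistogram { bits, counts: vec![0; 1 << bits], total: 0 }
    }

    /// Maps a value to the index of the slot that counts it.
    fn index_of(&self, v: u64) -> usize {
        let exact = 1u64 << self.bits;
        if v < exact {
            return v as usize;
        }
        let half = exact >> 1;
        let msb = 63 - v.leading_zeros(); // position of the highest set bit
        let shift = msb + 1 - self.bits;
        let sub = v >> shift; // always in [half, exact)
        (shift as u64 * half + sub) as usize
    }

    /// Largest value that lands in slot `i`; percentiles report this value.
    fn highest_equivalent(&self, i: usize) -> u64 {
        let exact = 1usize << self.bits;
        if i < exact {
            return i as u64;
        }
        let half = exact >> 1;
        let shift = i / half - 1;
        let sub = (i - shift * half) as u128;
        // u128 avoids overflow for slots at the very top of the u64 range.
        (((sub + 1) << shift) - 1).min(u64::MAX as u128) as u64
    }

    /// Counts one value, growing the slot array (instead of erroring) when
    /// the value is beyond the current range.
    pub fn record(&mut self, v: u64) {
        let i = self.index_of(v);
        if i >= self.counts.len() {
            self.counts.resize(i + 1, 0);
        }
        self.counts[i] += 1;
        self.total += 1;
    }

    /// Upper value of the first slot at which the cumulative count reaches
    /// `p` percent of all recorded values.
    pub fn value_at_percentile(&self, p: f64) -> u64 {
        if self.total == 0 {
            return 0;
        }
        let target = ((p / 100.0) * self.total as f64).ceil().max(1.0) as u64;
        let mut seen = 0u64;
        for (i, &c) in self.counts.iter().enumerate() {
            seen += c;
            if seen >= target {
                return self.highest_equivalent(i);
            }
        }
        self.highest_equivalent(self.counts.len() - 1)
    }

    /// Number of counters currently allocated (a proxy for memory use).
    pub fn slots(&self) -> usize {
        self.counts.len()
    }
}
```

Recording thousands more small values leaves `slots()` unchanged; only a value outside the current range grows the array, which mirrors how the footprint grows with the recorded range rather than the record count.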

Some benchmarks of memory usage (approximate footprint in bytes):

| sigfig | recorded values | footprint (bytes) |
|---|---|---|
| 2 | none | ~1024 |
| 2 | 0 to 1000 | ~2048 |
| 2 | 0 to 5000 | ~3584 |
| 2 | 0 to 100000 | ~4096 |
| 2 | 0 to 200000 | ~4608 |
| 3 | none | ~8192 |
| 3 | 0 to 1000 | ~8192 |
| 3 | 0 to 5000 | ~16384 |
| 3 | 0 to 100000 | ~20480 |
| 3 | 0 to 200000 | ~24576 |

We should see roughly 8 to 16 KB of overhead per namespace in most cases.
I think that is acceptable, but let me know if it isn't.
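The gap between the sigfig 2 and sigfig 3 footprints can be reasoned about from how HdrHistogram sizes each power-of-two range. A small Rust sketch, assuming the standard HdrHistogram sizing rule (sub-buckets per range = the smallest power of two that is at least 2 × 10^sigfig); `sub_bucket_count` and `cost_ratio` are hypothetical helpers, not part of any crate's API:

```rust
/// Hypothetical helper: sub-buckets per power-of-two range for a given
/// number of significant figures, assuming the standard HdrHistogram rule
/// (smallest power of two >= 2 * 10^sigfig).
pub fn sub_bucket_count(sigfig: u32) -> u64 {
    assert!(sigfig <= 5, "HdrHistogram supports 0 to 5 significant figures");
    (2 * 10u64.pow(sigfig)).next_power_of_two()
}

/// How many times more counters the higher precision needs for the same
/// value range.
pub fn cost_ratio(from_sigfig: u32, to_sigfig: u32) -> u64 {
    sub_bucket_count(to_sigfig) / sub_bucket_count(from_sigfig)
}
```

Under that assumption, sigfig 2 gives 256 sub-buckets per range and sigfig 3 gives 2048, an 8× ratio, which is consistent with the ~1024 and ~8192 byte no-records footprints measured above.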

* Add expires_at to control when QueriesStats should be reset

* Add query count and elapsed sum to query stats response

Also group all latency aggregations together under the elapsed key

* Remove option sprawl on QueriesStats fields

By making the whole struct optional where it is used

* Add created_at to stats queries response object

* Set queries stats to none when stats is created

This will make the API response the same when no queries
have been recorded on the queries stats struct. This can
happen right after:
- the stats struct initialization
- the stats queries object expiration

Example of stats API response when no queries have been
recorded:

{
  ...
  "queries": null
}

* Use if/else instead of an early return when converting stats to the response
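The lifecycle described in these commits (no queries stats until the first recorded query, an `expires_at` deadline, and a `null` `queries` field after expiration) can be sketched in Rust as below. This is an illustration under assumptions: the struct layouts, the JSON keys `count`, `elapsed`, and `sum`, and the helpers `next_hour_boundary`, `record_query`, `expire`, and `queries_field_json` are hypothetical; the top-of-the-hour deadline follows the hourly reset from #1118, and hand-written formatting stands in for the real serializer.

```rust
/// Hypothetical sketch of optional, hourly-expiring query stats.
const HOUR_SECS: u64 = 3600;

/// Start of the next hour after `now_secs` (Unix seconds).
pub fn next_hour_boundary(now_secs: u64) -> u64 {
    now_secs - now_secs % HOUR_SECS + HOUR_SECS
}

/// All fields are plain values: the whole struct is optional instead.
#[derive(Debug)]
pub struct QueriesStats {
    pub created_at: u64,
    pub expires_at: u64,
    pub count: u64,
    pub elapsed_sum_ms: u64,
}

#[derive(Debug, Default)]
pub struct Stats {
    /// `None` right after initialization and again after expiration.
    pub queries: Option<QueriesStats>,
}

impl Stats {
    /// Drops expired stats so the next reader sees `None`.
    pub fn expire(&mut self, now_secs: u64) {
        if matches!(&self.queries, Some(q) if now_secs >= q.expires_at) {
            self.queries = None;
        }
    }

    /// Records one query, creating fresh stats if there are none.
    pub fn record_query(&mut self, now_secs: u64, elapsed_ms: u64) {
        self.expire(now_secs);
        let q = self.queries.get_or_insert_with(|| QueriesStats {
            created_at: now_secs,
            expires_at: next_hour_boundary(now_secs),
            count: 0,
            elapsed_sum_ms: 0,
        });
        q.count += 1;
        q.elapsed_sum_ms += elapsed_ms;
    }
}

/// Renders the `queries` field of the stats response using an if/else
/// expression rather than an early return.
pub fn queries_field_json(queries: Option<&QueriesStats>) -> String {
    let value = if let Some(q) = queries {
        format!(r#"{{"count":{},"elapsed":{{"sum":{}}}}}"#, q.count, q.elapsed_sum_ms)
    } else {
        "null".to_string()
    };
    format!(r#""queries": {}"#, value)
}
```

Wrapping the whole struct in one `Option` means the response code only handles two cases (stats or `null`) instead of checking each field separately.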
athoscouto merged commit 5a65c94 into athos/stats on Mar 1, 2024
10 checks passed
athoscouto deleted the athos/percentiles branch on March 1, 2024 21:09
athoscouto added a commit that referenced this pull request Mar 1, 2024
* Record query latency percentiles with hdrhistogram

* Expose queries percentiles through stats response

* Simplify QueriesStats transformation to QueriesStatsResponse

* Reset query stats at the beginning of every hour (#1118)

github-merge-queue bot pushed a commit that referenced this pull request Mar 1, 2024
* Aggregated stats for queries with most elapsed_time

* Add query percentile support to stats (#1112)

athoscouto added three commits that referenced this pull request on Mar 4, 2024