Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure counter reset hints are corrected when merging batches #9909

Open
wants to merge 34 commits into
base: main
Choose a base branch
from

Conversation

fionaliao
Copy link
Contributor

@fionaliao fionaliao commented Nov 14, 2024

What this PR does

When merging samples from different iterators, histogram counter reset hints need to be recalculated. The mergeIterator currently does not do so. This PR fixes that by remembering the iterator for a sample in a batch, so when the next sample comes along, the iterators for the two samples can be compared and an unknown counter reset hint is set for the next sample if the iterators don't match. See prometheus/prometheus#15346 for related discussion.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

pkg/querier/batch/stream.go Outdated Show resolved Hide resolved
pkg/querier/batch/stream.go Outdated Show resolved Hide resolved
@fionaliao fionaliao marked this pull request as ready for review November 26, 2024 21:49
@fionaliao
Copy link
Contributor Author

Implementation and testing is done (pending CI), but I want to extract the mimir-prometheus update into a separate PR before removing the "WIP"

@fionaliao fionaliao changed the title WIP: fix counter resets when merging batches Ensure counter reset hints are corrected when merging batches Nov 27, 2024
@fionaliao
Copy link
Contributor Author

mimir-prometheus update was done as part of #10033

Comment on lines +193 to +230
name: "overlapping and interleaved samples",
chunks: func() []GenericChunk {
chk1, err := chunk.NewForEncoding(enc)
require.NoError(t, err)
addFunc(chk1, 0, 0)
addFunc(chk1, 3, 1)
addFunc(chk1, 4, 2)
addFunc(chk1, 5, 3)
addFunc(chk1, 7, 4)
addFunc(chk1, 8, 5)
chk2, err := chunk.NewForEncoding(enc)
require.NoError(t, err)
addFunc(chk2, 1, 1)
addFunc(chk2, 2, 7)
addFunc(chk2, 5, 8)
addFunc(chk2, 6, 9)
addFunc(chk2, 7, 10)
addFunc(chk2, 9, 11)
return []GenericChunk{
NewGenericChunk(0, 8, chunk.NewChunk(labels.FromStrings(model.MetricNameLabel, "foo"), chk1, 0, 8).Data.NewIterator),
NewGenericChunk(1, 9, chunk.NewChunk(labels.FromStrings(model.MetricNameLabel, "foo"), chk2, 1, 9).Data.NewIterator),
}
}(),
expectedSamples: []checkHintTestSample{
{t: 0, v: 0, hint: histogram.UnknownCounterReset}, // c1
{t: 1, v: 1, hint: histogram.UnknownCounterReset}, // c2
{t: 2, v: 7, hint: histogram.NotCounterReset}, // c2
{t: 3, v: 1, hint: histogram.UnknownCounterReset}, // c1
{t: 4, v: 2, hint: histogram.NotCounterReset}, // c1
{t: 5, v: 8, hint: histogram.UnknownCounterReset}, // c2
// Next sample is from c2. This is consecutive, but this ends up as the first sample in a merged
// batch, and the c1 sample at ts 7 and 8 is added to the batch stream before this c2 sample, so
// the prevIteratorID is set to c1 rather than c2 when we append this sample.
{t: 6, v: 9, hint: histogram.UnknownCounterReset},
{t: 7, v: 4, hint: histogram.UnknownCounterReset}, // c1
{t: 8, v: 5, hint: histogram.NotCounterReset}, // c1
{t: 9, v: 11, hint: histogram.UnknownCounterReset}, // c2
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting test case - you can have as two consecutive samples from the same iterator but the second one isn't always detected as NotCounterReset, as you can see with ts 6. This is a case where keeping track of the last timestamp per iterator rather than just the last iterator would have detected the NotCounterReset properly. However, I think the current algorithm is good enough for now, I don't want to complicate it further without checking how it works in practice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we aim to fail safe.

Copy link
Contributor

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good a few nits added first round

pkg/querier/batch/stream.go Outdated Show resolved Hide resolved
Comment on lines +193 to +230
name: "overlapping and interleaved samples",
chunks: func() []GenericChunk {
chk1, err := chunk.NewForEncoding(enc)
require.NoError(t, err)
addFunc(chk1, 0, 0)
addFunc(chk1, 3, 1)
addFunc(chk1, 4, 2)
addFunc(chk1, 5, 3)
addFunc(chk1, 7, 4)
addFunc(chk1, 8, 5)
chk2, err := chunk.NewForEncoding(enc)
require.NoError(t, err)
addFunc(chk2, 1, 1)
addFunc(chk2, 2, 7)
addFunc(chk2, 5, 8)
addFunc(chk2, 6, 9)
addFunc(chk2, 7, 10)
addFunc(chk2, 9, 11)
return []GenericChunk{
NewGenericChunk(0, 8, chunk.NewChunk(labels.FromStrings(model.MetricNameLabel, "foo"), chk1, 0, 8).Data.NewIterator),
NewGenericChunk(1, 9, chunk.NewChunk(labels.FromStrings(model.MetricNameLabel, "foo"), chk2, 1, 9).Data.NewIterator),
}
}(),
expectedSamples: []checkHintTestSample{
{t: 0, v: 0, hint: histogram.UnknownCounterReset}, // c1
{t: 1, v: 1, hint: histogram.UnknownCounterReset}, // c2
{t: 2, v: 7, hint: histogram.NotCounterReset}, // c2
{t: 3, v: 1, hint: histogram.UnknownCounterReset}, // c1
{t: 4, v: 2, hint: histogram.NotCounterReset}, // c1
{t: 5, v: 8, hint: histogram.UnknownCounterReset}, // c2
// Next sample is from c2. This is consecutive, but this ends up as the first sample in a merged
// batch, and the c1 sample at ts 7 and 8 is added to the batch stream before this c2 sample, so
// the prevIteratorID is set to c1 rather than c2 when we append this sample.
{t: 6, v: 9, hint: histogram.UnknownCounterReset},
{t: 7, v: 4, hint: histogram.UnknownCounterReset}, // c1
{t: 8, v: 5, hint: histogram.NotCounterReset}, // c1
{t: 9, v: 11, hint: histogram.UnknownCounterReset}, // c2
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we aim to fail safe.

@@ -85,6 +85,7 @@
* [BUGFIX] Fix pooling buffer reuse logic when `-distributor.max-request-pool-buffer-size` is set. #9666
* [BUGFIX] Fix issue when using the experimental `-ruler.max-independent-rule-evaluation-concurrency` feature, where the ruler could panic as it updates a running ruleset or shutdowns. #9726
* [BUGFIX] Always return unknown hint for first sample in non-gauge native histograms chunk to avoid incorrect counter reset hints when merging chunks from different sources. #10033
* [BUGFIX] Ensure counter reset hints are corrected when merging batches. #9909
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [BUGFIX] Ensure counter reset hints are corrected when merging batches. #9909
* [BUGFIX] Ensure native histograms counter reset hints are corrected when merging results from different sources. #9909

@@ -119,6 +119,8 @@ func (p *prometheusHistogramChunk) AddFloatHistogram(_ int64, _ *histogram.Float
// AddHistogram adds another histogram to the chunk. While AddHistogram works, it is only implemented to make tests
// work, and should not be used in production. In particular, it appends all histograms to single chunk, and uses new
// Appender for each invocation.
// AddHistogram adds another histogram to the chunk. While AddHistogram works, it is only implemented to make tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we meant to replace the doc string not add to it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants