Sync with Prometheus up to 2023-04-14 (7309ac27) #480
Conversation
M-map chunks replayed on startup are discarded if there was no WAL and no snapshot loaded, because there are no series created in the Head that they can map to. So m-map chunks are only loaded from disk if there is either a snapshot loaded or a WAL on disk. Signed-off-by: Ganesh Vernekar <[email protected]>
If the snapshot was enabled with some ooo mmap chunks on disk, and wbl was removed between restarts, then we should still be able to query the ooo mmap chunks after a restart. This test shows that we are not able to query those ooo mmap chunks after a restart under this situation. Signed-off-by: Ganesh Vernekar <[email protected]>
Without this fix, if snapshots were enabled, and wbl goes missing between restarts, then TSDB does not recognize that there are ooo mmap chunks on disk and we cannot query them until those chunks are compacted into blocks. Signed-off-by: Ganesh Vernekar <[email protected]>
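The gating condition these three commits describe amounts to something like the following (a hypothetical, condensed sketch, not the actual Head replay code; `snapshotLoaded` and `walExists` are illustrative names):

```go
// M-map chunks can only be attached to series that already exist in the Head.
// Series are recreated either from a snapshot or from WAL replay, so without
// either there is nothing the replayed chunks could map to.
func shouldLoadMmapChunks(snapshotLoaded, walExists bool) bool {
	return snapshotLoaded || walExists
}
```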
It took a `Labels` argument whose memory could be re-used, but in practice this hardly ever helped, especially after converting `relabel.Process` to `relabel.ProcessBuilder`. Comparing the parameter to `nil` was a bug: `EmptyLabels` is not `nil`, so the slice was reallocated multiple times by `append`. Lastly, `Builder.Labels()` now estimates the final size based on the labels added and deleted. Signed-off-by: Bryan Boreham <[email protected]>
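The nil-comparison pitfall is easy to reproduce in isolation (a minimal sketch, not the relabel code itself):

```go
package main

import "fmt"

type Label struct{ Name, Value string }
type Labels []Label

// EmptyLabels mirrors the real helper: it returns a zero-length slice built
// from a composite literal, which is non-nil.
func EmptyLabels() Labels { return Labels{} }

func main() {
	lset := EmptyLabels()
	// The buggy re-use check compared against nil, so the pre-allocated
	// path never fired for EmptyLabels and append kept reallocating.
	fmt.Println(lset == nil) // false
}
```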
Larger messages cost less, because the per-message overheads at sender and receiver are spread over more samples. Example: scraping 1.5 million series every 15 seconds will send 50 messages per second. Previous default was 500. Signed-off-by: Bryan Boreham <[email protected]>
Keep the target throughput and total pending samples the same. Signed-off-by: Bryan Boreham <[email protected]>
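The arithmetic behind the example works out as follows (a sketch of the stated numbers only; the constant names are illustrative, not Prometheus configuration fields):

```go
package main

import "fmt"

const (
	series         = 1_500_000 // scraped series
	scrapeInterval = 15        // seconds
	samplesPerSend = 2000      // new default, up from 500
)

func main() {
	samplesPerSecond := float64(series) / scrapeInterval // 100,000 samples/s
	fmt.Println(samplesPerSecond / samplesPerSend)       // 50 messages/s
}
```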
Deleted labels are remembered, even if they were not in `base` or were removed from `add`, so `base+add-del` could go negative. Signed-off-by: Bryan Boreham <[email protected]>
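A minimal sketch of the clamp this implies (illustrative, not the actual `Builder` code):

```go
// estimateSize guards the pre-allocation hint: deleted labels may never have
// existed in base or add, so the naive difference can go negative.
func estimateSize(base, add, del int) int {
	if n := base + add - del; n > 0 {
		return n
	}
	return 0
}
```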
Bumps [github.com/Azure/go-autorest/autorest/adal](https://github.com/Azure/go-autorest) from 0.9.22 to 0.9.23. - [Release notes](https://github.com/Azure/go-autorest/releases) - [Changelog](https://github.com/Azure/go-autorest/blob/main/CHANGELOG.md) - [Commits](Azure/go-autorest@autorest/adal/v0.9.22...autorest/adal/v0.9.23) --- updated-dependencies: - dependency-name: github.com/Azure/go-autorest/autorest/adal dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]>
…eus (#11711) Signed-off-by: Alex Le <[email protected]> Signed-off-by: Alex Le <[email protected]>
labels: simplify call to get Labels from Builder
Update OOO min/max time properly after replaying m-map chunks
remote-write: raise default samples per send to 2,000
Signed-off-by: Soon-Ping Phang <[email protected]>
This makes it more consistent with other commands like `import rules`. We don't have strict rules and uniformity across promtool, unfortunately, but I think it's better to only have the HTTP config on relevant check commands, to avoid giving the impression that Prometheus can e.g. check the config over the wire. Signed-off-by: Julien Pivotto <[email protected]>
…b.com/Azure/go-autorest/autorest/adal-0.9.23 build(deps): bump github.com/Azure/go-autorest/autorest/adal from 0.9.22 to 0.9.23
Signed-off-by: Justin Lei <[email protected]>
Co-authored-by: Björn Rabenstein <[email protected]> Signed-off-by: Justin Lei <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Add support for native histograms to concreteSeriesIterator
tsdb: Improve a couple of histogram documentation comments
Signed-off-by: Alex Le <[email protected]>
This is a method used by some downstream projects; it was created to optimize the implementation in `labels_string.go` but we should have one for both implementations so the same code works with either. Signed-off-by: Bryan Boreham <[email protected]>
labels: add ScratchBuilder.Overwrite for slice implementation
In other words: instead of having a “polymorphous” `Point` that can either contain a float value or a histogram value, use an `FPoint` for floats and an `HPoint` for histograms.

This seemingly small change has a _lot_ of repercussions throughout the codebase. The idea here is to avoid the increase in size of `Point` arrays that happened after native histograms had been added. The higher-level data structures (`Sample`, `Series`, etc.) are still “polymorphous”. The same idea could be applied to them, but at each step the trade-offs need to be evaluated. The idea with this change is to do the minimum necessary to get back to pre-histogram performance for functions that do not touch histograms.

Here are comparisons for the `changes` function. The test data doesn't include histograms yet. Ideally, there would be no change in the benchmark results at all.

First, the runtime of v2.39 compared to directly prior to this commit:

```
name                                                  old time/op  new time/op  delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16          391µs ± 2%   542µs ± 1%  +38.58%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16         452µs ± 2%   617µs ± 2%  +36.48%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16       1.12ms ± 1%  1.36ms ± 2%  +21.58%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16      7.83ms ± 1%  8.94ms ± 1%  +14.21%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16         2.98ms ± 0%  3.30ms ± 1%  +10.67%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16        3.66ms ± 1%  4.10ms ± 1%  +11.82%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16       10.5ms ± 0%  11.8ms ± 1%  +12.50%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16      77.6ms ± 1%  87.4ms ± 1%  +12.63%  (p=0.000 n=9+9)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16     30.4ms ± 2%  32.8ms ± 1%   +8.01%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16    37.1ms ± 2%  40.6ms ± 2%   +9.64%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16    105ms ± 1%   117ms ± 1%  +11.69%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16   783ms ± 3%   876ms ± 1%  +11.83%  (p=0.000 n=9+10)
```

And then the runtime of v2.39 compared to after this commit:

```
name                                                  old time/op  new time/op  delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16          391µs ± 2%   547µs ± 1%  +39.84%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16         452µs ± 2%   616µs ± 2%  +36.15%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16       1.12ms ± 1%  1.26ms ± 1%  +12.20%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16      7.83ms ± 1%  7.95ms ± 1%   +1.59%  (p=0.000 n=10+8)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16         2.98ms ± 0%  3.38ms ± 2%  +13.49%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16        3.66ms ± 1%  4.02ms ± 1%   +9.80%  (p=0.000 n=10+9)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16       10.5ms ± 0%  10.8ms ± 1%   +3.08%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16      77.6ms ± 1%  78.1ms ± 1%   +0.58%  (p=0.035 n=9+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16     30.4ms ± 2%  33.5ms ± 4%  +10.18%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16    37.1ms ± 2%  40.0ms ± 1%   +7.98%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16    105ms ± 1%   107ms ± 1%   +1.92%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16   783ms ± 3%   775ms ± 1%   -1.02%  (p=0.019 n=9+9)
```

In summary, the runtime doesn't really improve with this change for queries with just a few steps. For queries with many steps, this commit essentially reinstates the old performance. This is good because the many-step queries are the ones that matter most (longest absolute runtime). In terms of allocations, though, this commit doesn't make a dent at all (numbers not shown). The reason is that most of the allocations happen in the sampleRingIterator (in the storage package), which has to be addressed in a separate commit. Signed-off-by: beorn7 <[email protected]>
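For reference, the resulting point types look roughly like this in the `promql` package (simplified from the change described above):

```go
package promql

import "github.com/prometheus/prometheus/model/histogram"

// FPoint represents a single float data point at a given timestamp.
type FPoint struct {
	T int64
	F float64
}

// HPoint represents a single histogram data point at a given timestamp.
// Only a pointer-sized field is carried for the histogram, so arrays of
// float-only points stay small.
type HPoint struct {
	T int64
	H *histogram.FloatHistogram
}
```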
Previously, we had one “polymorphous” `sample` type in the `storage` package. This commit breaks it up into `fSample`, `hSample`, and `fhSample`, each still implementing the `tsdbutil.Sample` interface. This reduces allocations in `sampleRing.Add` but inflicts the penalty of the interface wrapper, which makes things worse in total. This commit therefore just demonstrates the step taken. The next commit will tackle the interface overhead problem. Signed-off-by: beorn7 <[email protected]>
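A condensed sketch of that split (simplified from the `storage` package; the real types also implement the full `tsdbutil.Sample` method set):

```go
import "github.com/prometheus/prometheus/model/histogram"

// fSample carries a plain float value.
type fSample struct {
	t int64
	f float64
}

// hSample carries an integer native histogram.
type hSample struct {
	t int64
	h *histogram.Histogram
}

// fhSample carries a float native histogram.
type fhSample struct {
	t  int64
	fh *histogram.FloatHistogram
}
```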
This utilizes the fact that most sampleRings will only contain samples of one type. In this case, the generic interface is circumvented, and a bespoke buffer for the one actually occurring sample type is used. Should a sampleRing receive a sample of a different kind later, it will transparently switch to the generic behavior. Signed-off-by: beorn7 <[email protected]>
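The switching behavior could be sketched like this (hypothetical and heavily condensed from the real `sampleRing`, reusing the sample types from the sketch above):

```go
type sampleRing struct {
	fBuf []fSample         // fast path while only floats are seen
	iBuf []tsdbutil.Sample // generic fallback, nil until a mixed sample arrives
}

func (r *sampleRing) add(s tsdbutil.Sample) {
	if fs, ok := s.(fSample); ok && r.iBuf == nil {
		// Homogeneous case: append to the typed buffer, no interface boxing.
		r.fBuf = append(r.fBuf, fs)
		return
	}
	if r.iBuf == nil {
		// First sample of another kind: migrate once to the generic buffer.
		for _, fs := range r.fBuf {
			r.iBuf = append(r.iBuf, fs)
		}
		r.fBuf = nil
	}
	r.iBuf = append(r.iBuf, s)
}
```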
In the past, every sample value was a float, so it was fine to call a variable holding such a float "value" or "sample". With native histograms, a sample might have a histogram value, and a histogram value is still a value. Calling a float value just "value", "sample", or "V" is therefore misleading. Over the last few commits, I already renamed many variables, but this cleans up a few more places where the changes are more invasive. Note that we do not attempt renaming in the JSON APIs or in the protobufs; that would be quite a disruption. Internally, however, we can name variables as we want, and we should go with the option that avoids misunderstandings. Signed-off-by: beorn7 <[email protected]>
Signed-off-by: beorn7 <[email protected]>
This commit is doing what I would have expected that Go generics do for me. However, the PromQL benchmarks show a significant runtime and allocation increase with `genericAdd`, so this replaces it with hand-coded non-generic versions of it. Signed-off-by: beorn7 <[email protected]>
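For illustration, the shape of the change is roughly this (hypothetical and heavily reduced; the real code also handles ring-buffer wrap-around):

```go
// What the generic version looked like in spirit: correct, but the PromQL
// benchmarks showed extra allocations and runtime with it.
func genericAdd[T any](buf []T, s T) []T {
	return append(buf, s)
}

// Hand-coded variant for the float case; hSample and fhSample get their own
// nearly identical copies.
func addF(buf []fSample, s fSample) []fSample {
	return append(buf, s)
}
```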
histograms: Optimize query performance
Rename PopulateBlockFunc to BlockPopulator
LGTM, I've taken a closer look at the rules directory and tsdb/compact.go as per the author's request.
Also checked the diff of diffs between prometheus and mimir-prometheus.
For sure needs another set of eyes.
```go
// populateBlock fills the index and chunk writers of output blocks with new data gathered as the union
// of the provided blocks.
type BlockPopulator interface {
	PopulateBlock(ctx context.Context, metrics *CompactorMetrics, logger log.Logger, chunkPool chunkenc.Pool, mergeFunc storage.VerticalChunkSeriesMergeFunc, concurrencyOpts LeveledCompactorConcurrencyOptions, blocks []BlockReader, minT, maxT int64, outBlocks []shardedBlock) error
}
```
Note: this is not the signature from upstream, but it's unclear right now how to reuse upstream signature directly. I'd leave a more sophisticated solution to a new PR and live with the current straightforward conflict resolution.
```go
PopulateBlock(
	ctx context.Context,
	metrics *CompactorMetrics,
	logger log.Logger,
	chunkPool chunkenc.Pool,
	mergeFunc storage.VerticalChunkSeriesMergeFunc,
	blocks []BlockReader,
	meta *BlockMeta,
	indexw IndexWriter,
	chunkw ChunkWriter,
) error
```
@pstibrany since you wrote #13 you'd probably know how to best fix the conflicts. Would it make sense for us to request a signature change from upstream like in prometheus/prometheus#11711 ?
Once we change the signature of this interface, it's not compatible with any implementation provided by upstream anymore, so I'm not sure if there's any benefit of having this interface at all.
I tried to think about this last week, and I was wondering if we could refactor the caller stack, leaving this interface as is, and calling it multiple times, once per each `outBlock` we have.

We reach this code through `CompactWithBlockPopulator` -> `write`, so maybe we could leave that path as it is in upstream, and have our own implementation for `CompactWithSplittingAndPopulator` that would `writeWithSplitting` and call the populator multiple times (roughly as sketched below)?
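A sketch of that idea, with deliberately simplified signatures (illustrative only, not the upstream interface):

```go
// SimpleBlockPopulator stands in for upstream's single-output-block interface.
type SimpleBlockPopulator interface {
	PopulateBlock(ctx context.Context, blocks []BlockReader, out *shardedBlock) error
}

// writeWithSplitting calls the populator once per output block, keeping an
// upstream-compatible signature at the cost of re-reading the input blocks
// per shard (the traversal concern raised in the reply below).
func writeWithSplitting(ctx context.Context, p SimpleBlockPopulator, blocks []BlockReader, outs []shardedBlock) error {
	for i := range outs {
		if err := p.PopulateBlock(ctx, blocks, &outs[i]); err != nil {
			return err
		}
	}
	return nil
}
```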
My main reason to keep thinking about this is to keep at least the public API of the `Compactor` & `BlockPopulator` compatible with upstream, to reduce issues in case we want to import some other third-party code that also depends on Prometheus.
> @pstibrany since you wrote #13 you'd probably know how to best fix the conflicts. Would it make sense for us to request a signature change from upstream like in prometheus/prometheus#11711 ?
When I discussed the possibility of upstreaming #13 with @codesome, he was very skeptical about it, since Prometheus itself wouldn't use it.
> call the populator multiple times?
That's certainly an option, but requires multiple traversals of the input blocks :(
Perhaps we can have our own extension as `MultiBlockPopulator`, and let the compactor work with both `BlockPopulator` (from upstream) and `MultiBlockPopulator`.
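One possible shape for that extension (hypothetical, not an agreed design):

```go
// MultiBlockPopulator would be a fork-only extension that fills all output
// blocks in a single pass; the compactor could accept either this or the
// upstream BlockPopulator.
type MultiBlockPopulator interface {
	PopulateBlocks(ctx context.Context, blocks []BlockReader, outBlocks []shardedBlock) error
}
```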
Yep, maybe not upstream the whole thing, but just enough so that we wouldn't need to modify the signature in our fork.
Added an issue for tracking: #484. Also added it to the Native Histograms EPIC so it's not lost.
> That's certainly an option, but requires multiple traversals of the input blocks
I was thinking about implementing the `BlockPopulator` interface with a closure created in `writeWithSplitting`, which would do the block traversal only once.
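That closure could be plugged in through a function adapter, e.g. (hypothetical sketch in the spirit of `http.HandlerFunc`, reusing the simplified signature from the earlier sketch):

```go
// BlockPopulatorFunc adapts a closure to the populator interface, so
// writeWithSplitting can capture its sharding state once and traverse the
// input blocks a single time.
type BlockPopulatorFunc func(ctx context.Context, blocks []BlockReader, out *shardedBlock) error

func (f BlockPopulatorFunc) PopulateBlock(ctx context.Context, blocks []BlockReader, out *shardedBlock) error {
	return f(ctx, blocks, out)
}
```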
This LGTM. I think we should spend some time thinking about how to tackle the compactor changes; I feel this is more a refactoring task than a merge-upstream task. Anyway, I'll leave any decision here to @pstibrany as he has more context about the compactor change.
I am fine merging it in the current form, but agree that we should fix this soon and use the upstream version of `BlockPopulator`. One alternative could be to fork our sharding compactor code (reusing as much as possible) and use it from Mimir, while we only keep the upstream Prometheus compactor in mimir-prometheus.
@pstibrany if you have any interface change/update/extension suggestions for upstream, please let us know.
This brings changes from prometheus/prometheus up to the 2023-04-14 commit prometheus/prometheus@7309ac2.
Major merge conflicts in rules/manager and tsdb/compact (both code and test); you might want to rewrite some of that, especially the latter.