Sync with Prometheus up to 2023-04-14 (7309ac27) #480
Conversation
M-map chunks replayed on startup are discarded if there was no WAL and no snapshot loaded, because there are no series created in the Head that they can map to. So m-map chunks are only loaded from disk if there is either a snapshot loaded or a WAL on disk. Signed-off-by: Ganesh Vernekar <[email protected]>
If the snapshot was enabled with some ooo mmap chunks on disk, and wbl was removed between restarts, then we should still be able to query the ooo mmap chunks after a restart. This test shows that we are not able to query those ooo mmap chunks after a restart under this situation. Signed-off-by: Ganesh Vernekar <[email protected]>
Without this fix, if snapshots were enabled, and wbl goes missing between restarts, then TSDB does not recognize that there are ooo mmap chunks on disk and we cannot query them until those chunks are compacted into blocks. Signed-off-by: Ganesh Vernekar <[email protected]>
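The gating condition these three commits describe amounts to something like the following (a hypothetical, condensed sketch, not the actual Head replay code; `snapshotLoaded` and `walExists` are illustrative names):

```go
// M-map chunks can only be attached to series that already exist in the Head.
// Series are recreated either from a snapshot or from WAL replay, so without
// either there is nothing the replayed chunks could map to.
func shouldLoadMmapChunks(snapshotLoaded, walExists bool) bool {
	return snapshotLoaded || walExists
}
```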
It took a `Labels` argument whose memory could be re-used, but in practice this hardly ever helped, especially after converting `relabel.Process` to `relabel.ProcessBuilder`. Comparing the parameter to `nil` was a bug: `EmptyLabels` is not `nil`, so the slice was reallocated multiple times by `append`. Lastly, `Builder.Labels()` now estimates the final size based on the labels added and deleted. Signed-off-by: Bryan Boreham <[email protected]>
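The nil-comparison pitfall is easy to reproduce in isolation (a minimal sketch, not the relabel code itself):

```go
package main

import "fmt"

type Label struct{ Name, Value string }
type Labels []Label

// EmptyLabels mirrors the real helper: it returns a zero-length slice built
// from a composite literal, which is non-nil.
func EmptyLabels() Labels { return Labels{} }

func main() {
	lset := EmptyLabels()
	// The buggy re-use check compared against nil, so the pre-allocated
	// path never fired for EmptyLabels and append kept reallocating.
	fmt.Println(lset == nil) // false
}
```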
Larger messages cost less, because the per-message overheads at sender and receiver are spread over more samples. Example: scraping 1.5 million series every 15 seconds will send 50 messages per second. Previous default was 500. Signed-off-by: Bryan Boreham <[email protected]>
Keep the target throughput and total pending samples the same. Signed-off-by: Bryan Boreham <[email protected]>
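The arithmetic behind the example works out as follows (a sketch of the stated numbers only; the constant names are illustrative, not Prometheus configuration fields):

```go
package main

import "fmt"

const (
	series         = 1_500_000 // scraped series
	scrapeInterval = 15        // seconds
	samplesPerSend = 2000      // new default, up from 500
)

func main() {
	samplesPerSecond := float64(series) / scrapeInterval // 100,000 samples/s
	fmt.Println(samplesPerSecond / samplesPerSend)       // 50 messages/s
}
```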
Deleted labels are remembered, even if they were not in `base` or were removed from `add`, so `base+add-del` could go negative. Signed-off-by: Bryan Boreham <[email protected]>
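A minimal sketch of the clamp this implies (illustrative, not the actual `Builder` code):

```go
// estimateSize guards the pre-allocation hint: deleted labels may never have
// existed in base or add, so the naive difference can go negative.
func estimateSize(base, add, del int) int {
	if n := base + add - del; n > 0 {
		return n
	}
	return 0
}
```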
Bumps [github.com/Azure/go-autorest/autorest/adal](https://github.com/Azure/go-autorest) from 0.9.22 to 0.9.23. - [Release notes](https://github.com/Azure/go-autorest/releases) - [Changelog](https://github.com/Azure/go-autorest/blob/main/CHANGELOG.md) - [Commits](Azure/go-autorest@autorest/adal/v0.9.22...autorest/adal/v0.9.23) --- updated-dependencies: - dependency-name: github.com/Azure/go-autorest/autorest/adal dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]>
…eus (#11711) Signed-off-by: Alex Le <[email protected]> Signed-off-by: Alex Le <[email protected]>
labels: simplify call to get Labels from Builder
Update OOO min/max time properly after replaying m-map chunks
remote-write: raise default samples per send to 2,000
Signed-off-by: Soon-Ping Phang <[email protected]>
This makes it more consistent with other commands like `import rules`. We don't have strict rules and uniformity across promtool, unfortunately, but I think it's better to only have the HTTP config on relevant check commands, to avoid giving the impression that Prometheus can e.g. check the config over the wire. Signed-off-by: Julien Pivotto <[email protected]>
…b.com/Azure/go-autorest/autorest/adal-0.9.23 build(deps): bump github.com/Azure/go-autorest/autorest/adal from 0.9.22 to 0.9.23
Signed-off-by: Justin Lei <[email protected]>
Co-authored-by: Björn Rabenstein <[email protected]> Signed-off-by: Justin Lei <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Add support for native histograms to concreteSeriesIterator
tsdb: Improve a couple of histogram documentation comments
Signed-off-by: Alex Le <[email protected]>
This is a method used by some downstream projects; it was created to optimize the implementation in `labels_string.go` but we should have one for both implementations so the same code works with either. Signed-off-by: Bryan Boreham <[email protected]>
labels: add ScratchBuilder.Overwrite for slice implementation
In other words: instead of having a “polymorphous” `Point` that can either contain a float value or a histogram value, use an `FPoint` for floats and an `HPoint` for histograms.

This seemingly small change has a _lot_ of repercussions throughout the codebase. The idea here is to avoid the increase in size of `Point` arrays that happened after native histograms had been added. The higher-level data structures (`Sample`, `Series`, etc.) are still “polymorphous”. The same idea could be applied to them, but at each step the trade-offs need to be evaluated. The idea with this change is to do the minimum necessary to get back to pre-histogram performance for functions that do not touch histograms.

Here are comparisons for the `changes` function. The test data doesn't include histograms yet. Ideally, there would be no change in the benchmark results at all.

First, the runtime of v2.39 compared to directly prior to this commit:

```
name                                                  old time/op  new time/op  delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16          391µs ± 2%   542µs ± 1%  +38.58%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16         452µs ± 2%   617µs ± 2%  +36.48%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16       1.12ms ± 1%  1.36ms ± 2%  +21.58%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16      7.83ms ± 1%  8.94ms ± 1%  +14.21%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16         2.98ms ± 0%  3.30ms ± 1%  +10.67%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16        3.66ms ± 1%  4.10ms ± 1%  +11.82%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16       10.5ms ± 0%  11.8ms ± 1%  +12.50%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16      77.6ms ± 1%  87.4ms ± 1%  +12.63%  (p=0.000 n=9+9)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16     30.4ms ± 2%  32.8ms ± 1%   +8.01%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16    37.1ms ± 2%  40.6ms ± 2%   +9.64%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16    105ms ± 1%   117ms ± 1%  +11.69%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16   783ms ± 3%   876ms ± 1%  +11.83%  (p=0.000 n=9+10)
```

And then the runtime of v2.39 compared to after this commit:

```
name                                                  old time/op  new time/op  delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16          391µs ± 2%   547µs ± 1%  +39.84%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16         452µs ± 2%   616µs ± 2%  +36.15%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16       1.12ms ± 1%  1.26ms ± 1%  +12.20%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16      7.83ms ± 1%  7.95ms ± 1%   +1.59%  (p=0.000 n=10+8)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16         2.98ms ± 0%  3.38ms ± 2%  +13.49%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16        3.66ms ± 1%  4.02ms ± 1%   +9.80%  (p=0.000 n=10+9)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16       10.5ms ± 0%  10.8ms ± 1%   +3.08%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16      77.6ms ± 1%  78.1ms ± 1%   +0.58%  (p=0.035 n=9+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16     30.4ms ± 2%  33.5ms ± 4%  +10.18%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16    37.1ms ± 2%  40.0ms ± 1%   +7.98%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16    105ms ± 1%   107ms ± 1%   +1.92%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16   783ms ± 3%   775ms ± 1%   -1.02%  (p=0.019 n=9+9)
```

In summary, the runtime doesn't really improve with this change for queries with just a few steps. For queries with many steps, this commit essentially reinstates the old performance. This is good because the many-step queries are the ones that matter most (longest absolute runtime). In terms of allocations, though, this commit doesn't make a dent at all (numbers not shown). The reason is that most of the allocations happen in the sampleRingIterator (in the storage package), which has to be addressed in a separate commit. Signed-off-by: beorn7 <[email protected]>
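For reference, the resulting point types look roughly like this in the `promql` package (simplified from the change described above):

```go
package promql

import "github.com/prometheus/prometheus/model/histogram"

// FPoint represents a single float data point at a given timestamp.
type FPoint struct {
	T int64
	F float64
}

// HPoint represents a single histogram data point at a given timestamp.
// Only a pointer-sized field is carried for the histogram, so arrays of
// float-only points stay small.
type HPoint struct {
	T int64
	H *histogram.FloatHistogram
}
```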
Previously, we had one “polymorphous” `sample` type in the `storage` package. This commit breaks it up into `fSample`, `hSample`, and `fhSample`, each still implementing the `tsdbutil.Sample` interface. This reduces allocations in `sampleRing.Add` but inflicts the penalty of the interface wrapper, which makes things worse in total. This commit therefore just demonstrates the step taken. The next commit will tackle the interface overhead problem. Signed-off-by: beorn7 <[email protected]>
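A condensed sketch of that split (simplified from the `storage` package; the real types also implement the full `tsdbutil.Sample` method set):

```go
import "github.com/prometheus/prometheus/model/histogram"

// fSample carries a plain float value.
type fSample struct {
	t int64
	f float64
}

// hSample carries an integer native histogram.
type hSample struct {
	t int64
	h *histogram.Histogram
}

// fhSample carries a float native histogram.
type fhSample struct {
	t  int64
	fh *histogram.FloatHistogram
}
```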
This utilizes the fact that most sampleRings will only contain samples of one type. In this case, the generic interface is circumvented, and a bespoke buffer for the one actually occurring sample type is used. Should a sampleRing receive a sample of a different kind later, it will transparently switch to the generic behavior. Signed-off-by: beorn7 <[email protected]>
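The switching behavior could be sketched like this (hypothetical and heavily condensed from the real `sampleRing`, reusing the sample types from the sketch above):

```go
type sampleRing struct {
	fBuf []fSample         // fast path while only floats are seen
	iBuf []tsdbutil.Sample // generic fallback, nil until a mixed sample arrives
}

func (r *sampleRing) add(s tsdbutil.Sample) {
	if fs, ok := s.(fSample); ok && r.iBuf == nil {
		// Homogeneous case: append to the typed buffer, no interface boxing.
		r.fBuf = append(r.fBuf, fs)
		return
	}
	if r.iBuf == nil {
		// First sample of another kind: migrate once to the generic buffer.
		for _, fs := range r.fBuf {
			r.iBuf = append(r.iBuf, fs)
		}
		r.fBuf = nil
	}
	r.iBuf = append(r.iBuf, s)
}
```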
In the past, every sample value was a float, so it was fine to call a variable holding such a float "value" or "sample". With native histograms, a sample might have a histogram value, and a histogram value is still a value. Calling a float value just "value", "sample", or "V" is therefore misleading. Over the last few commits, I already renamed many variables, but this cleans up a few more places where the changes are more invasive. Note that we do not attempt renaming in the JSON APIs or in the protobufs; that would be quite a disruption. Internally, however, we can name variables as we want, and we should go with the option that avoids misunderstandings. Signed-off-by: beorn7 <[email protected]>
Signed-off-by: beorn7 <[email protected]>
This commit is doing what I would have expected that Go generics do for me. However, the PromQL benchmarks show a significant runtime and allocation increase with `genericAdd`, so this replaces it with hand-coded non-generic versions of it. Signed-off-by: beorn7 <[email protected]>
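For illustration, the shape of the change is roughly this (hypothetical and heavily reduced; the real code also handles ring-buffer wrap-around):

```go
// What the generic version looked like in spirit: correct, but the PromQL
// benchmarks showed extra allocations and runtime with it.
func genericAdd[T any](buf []T, s T) []T {
	return append(buf, s)
}

// Hand-coded variant for the float case; hSample and fhSample get their own
// nearly identical copies.
func addF(buf []fSample, s fSample) []fSample {
	return append(buf, s)
}
```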
histograms: Optimize query performance
Rename PopulateBlockFunc to BlockPopulator
LGTM, I've taken a closer look at the rules directory and tsdb/compact.go as per the author's request.
Also checked the diff of diffs between prometheus and mimir-prometheus.
For sure needs another set of eyes.
```go
// populateBlock fills the index and chunk writers of output blocks with new data gathered as the union
// of the provided blocks.
type BlockPopulator interface {
	PopulateBlock(ctx context.Context, metrics *CompactorMetrics, logger log.Logger, chunkPool chunkenc.Pool, mergeFunc storage.VerticalChunkSeriesMergeFunc, concurrencyOpts LeveledCompactorConcurrencyOptions, blocks []BlockReader, minT, maxT int64, outBlocks []shardedBlock) error
}
```
Note: this is not the signature from upstream, but it's unclear right now how to reuse upstream signature directly. I'd leave a more sophisticated solution to a new PR and live with the current straightforward conflict resolution.
```go
PopulateBlock(
	ctx context.Context,
	metrics *CompactorMetrics,
	logger log.Logger,
	chunkPool chunkenc.Pool,
	mergeFunc storage.VerticalChunkSeriesMergeFunc,
	blocks []BlockReader,
	meta *BlockMeta,
	indexw IndexWriter,
	chunkw ChunkWriter,
) error
```
@pstibrany since you wrote #13 you'd probably know how to best fix the conflicts. Would it make sense for us to request a signature change from upstream like in prometheus/prometheus#11711 ?
Once we change the signature of this interface, it's not compatible with any implementation provided by upstream anymore, so I'm not sure if there's any benefit of having this interface at all.
I tried to think about this last week, and I was wondering if we could refactor the caller stack, leaving this interface as is, and calling it multiple times, once per each `outBlock` we have.

We reach this code through `CompactWithBlockPopulator` -> `write`, so maybe we could leave that path as it is in upstream, and have our own implementation for `CompactWithSplittingAndPopulator` that would `writeWithSplitting` and call the populator multiple times (roughly as sketched below)?
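A sketch of that idea, with deliberately simplified signatures (illustrative only, not the upstream interface):

```go
// SimpleBlockPopulator stands in for upstream's single-output-block interface.
type SimpleBlockPopulator interface {
	PopulateBlock(ctx context.Context, blocks []BlockReader, out *shardedBlock) error
}

// writeWithSplitting calls the populator once per output block, keeping an
// upstream-compatible signature at the cost of re-reading the input blocks
// per shard (the traversal concern raised in the reply below).
func writeWithSplitting(ctx context.Context, p SimpleBlockPopulator, blocks []BlockReader, outs []shardedBlock) error {
	for i := range outs {
		if err := p.PopulateBlock(ctx, blocks, &outs[i]); err != nil {
			return err
		}
	}
	return nil
}
```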
My main reason to keep thinking about this is to keep at least the public API of the `Compactor` & `BlockPopulator` compatible with upstream, to reduce issues in case we want to import some other third-party code that also depends on Prometheus.
> @pstibrany since you wrote #13 you'd probably know how to best fix the conflicts. Would it make sense for us to request a signature change from upstream like in prometheus/prometheus#11711 ?
When I discussed the possibility of upstreaming #13 with @codesome, he was very skeptical about it, since Prometheus itself wouldn't use it.
> call the populator multiple times?
That's certainly an option, but requires multiple traversals of the input blocks :(
Perhaps we can have our own extension as `MultiBlockPopulator`, and let the compactor work with both `BlockPopulator` (from upstream) and `MultiBlockPopulator`.
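One possible shape for that extension (hypothetical, not an agreed design):

```go
// MultiBlockPopulator would be a fork-only extension that fills all output
// blocks in a single pass; the compactor could accept either this or the
// upstream BlockPopulator.
type MultiBlockPopulator interface {
	PopulateBlocks(ctx context.Context, blocks []BlockReader, outBlocks []shardedBlock) error
}
```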
Yep, maybe not upstream the whole thing, but just enough so that we wouldn't need to modify the signature in our fork.
Added an issue for tracking: #484. Also added it to the Native Histograms EPIC so it's not lost.
> That's certainly an option, but requires multiple traversals of the input blocks
I was thinking about implementing the `BlockPopulator` interface with a closure created in `writeWithSplitting`, which would do the block traversal only once.
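That closure could be plugged in through a function adapter, e.g. (hypothetical sketch in the spirit of `http.HandlerFunc`, reusing the simplified signature from the earlier sketch):

```go
// BlockPopulatorFunc adapts a closure to the populator interface, so
// writeWithSplitting can capture its sharding state once and traverse the
// input blocks a single time.
type BlockPopulatorFunc func(ctx context.Context, blocks []BlockReader, out *shardedBlock) error

func (f BlockPopulatorFunc) PopulateBlock(ctx context.Context, blocks []BlockReader, out *shardedBlock) error {
	return f(ctx, blocks, out)
}
```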
This LGTM. I think we should spend some time thinking about how to tackle the compactor changes; I feel this is more a refactoring task than a merge-upstream task. Anyway, I'll leave any decision here to @pstibrany as he has more context about the compactor change.
I am fine merging it in the current form, but agree that we should fix this soon and use the upstream version of `BlockPopulator`. One alternative could be to fork our sharding compactor code (reusing as much as possible) and use it from Mimir, while we only keep the upstream Prometheus compactor in mimir-prometheus.
@pstibrany if you have any interface change/update/extension suggestions for upstream, please let us know.
This brings changes from prometheus/prometheus up to the 2023-04-14 commit prometheus/prometheus@7309ac2.
Major merge conflicts in rules/manager and tsdb/compact (both code and test); you might want to rewrite some of that, especially the latter.