Bug: Slow compactor cleanups can cause consistency check errors on queriers #9924

jhalterman · 2024-11-15T19:47:23Z

What is the bug?

Queriers currently provide a 45m grace period (3 * 15m bucket-store.sync-inverval) by default for store gateways to discover blocks before they can be queried. But the grace period is based on when the block was uploaded, not when it became discoverable via a bucket index. When bucket indexes don't get updated quickly, this can cause a block to be queryable before store gateways have had an opportunity to discover it. In a worst case scenario where a bucket index isn't updated for 45m, any new blocks become immediately queryable once the bucket index is loaded by a querier.

Proposed fix

Queriers should base their grace period, in whole or part, on when the bucket index was created, since that's when a block is discoverable by store gateways. For example, we could use 2 * the bucket-store.sync-inverval as the grace period relative to the bucket index creation time, but we could also (optionally) require that the existing 3 * the bucket-store.sync-inverval grace period relative to block upload time be satisfied, which would give a bit more time for store gateways to discover blocks in some cases.

How to reproduce it?

This came up when heavy OOO block creation caused a lot of compaction activity which slowed down bucket cleanup runs, leading to delayed bucket index creations.

What did you think would happen?

err-mimir-store-consistency-check-failed should not occur.

What was your environment?

K8s

Any additional context to share?

No response

The text was updated successfully, but these errors were encountered:

charleskorn · 2024-11-17T21:44:37Z

A couple of thoughts on this:

How would a querier know when a block first appeared in the bucket index? Would it need to examine all historical bucket index versions to discover this? Or would compactors set the time in the bucket index?
Should we do the same thing for deletion times for blocks in the bucket index? Otherwise queriers could stop querying a compacted block (based on its deletion time, not when the deletion first appeared in the bucket index) well before the replacement block is required to be queried (based on when it first appeared in the bucket index, which may be well after the original block was deleted) - see Fix issue where both recently compacted blocks and their source blocks can be skipped during querying #9224 as an example of what I mean here

jhalterman · 2024-11-19T21:59:01Z

How would a querier know when a block first appeared in the bucket index? Would it need to examine all historical bucket index versions to discover this? Or would compactors set the time in the bucket index?

Having to read multiple versions of a bucket index doesn't sound good. Compactors including an approximate time that each block was indexed in the index itself would lighten the burden for queriers. This could probably even (eventually) replace the existing updated_at field that's in the index, which might not be needed anymore.

Another alternative is the block metas could be updated to include the time at which each block was first indexed, but that's potentially a lot of writes.

Should we do the same thing for deletion times for blocks in the bucket index?

Yes, possibly. Do you think that's worthy of a separate issue?

charleskorn · 2024-11-22T02:20:33Z

Having to read multiple versions of a bucket index doesn't sound good. Compactors including an approximate time that each block was indexed in the index itself would lighten the burden for queriers. This could probably even (eventually) replace the existing updated_at field that's in the index, which might not be needed anymore.

Sounds good to me. I'd be in favour of replacing the updated_at field in the index rather than introducing a new field, unless we really need both - I worry that otherwise different components would use different fields and that will lead to other problems.

Should we do the same thing for deletion times for blocks in the bucket index?

Yes, possibly. Do you think that's worthy of a separate issue?

I think we have to do both at the same time - otherwise we might end up in a situation like #9224. Imagine a block A is compacted into new block N at T. If the compactor doesn't update the bucket index with the new block N for an hour, then the deletion time for block A will be T, but the updated_at for N will be T+1h, and queriers would ignore block N (because it is too new) while also not querying A (because it was deleted too long ago).

jhalterman added bug Something isn't working component/querier squad/query labels Nov 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug: Slow compactor cleanups can cause consistency check errors on queriers #9924

Bug: Slow compactor cleanups can cause consistency check errors on queriers #9924

jhalterman commented Nov 15, 2024 •

edited

Loading

charleskorn commented Nov 17, 2024

jhalterman commented Nov 19, 2024

charleskorn commented Nov 22, 2024

Bug: Slow compactor cleanups can cause consistency check errors on queriers #9924

Bug: Slow compactor cleanups can cause consistency check errors on queriers #9924

Comments

jhalterman commented Nov 15, 2024 • edited Loading

What is the bug?

Proposed fix

How to reproduce it?

What did you think would happen?

What was your environment?

Any additional context to share?

charleskorn commented Nov 17, 2024

jhalterman commented Nov 19, 2024

charleskorn commented Nov 22, 2024

jhalterman commented Nov 15, 2024 •

edited

Loading