tiered compaction: test_uploads_and_deletions self.lsn_range.start > lsn #7759

Open
Tracked by #7554
arpad-m opened this issue May 14, 2024 · 2 comments
Labels
c/storage/pageserver Component: storage: pageserver

Comments

@arpad-m
Member

arpad-m commented May 14, 2024

There are flaky occurrences of an assertion failure in the test_uploads_and_deletions test.

First and second occurrence. Excerpt:

2024-05-14 15:16:00.128 INFO [utils.py:486] not allowed pageserver_1 error: 2024-05-14T15:15:58.120068Z ERROR request{method=PUT path=/v1/tenant/b7776a497c9182b5c17abd0073d28ce1/timeline/8abb0f7b8378b7cb83ece8a80b9c270a/checkpoint request_id=d4d5a2d4-6ea3-40e7-b7ce-e611f1106352}:manual_checkpoint{tenant_id=b7776a497c9182b5c17abd0073d28ce1 shard_id=0000 timeline_id=8abb0f7b8378b7cb83ece8a80b9c270a}:panic{thread=mgmt request worker location=pageserver/src/tenant/storage_layer/delta_layer.rs:455:9}: assertion failed: self.lsn_range.start <= lsn

related: #7707

@arpad-m arpad-m added the c/storage/pageserver Component: storage: pageserver label May 14, 2024
@arpad-m
Member Author

arpad-m commented May 14, 2024

This is also responsible for errors of the kind:

2024-05-14T17:04:38.123655Z ERROR request{method=PUT path=/v1/tenant/c71dfcd376a16f49b9d47eec03bbf542/timeline/6a502ede4031c3aeb12d3907ac830082/checkpoint request_id=de52539a-76f9-418e-8dd3-c6be38f31199}: HTTP request handler task panicked: task 345199 panicked

arpad-m added a commit that referenced this issue May 15, 2024
Adds a test that is a reproducer for many tiered compaction bugs,
both ones that have since been fixed as well as still unfixed ones:
* (now fixed) #7296 
* #7707 
* #7759
* Likely also #7244 but I haven't tried that.

The key ordering bug can be reproduced by switching to
`merge_delta_keys` instead of `merge_delta_keys_buffered`, so reverting
a big part of #7661, although it only sometimes reproduces (30-50% of
cases).

part of #7554
@problame
Contributor

Meeting notes:

  • the assertion means that we write a value at an LSN below the start of the LSN range with which we initialized the delta layer writer
  • best guess at this time: the LSN range is not determined correctly ahead of time
  • open question: why do we pre-determine the LSN range at all instead of computing it at the end?
  • arpad: the whole idea of tiered compaction is to pre-determine the LSN range
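
To make the two options in the notes above concrete, here is a minimal sketch. It is not the pageserver's actual code: `PredeterminedWriter`, `DeferredRangeWriter`, and the `u64` stand-in for `Lsn` are hypothetical names invented for illustration. The first writer fixes its LSN range up front and rejects an LSN below the start, which is the condition checked by the assertion in the log excerpt (returned as an error here instead of a panic). The second derives the range from the values it was actually given.

```rust
use std::ops::Range;

/// Stand-in for the pageserver's LSN type (assumption: a plain u64 is enough here).
type Lsn = u64;

/// Hypothetical writer whose LSN range is decided before any value is written.
struct PredeterminedWriter {
    lsn_range: Range<Lsn>,
    values: Vec<(Lsn, Vec<u8>)>,
}

impl PredeterminedWriter {
    fn new(lsn_range: Range<Lsn>) -> Self {
        Self { lsn_range, values: Vec::new() }
    }

    /// Mirrors the `self.lsn_range.start <= lsn` check from the log,
    /// but reports a violation as an error so the caller can add context.
    fn put_value(&mut self, lsn: Lsn, value: Vec<u8>) -> Result<(), String> {
        if lsn < self.lsn_range.start {
            return Err(format!(
                "lsn {} is below range start {}",
                lsn, self.lsn_range.start
            ));
        }
        self.values.push((lsn, value));
        Ok(())
    }
}

/// Hypothetical alternative: accept every value and compute the range at the end.
struct DeferredRangeWriter {
    values: Vec<(Lsn, Vec<u8>)>,
}

impl DeferredRangeWriter {
    fn new() -> Self {
        Self { values: Vec::new() }
    }

    fn put_value(&mut self, lsn: Lsn, value: Vec<u8>) {
        self.values.push((lsn, value));
    }

    /// Half-open range covering every written LSN, or `None` if nothing was written.
    fn finish(&self) -> Option<Range<Lsn>> {
        let min = self.values.iter().map(|(lsn, _)| *lsn).min()?;
        let max = self.values.iter().map(|(lsn, _)| *lsn).max()?;
        Some(min..max + 1)
    }
}
```

With a range of `100..200`, the first writer accepts LSN 150 and rejects LSN 99, while the second accepts both and reports `99..151` from `finish()`. As noted in the meeting, tiered compaction is built around knowing the range ahead of time, so the second variant only illustrates the question raised, not a proposed fix.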

Action:

  • christian to spend a day with the code & try to narrow down the issue with assertions / debugger

@problame problame self-assigned this May 15, 2024
a-masterov pushed a commit that referenced this issue May 20, 2024