test: test_download_remote_layers_api again #5322
2484 tests run: 2364 passed, 0 failed, 120 skipped (full report)
Flaky tests (4)
Postgres 16
Postgres 14
Code coverage (full report)
The comment gets automatically updated with the latest test results
e249c8a at 2023-09-16T15:25:54.916Z
Regress test re-run attempts:
Post-revert:
This reverts commit 6e132d7.
@@ -357,8 +357,23 @@ def get_resident_physical_size():
tenant_id, timeline_id, "pageserver_resident_physical_size"
)

# Shut down safekeepers before starting the pageserver.
Maybe it would be better to introduce a pageserver (PS) parameter for delaying logical size computation?
It may be useful not only for tests...
I don't think so. The problem here is not that safekeeper communication ends up asking for the logical size; that's what the failpoint is for, set as an environment variable when starting up the pageserver.
The problem was that new WAL was apparently being received when there was other load on the test runners, leading to flakiness of the test, and my previous "fix" made this more visible. Or alternatively compaction was going on; I didn't really dig too deep because I was doing work.
I should look at the comments because, as noted in Slack, at best they are distracting.
Cleaned the test up in e249c8a not to allude to safekeepers launching logical size calculation.
Checked the example I linked here #3831 (comment):
EDIT: Later log analysis: the logs clearly show an in-memory flush after metrics were read during shutdown.
One flaky test per re-run...
The test is still flaky, perhaps more so after #5233, see #3831.
Do one more timeline_checkpoint after shutting down safekeepers and before shutting down pageserver. Put more effort into not compacting or creating image layers.
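
The ordering described above can be illustrated with a toy model. This is only a sketch, not the actual test code: `FakePageserver`, `receive_wal`, `shutdown` and `measure_stable_size` are hypothetical names invented for illustration, and the assumption that the size metric is read between the checkpoint and the pageserver shutdown is mine. It shows why stopping the WAL source first and checkpointing afterwards should leave nothing for a shutdown-time flush to change.

```python
class FakePageserver:
    """Hypothetical stand-in for a pageserver; not the real test fixture."""

    def __init__(self):
        self.in_memory_bytes = 0   # WAL received but not yet flushed
        self.resident_bytes = 0    # bytes in on-disk (resident) layers

    def receive_wal(self, nbytes):
        # New WAL lands in the in-memory layer first.
        self.in_memory_bytes += nbytes

    def _flush(self):
        # Move everything buffered in memory into resident layers.
        self.resident_bytes += self.in_memory_bytes
        self.in_memory_bytes = 0

    def timeline_checkpoint(self):
        self._flush()

    def shutdown(self):
        # Assumption: shutdown also flushes the in-memory layer.
        self._flush()


def measure_stable_size(pageserver, stop_wal_source):
    """Stop the WAL source, checkpoint, read the size, then shut down."""
    stop_wal_source()                  # e.g. shut down safekeepers
    pageserver.timeline_checkpoint()   # flush whatever had already arrived
    size = pageserver.resident_bytes   # read the metric while nothing is pending
    pageserver.shutdown()
    # With no WAL arriving after the checkpoint, shutdown has nothing to flush.
    assert pageserver.resident_bytes == size
    return size
```

In this model, skipping either the WAL-source stop or the extra checkpoint leaves room for bytes to sit in memory when the size is read, which is the kind of late in-memory flush the log analysis in the thread points to.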