test: test_download_remote_layers_api again #5322

Merged: 7 commits, Sep 16, 2023

koivunej
Member

@koivunej koivunej commented Sep 15, 2023

The test is still flaky, perhaps more after #5233, see #3831.

Do one more timeline_checkpoint after shutting down safekeepers before shutting down pageserver. Put more effort into not compacting or creating image layers.
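
The ordering described above can be sketched roughly as follows. This is only an illustration: `FakeEnv` and its methods are hypothetical stand-ins that record the call order, not the real test fixtures, and the actual test may structure these steps differently.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEnv:
    """Hypothetical stand-in for the test environment; it only records call order."""

    events: list = field(default_factory=list)

    def stop_safekeepers(self):
        self.events.append("stop_safekeepers")

    def timeline_checkpoint(self):
        self.events.append("timeline_checkpoint")

    def stop_pageserver(self):
        self.events.append("stop_pageserver")


def quiesce_before_restart(env):
    # 1. With safekeepers down, no new WAL can reach the pageserver.
    env.stop_safekeepers()
    # 2. One more checkpoint, so nothing is left in memory to be flushed later.
    env.timeline_checkpoint()
    # 3. Only then shut the pageserver down.
    env.stop_pageserver()
    return env.events


order = quiesce_before_restart(FakeEnv())
```

The point of the extra checkpoint is that anything still buffered gets written out before the shutdown, rather than during it.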

@github-actions

github-actions bot commented Sep 15, 2023

2484 tests run: 2364 passed, 0 failed, 120 skipped (full report)


Flaky tests (4)

Postgres 16

  • test_partial_evict_tenant: debug
  • test_delete_tenant_exercise_crash_safety_failpoints[Check.RETRY_WITH_RESTART-mock_s3-tenant-delete-before-remove-timelines-dir-True]: debug
  • test_wal_lagging: release

Postgres 14

  • test_get_tenant_size_with_multiple_branches: debug

Code coverage (full report)

  • functions: 53.0% (7751 of 14613 functions)
  • lines: 81.1% (45263 of 55836 lines)

The comment gets automatically updated with the latest test results
e249c8a at 2023-09-16T15:25:54.916Z :recycle:

@koivunej
Member Author

koivunej commented Sep 15, 2023

Regress test re-run attempts:

  1. test_partial_evict_tenant[debug-pg16]
  2. only error from 6e132d7, perhaps this worked?
  3. same, with partial_evict_tenant 😱
  4. same, with test_get_tenant_size_with_multiple_branches which is flaky as debug (?)
  5. same as 3rd

Post-revert:

  • pg16 other known flaky
  • pg16 other known flaky
  • pg16 other known flaky

@koivunej koivunej marked this pull request as ready for review September 16, 2023 10:30
@koivunej koivunej enabled auto-merge (squash) September 16, 2023 10:30
@@ -357,8 +357,23 @@ def get_resident_physical_size():
tenant_id, timeline_id, "pageserver_resident_physical_size"
)

# Shut down safekeepers before starting the pageserver.
Contributor

Maybe it would be better to introduce a PS parameter for delaying logical size computation?
It may be useful not only for tests...

Member Author

@koivunej koivunej Sep 16, 2023

I don't think so. The problem here is not that safekeeper communication ends up requesting the logical size; that is what the failpoint is for, and it is set as an environment variable when starting up the pageserver.

The problem was that new WAL was still being received, apparently when there is other load on the test runners, which led to flakiness of the test, and my previous "fix" made this more visible. Alternatively, compaction was going on; I didn't really dig too deep because I was doing work.

I should look at the comments because, as noted in Slack, at best they are distracting.
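
As a rough illustration of setting a failpoint through the environment at startup: the sketch below assumes the convention of a `FAILPOINTS` variable holding `name=action` pairs separated by `;`. The variable name, the helper `failpoints_env`, and the failpoint name are assumptions for illustration, not taken from this test.

```python
import os


def failpoints_env(failpoints, base_env=None):
    """Return a copy of base_env (default: os.environ) with FAILPOINTS added.

    Assumption: failpoints are passed as "name=action" pairs joined by ";".
    """
    env = dict(os.environ if base_env is None else base_env)
    env["FAILPOINTS"] = ";".join(
        f"{name}={action}" for name, action in failpoints.items()
    )
    return env


# Hypothetical failpoint name, for illustration only.
env = failpoints_env({"example-logical-size-pause": "pause"}, base_env={})
```

The resulting mapping would then be handed to whatever starts the pageserver process, so the failpoint is active from the first moment rather than being configured after startup.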

Member Author

Cleaned up the test in e249c8a so that it no longer alludes to safekeepers launching logical size calculation.

Member Author

Checked the example I linked here #3831 (comment):

EDIT: Log analysis later on: the logs clearly show an in-memory flush after the metrics were read during shutdown

@koivunej
Member Author

One flaky test per re-run..

@koivunej koivunej disabled auto-merge September 16, 2023 15:12
@koivunej koivunej merged commit a221ecb into main Sep 16, 2023
@koivunej koivunej deleted the test_download_remote_layers_api_again branch September 16, 2023 15:27