Commit
layer file creation: fatal_err on timeline dir fsync (#6985)
As pointed out in the comments added in this PR: the in-memory state of the filesystem already has the layer file in its final place. If the fsync fails, but pageserver continues to execute, it's quite easy for subsequent pageserver code to observe the file being there and assume it's durable, when it really isn't.

It can happen that we get ENOSPC during the fsync. However,

1. the timeline dir is small (remember, the big layer _file_ has already been synced). Small data means ENOSPC due to delayed allocation races etc. is less likely.
2. what else are we going to do in that case?

If we decide to bubble up the error, the file remains on disk. We could try to unlink it and fsync after the unlink. If that fails, we would _definitely_ need to error out. Is it worth the trouble though?

Side note: all this logic about not carrying on after fsync failure implies that we `sync` the filesystem successfully before we restart the pageserver. We don't do that right now, but should (=> #6989)

part of #6663
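The durability argument above can be illustrated with a minimal sketch. This is not the pageserver's code: `fsync_dir`, `fsync_dir_or_abort`, and `create_durable_file` are hypothetical names, the write-to-temp-then-rename step is an assumption about how the file reaches its final place, and the sketch assumes a Unix-like OS where a directory can be opened and fsynced.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// fsync a directory so that entries created or renamed inside it become durable.
/// Assumes a Unix-like OS, where a directory can be opened read-only and synced.
fn fsync_dir(dir: &Path) -> io::Result<()> {
    File::open(dir)?.sync_all()
}

/// Hypothetical helper: treat a failed directory fsync as fatal.
/// After the rename, the file is already visible at its final path, so
/// returning an error would let later code assume it is durable.
fn fsync_dir_or_abort(dir: &Path) {
    if let Err(e) = fsync_dir(dir) {
        eprintln!("fatal: fsync of {} failed: {e}", dir.display());
        std::process::abort();
    }
}

/// Write a temp file, fsync it, rename it into place, then fsync the parent dir.
fn create_durable_file(dir: &Path, name: &str, contents: &[u8]) -> io::Result<()> {
    let tmp = dir.join(format!("{name}.tmp"));
    let dst = dir.join(name);
    let mut f = File::create(&tmp)?;
    f.write_all(contents)?;
    f.sync_all()?; // the (potentially large) file data is synced first
    fs::rename(&tmp, &dst)?; // from here on, the file is visible in its final place
    fsync_dir_or_abort(dir); // small metadata sync; failure is not recoverable here
    Ok(())
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("fsync_sketch_{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    create_durable_file(&dir, "layer", b"example")?;
    println!("layer file is in place and the directory entry has been synced");
    fs::remove_dir_all(&dir)
}
```

Errors before the rename are still returned to the caller, since at that point nothing is visible at the final path. Only the directory fsync after the rename aborts the process.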
c861d71
2561 tests run: 2426 passed, 1 failed, 134 skipped (full report)
Failures on Postgres 14
- test_basebackup_with_high_slru_count[github-actions-selfhosted-vectored-10-13-30]: release

Flaky tests (2)
Postgres 16
- test_remote_storage_upload_queue_retries: debug
- test_vm_bit_clear_on_heap_lock: debug

Code coverage* (full report)
- functions: 28.7% (6937 of 24179 functions)
- lines: 47.2% (42536 of 90144 lines)
* collected from Rust tests only
c861d71 at 2024-03-04T13:25:22.254Z :recycle: