
layer file creation: fatal_err on timeline dir fsync #6985

Conversation


@problame problame commented Mar 1, 2024

As pointed out in the comments added in this PR:
the in-memory state of the filesystem already has the layer file in its final place.
If the fsync fails, but pageserver continues to execute, it's quite easy
for subsequent pageserver code to observe the file being there and
assume it's durable, when it really isn't.

It can happen that we get ENOSPC during the fsync.
However,

  1. the timeline dir is small (remember, the big layer file has already been synced).
    Small data means ENOSPC due to delayed allocation races etc. is less likely.
  2. what else are we going to do in that case?

If we decide to bubble up the error, the file remains on disk.
We could try to unlink it and fsync after the unlink.
If that fails, we would definitely need to error out.
Is it worth the trouble though?

Side note: all this logic about not carrying on after fsync failure
implies that we sync the filesystem successfully before we restart
the pageserver. We don't do that right now, but should (=> #6989)

part of #6663
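The rationale above can be illustrated with a minimal Rust sketch. The helper name and error-handling shape are hypothetical (the actual pageserver code differs); it only shows the pattern the PR adopts: after the layer file is in place, an fsync failure on the timeline directory must not be returned to callers, because continuing would let later code observe a file that may not be durable.

```rust
use std::fs::File;
use std::path::Path;

/// Hypothetical helper, not the actual pageserver API: fsync the
/// timeline directory and treat any failure as fatal, because at this
/// point the layer file is already visible in the directory and later
/// code would otherwise assume it is durable.
fn fsync_timeline_dir_fatal(timeline_dir: &Path) {
    let res = File::open(timeline_dir).and_then(|dir| dir.sync_all());
    if let Err(e) = res {
        // fatal_err semantics: do not bubble the error up; terminate so
        // the file cannot be mistaken for durable state.
        eprintln!("fatal: fsync of timeline dir {:?} failed: {}", timeline_dir, e);
        std::process::abort();
    }
}
```

Aborting rather than erroring sidesteps the unlink-and-fsync-again cleanup question discussed above: the process never runs with a possibly-non-durable file visible in the directory.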

problame added 4 commits March 1, 2024 11:56
The `writer.finish()` methods already fsync the inode,
using `VirtualFile::sync_all()`.

All that the callers need to do is fsync their directory,
i.e., the timeline directory.

Note that there's a call in the new compaction code that
is apparently dead at runtime, so I couldn't fix up
any fsyncs there [Link](https://github.com/neondatabase/neon/blob/502b69b33bbd4ad1b0647e921a9c665249a2cd62/pageserver/src/tenant/timeline/compaction.rs#L204-L211).

In the grand scheme of things, layer durability probably doesn't
matter anymore because the remote storage is authoritative at all times
as of #5198. But let's not break the discipline in this commit.

part of #6663
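The two-step durability discipline the commit message describes can be sketched as follows. This is an illustrative standalone version, not the pageserver's `VirtualFile`-based code: step 1 corresponds to what `writer.finish()` already does (fsync the file's contents and inode), step 2 is what the callers must add (fsync the containing timeline directory so the new directory entry is durable too).

```rust
use std::fs::{File, OpenOptions};
use std::io::Write;
use std::path::Path;

/// Illustrative sketch of the pattern: fsync the layer file itself,
/// then fsync its parent directory. Names are hypothetical.
fn write_layer_durably(dir: &Path, name: &str, bytes: &[u8]) -> std::io::Result<()> {
    let path = dir.join(name);
    let mut f = OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open(&path)?;
    f.write_all(bytes)?;
    f.sync_all()?; // step 1: file contents + inode are durable (writer.finish())
    File::open(dir)?.sync_all()?; // step 2: directory entry is durable (caller's job)
    Ok(())
}
```

Without step 2, a crash can leave the file's data on disk but the directory entry missing, which is exactly why the directory fsync cannot be skipped even after the file itself has been synced.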
…kio-epoll-uring/layer-write-path-fsync-cleanups
…sync-cleanups' into problame/integrate-tokio-epoll-uring/create-layer-fatal-err-on-fsync
@problame problame requested review from jcsp and koivunej March 1, 2024 12:41
@problame problame requested a review from a team as a code owner March 1, 2024 12:41

@koivunej koivunej left a comment


This is looking fine. Because we haven't yet added these layers to the upload queue and then uploaded a new version of index_part.json, we will delete them on the next restart (and not fsync while doing that).


github-actions bot commented Mar 1, 2024

2484 tests run: 2362 passed, 0 failed, 122 skipped (full report)


Flaky tests (1)

Postgres 16

  • test_crafted_wal_end[last_wal_record_crossing_segment]: release

Code coverage* (full report)

  • functions: 28.7% (6936 of 24179 functions)
  • lines: 47.2% (42530 of 90144 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
00bf050 at 2024-03-04T12:26:25.853Z :recycle:

problame added 2 commits March 1, 2024 15:25
…kio-epoll-uring/layer-write-path-fsync-cleanups
…sync-cleanups' into problame/integrate-tokio-epoll-uring/create-layer-fatal-err-on-fsync
Base automatically changed from problame/integrate-tokio-epoll-uring/layer-write-path-fsync-cleanups to main March 4, 2024 11:33
@problame problame enabled auto-merge (squash) March 4, 2024 11:38
@problame problame merged commit c861d71 into main Mar 4, 2024
50 checks passed
@problame problame deleted the problame/integrate-tokio-epoll-uring/create-layer-fatal-err-on-fsync branch March 4, 2024 12:18