stop reading local metadata file #6777


problame
Contributor

@problame problame commented Feb 15, 2024

Problem

The local metadata file is effectively unused if remote storage is configured, because remote storage is the authority.
We do write and read it, but we never use its contents.

In particular, during startup, we write each metadata file with the contents fetched from remote storage during preload.
Effectively, that's a VirtualFile::crashsafe_overwrite for each timeline.

Further, it turns out that we can delete a bunch of legacy code if we stop caring about the metadata file.

Refs:

Solution

This PR removes any code that reads the file.

We continue to write (and delete) it as before.

Once we've released this version and are sure we won't roll back, we will remove the code that writes it.
That'll happen in #6769.

The price is that running without remote storage is no longer possible.
We agree that we want to get there eventually, so this is the first step.
see #4099

Review

Reviewers should not just check the changes in this PR (which are code deletions, mostly).
But also they should ensure that after this change, absence of the file does not change any outcomes.

This means auditing the code base for, for example,

  • reads of the file
  • stat of the file
  • removal of the file where NotFound isn't treated as if it is success
  • etc
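To illustrate the third item, here is a minimal sketch of a removal that treats a missing file as success. The helper name `remove_file_if_exists` is hypothetical and not taken from the pageserver code; it only uses `std::fs::remove_file` and `std::io::ErrorKind::NotFound`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Removes `path`, treating "already gone" as success.
/// Returns Ok(true) if a file was removed and Ok(false) if it did not exist.
/// (Hypothetical helper, for illustration only.)
fn remove_file_if_exists(path: &Path) -> io::Result<bool> {
    match fs::remove_file(path) {
        Ok(()) => Ok(true),
        // The file is absent: the desired end state already holds.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        // Any other error (permissions, IO failure) is still reported.
        Err(e) => Err(e),
    }
}
```

Call sites that instead propagate every error from `remove_file` are the ones that would change behavior once the file is no longer written.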

This is in preparation for the removal of _writing_ it, which will
happen in #6769
@problame problame requested review from koivunej and arpad-m February 15, 2024 16:41
@problame problame marked this pull request as ready for review February 15, 2024 16:41
@problame problame requested a review from a team as a code owner February 15, 2024 16:41
@arpad-m
Member

arpad-m commented Feb 15, 2024

> Reviewers should not just check the changes in this PR (which are code deletions, mostly).
> But also they should ensure that after this change, absence of the file does not change any outcomes.

I did some research:

  • I grepped for TimelineMetadata::from_bytes and outside of tests and the code removed in this PR I only found usage in one location: the pageserver ctl tool still has a command to read the metadata file from disk (see handle_metadata function), but I think this can be a followup (but also ok with having it in this PR).
  • Looking at the callers of PageServerConf::metadata_path, there is still an assert in calculate_logical_size for the file to exist. cleanup_remaining_timeline_fs_traces already doesn't error if the file doesn't exist. This should probably be addressed ideally in this PR, but at least in the deploy before #6769 (stop writing metadata file) is deployed.
  • Other than that I didn't find anything.

@koivunej
Member

koivunej commented Feb 15, 2024

> • I grepped for TimelineMetadata::from_bytes and outside of tests and the code removed in this PR I only found usage in one location: the pageserver ctl tool still has a command to read the metadata file from disk (see handle_metadata function), but I think this can be a followup (but also ok with having it in this PR).

How is the metadata stored within index_part.json deserialized, then? That path uses it, and not only from tests. Never mind, there is a manual Serialize and Deserialize on the type.


github-actions bot commented Feb 15, 2024

2436 tests run: 2319 passed, 0 failed, 117 skipped (full report)


Flaky tests (2)

Postgres 15

  • test_ondemand_download_timetravel: debug
  • test_pg_regress[None]: debug

Code coverage (full report)

  • functions: 55.9% (12875 of 23034 functions)
  • lines: 82.5% (69819 of 84629 lines)

The comment gets automatically updated with the latest test results
0388c98 at 2024-02-15T23:20:19.838Z :recycle:

@problame
Contributor Author

> the pageserver ctl tool still has a command to read the metadata file from disk (see handle_metadata function), but I think this can be a followup (but also ok with having it in this PR).

Good find, will put that into a follow-up.

> Looking at the callers of PageServerConf::metadata_path, there is still an assert in calculate_logical_size for the file to exist.

Isn't that in a failpoint? I'd have to remove the test that uses it as well, then.

I think it actually makes sense to keep it until #6769 as it's concerned with removal of the file / the write path.

> cleanup_remaining_timeline_fs_traces already doesn't error if the file doesn't exist.
> This should probably be addressed ideally in this PR, but at least in the deploy before #6769 is deployed.

I'm confused. Assuming I'm right wrt the failpoint in the previous quote, do you think there's more work that needs to happen, or can we merge this PR as is?

@arpad-m
Member

arpad-m commented Feb 16, 2024

> Isn't that in a failpoint? I'd have to remove the test that uses it as well, then.

Good point, let's keep it (for now).

@problame problame enabled auto-merge (squash) February 16, 2024 08:31
@problame problame merged commit 45e929c into main Feb 16, 2024
51 checks passed
@problame problame deleted the problame/integrate-tokio-epoll-uring/write-path/remove-save-metadata--part-1 branch February 16, 2024 09:35
problame added a commit that referenced this pull request Feb 23, 2024
Building atop #6777, this PR removes the code that writes the `metadata`
file and adds a piece of migration code that removes any remaining
`metadata` files.

We'll remove the migration code after this PR has been deployed.

part of #6663

More cleanups punted into follow-up issue, as they touch a lot of code: 
#6890
problame added a commit that referenced this pull request Mar 5, 2024
part of #6663 
See that epic for more context & related commits.

Problem
-------

Before this PR, the layer-file-creating code paths were using
VirtualFile, but under the hood these were still blocking system calls.

Generally this meant we'd stall the executor thread, unless the caller
"knew" and used the following pattern instead:

```
spawn_blocking(|| {
    Handle::block_on(async {
        VirtualFile::....().await;
    })
}).await
```

Solution
--------

This PR adopts `tokio-epoll-uring` on the layer-file-creating code paths
in pageserver.

Note that on-demand downloads still use `tokio::fs`; these will be
converted in a future PR.

Design: Avoiding Regressions With `std-fs` 
------------------------------------------

If we make the VirtualFile write path truly async using
`tokio-epoll-uring`, should we then remove the `spawn_blocking` +
`Handle::block_on` usage upstack in the same commit?

No, because if we’re still using the `std-fs` io engine, we’d then block
the executor in those places where the `spawn_blocking` previously
protected us from that.

So, if we want to see benefits from `tokio-epoll-uring` on the write
path while also preserving the ability to switch between
`tokio-epoll-uring` and `std-fs`, where `std-fs` will behave identically
to what we have now, we need to ***conditionally* use `spawn_blocking` +
`Handle::block_on`**.

I.e., in the places where we use that now, we’ll need to make that
conditional based on the currently configured io engine.
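A minimal sketch of that conditional, under loud assumptions: the names `IoEngine`, `Placement`, `placement_for` and `run_io` are hypothetical, and a plain OS thread stands in for tokio's blocking pool so that the example needs only the standard library. It is not the pageserver implementation.

```rust
use std::thread;

/// Hypothetical mirror of the configured io engine.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum IoEngine {
    StdFs,
    TokioEpollUring,
}

/// Where a piece of file IO should run.
#[derive(Debug, PartialEq, Eq)]
enum Placement {
    /// Offload to a blocking thread (the role of `spawn_blocking` + `block_on`).
    BlockingThread,
    /// Run in place, because the IO no longer blocks the executor thread.
    Inline,
}

fn placement_for(engine: IoEngine) -> Placement {
    match engine {
        // Blocking system calls: keep protecting the executor.
        IoEngine::StdFs => Placement::BlockingThread,
        // Truly async IO: offloading would only add overhead.
        IoEngine::TokioEpollUring => Placement::Inline,
    }
}

/// Runs `io` according to the placement chosen for `engine`.
/// Assumption: `thread::spawn` stands in for tokio's blocking pool.
fn run_io<T, F>(engine: IoEngine, io: F) -> T
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    match placement_for(engine) {
        Placement::BlockingThread => thread::spawn(io).join().expect("io thread panicked"),
        Placement::Inline => io(),
    }
}
```

The point of routing every call site through one decision function is that `std-fs` keeps its current behavior while `tokio-epoll-uring` skips the offload.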

It boils down to investigating all the places where we do
`spawn_blocking(... block_on(... VirtualFile::...))`.

Detailed [write-up of that investigation in Notion](https://neondatabase.notion.site/Surveying-VirtualFile-write-path-usage-wrt-tokio-epoll-uring-integration-spawn_blocking-Handle-bl-5dc2270dbb764db7b2e60803f375e015?pvs=4), made publicly accessible.

tl;dr: Preceding PRs addressed the relevant call sites:
- `metadata` file: turns out we could simply remove it (#6777, #6769,
#6775)
- `create_delta_layer()`: made sensitive to `virtual_file_io_engine` in
#6986

NB: once we are switched over to `tokio-epoll-uring` everywhere in
production, we can deprecate `std-fs`; to keep macOS support, we can use
`tokio::fs` instead. That will remove this whole headache.


Code Changes In This PR
-----------------------

- VirtualFile API changes
  - `VirtualFile::write_at`
    - implement an `ioengine` operation and switch `VirtualFile::write_at`
      to it
  - `VirtualFile::metadata()`
    - curiously, we only use it from the layer writers' `finish()` methods
    - introduce a wrapper `Metadata` enum because `std::fs::Metadata` cannot
      be constructed by code outside Rust std
  - `VirtualFile::sync_all()` and, for completeness' sake, add
    `VirtualFile::sync_data()`
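
To illustrate why a wrapper is needed: `std::fs::Metadata` has no public constructor, so an io engine that obtains file metadata by other means cannot hand back that type. Below is a minimal sketch of such a wrapper; the variant names and the `Raw { len }` shape are assumptions for illustration, not the enum introduced by this PR.

```rust
use std::fs;

/// Hypothetical wrapper with one variant per source of file metadata.
enum Metadata {
    /// Obtained through the standard library.
    StdFs(fs::Metadata),
    /// Assumed shape for metadata produced by another io engine.
    Raw { len: u64 },
}

impl Metadata {
    /// File size in bytes, regardless of which engine produced the metadata.
    fn len(&self) -> u64 {
        match self {
            Metadata::StdFs(m) => m.len(),
            Metadata::Raw { len } => *len,
        }
    }
}

impl From<fs::Metadata> for Metadata {
    fn from(m: fs::Metadata) -> Self {
        Metadata::StdFs(m)
    }
}
```

Callers such as a layer writer's `finish()` would then only depend on accessors like `len()`, not on which engine produced the value.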

Testing & Rollout
-----------------

Before merging this PR, we ran the CI with both io engines.

Additionally, the changes will soak in staging.

We could have a feature gate / add a new io engine
`tokio-epoll-uring-write-path` to do a gradual rollout. However, that's
not part of this PR.


Future Work
-----------

There's still some use of `std::fs` and/or `tokio::fs` for directory
namespace operations, e.g. `std::fs::rename`.

We're not addressing those in this PR, as we'll need to add the support
in tokio-epoll-uring first. Note that rename itself is usually fast if
the directory is in the kernel dentry cache, and only the fsync after
rename is slow. These fsyncs already use tokio-epoll-uring, so the impact
should be small.
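
For reference, the rename-then-fsync pattern discussed above can be sketched with blocking `std::fs` calls as follows. The helper name `rename_durably` is hypothetical, both paths are assumed to be in the same directory, and a Unix-like system is assumed, where a directory can be opened read-only and fsynced.

```rust
use std::fs::{self, File};
use std::io;
use std::path::Path;

/// Renames `from` to `to` and then fsyncs the destination directory so
/// that the new directory entry survives a crash.
/// (Hypothetical helper; assumes both paths are in the same directory.)
fn rename_durably(from: &Path, to: &Path) -> io::Result<()> {
    // Usually fast when the directory is in the kernel dentry cache.
    fs::rename(from, to)?;

    // Fall back to the current directory for bare file names.
    let dir = match to.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };

    // The slow part: flush the directory entry to stable storage.
    File::open(dir)?.sync_all()
}
```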