Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(walredo): use posix_spawn by moving close_fds() work to walredo C code #6574

Merged

Conversation

problame
Copy link
Contributor

@problame problame commented Feb 1, 2024

The rust stdlib uses the efficient posix_spawn by default.
However, before this PR, pageserver used pre_exec() in our close_fds() ext trait.

This PR moves the work that close_fds() did to the walredo C code.
I verified manually using gdb that we're now forking out the walredo process using posix_spawn.

refs #6565

@problame problame requested review from a team as code owners February 1, 2024 17:00
@problame problame requested review from save-buffer, koivunej, hlinnaka and jcsp and removed request for a team and save-buffer February 1, 2024 17:00
@problame problame changed the title feat(walredo): use posix_spawn by moving close_fd() work to walredo C code feat(walredo): use posix_spawn by moving close_fds() work to walredo C code Feb 1, 2024
@problame problame force-pushed the problame/2024-02-walredo-work/4-close-range-in-child branch from 9ad51a2 to b524d37 Compare February 1, 2024 17:07
… code

The rust stdlib uses the efficient `posix_spawn` by default.
However, before this PR, pageserver used `pre_exec()` in our
`close_fds()` ext trait.

This PR moves the work that `close_fds()` did to the walredo C code.
I verified manually that we're now forking out the walredo process using
`posix_spawn`.

refs #6565
@problame problame force-pushed the problame/2024-02-walredo-work/4-close-range-in-child branch from b524d37 to 36f3d87 Compare February 1, 2024 17:11
Copy link
Collaborator

@jcsp jcsp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- if this gets us the performance boost without needing further changes, that's a huge win.

Not sure if we need to ping someone who compiles on a Mac to check if some of the syscall stuff needs #ifdef'ing, but that's secondary to getting this change into staging to see the latency change.

@problame
Copy link
Contributor Author

problame commented Feb 1, 2024

I manually tested compilation on my Mac.

Copy link

github-actions bot commented Feb 1, 2024

2376 tests run: 2263 passed, 0 failed, 113 skipped (full report)


Flaky tests (8)

Postgres 16

Postgres 15

Postgres 14

Code coverage (full report)

  • functions: 54.5% (11231 of 20606 functions)
  • lines: 81.5% (63280 of 77622 lines)

The comment gets automatically updated with the latest test results
36f3d87 at 2024-02-01T20:22:59.365Z :recycle:

@problame problame merged commit 1be5e56 into main Feb 1, 2024
47 checks passed
@problame problame deleted the problame/2024-02-walredo-work/4-close-range-in-child branch February 1, 2024 21:38
Comment on lines +175 to +178
if (err = close_range(3, ~0U, 0)) {
ereport(FATAL, (errcode(ERRCODE_SYSTEM_ERROR), errmsg("seccomp: could not close files >= fd 3")));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't require seccomp; let's do this even when built without seccomp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I (ab)used the HAVE_SECCOMP as a proxy for that we're on Linux.

Proximity to the seccomp code makes sense IMO because both (and only both) are the essential security building block for walredo.

I'd appreciate if you could provide a follow-up fix that

  1. adds a configure check for close_range, so we use the syscall wrapper if it's available in glibc, otherwise fall back to my wrapper & fail to compile if NR_close_range is not defined
  2. implements a "status report" walredo command as I suggested on Slack

Also, I'd like to suggest we add additional robustness checks to walredo, e.g., a message where PS can ask a freshly spawned walredo whether it thinks it correctly set up seccomp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'd like to suggest we add additional robustness checks to walredo, e.g., a message where PS can ask a freshly spawned walredo whether it thinks it correctly set up seccomp.

Is that the "status report" command you meant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #6580 to track this

@hlinnaka
Copy link
Contributor

hlinnaka commented Feb 2, 2024

To speed this up further, the pageserver could keep a small pool of pre-spawned WAL redo processes. As soon as you use a WAL redo process on a specific tenant/timeline, it must be considered as "tainted" with that tenant, and mustn't be used to replay WAL belonging to any other tenant. But until that first use, there's nothing tenant-specific about them.

@problame
Copy link
Contributor Author

problame commented Feb 2, 2024

To speed this up further, the pageserver could keep a small pool of pre-spawned WAL redo processes. As soon as you use a WAL redo process on a specific tenant/timeline, it must be considered as "tainted" with that tenant, and mustn't be used to replay WAL belonging to any other tenant. But until that first use, there's nothing tenant-specific about them.

Yeah, that's what I've been proposing in the slack thread, I'm working on it.
#6581

problame added a commit that referenced this pull request Feb 16, 2024
…close_fds() work to walredo C code (#6574)"

the addition of close_range() to the C code remains, it doesn't matter
for the purposes of this reproducer.

This reverts parts of commit 1be5e56.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants