Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Avoid reallocating fd table during snapshot restore #4107

Merged
merged 4 commits into from
Sep 15, 2023

Conversation

roypat
Copy link
Contributor

@roypat roypat commented Sep 13, 2023

Linux maintains a per-process file descriptor table. By default, this table has size 64. Whenever space in this table runs out, the file descriptor table gets reallocated, with its size increased to the next power of two.

Firecracker creates a lot of eventfds and timerfds for its devices. These are created in the hot path of snapshot restore. For medium to large microVMs, firecracker uses more than 64 file descriptors, meaning we reallocate the file descriptor table on the hot path. However, this reallocation uses a RCU, meaning the kernel needs to hold a lock. To acquire this lock, it needs to wait for all other accesses to the file descriptors to cease, which introduces a severe latency to snapshot-restore times (between 30ms and 70ms).

This commit avoids that latency by ensuring the file descriptor table is large enough to hold the jailer-defined limit of file descriptors at process start already. This avoids reallocating the file descriptor table at runtime (and the memory overhead is negligable, as each entry in the fdtable is simply a pointer).

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet contribution quality standards.

  • This functionality cannot be added in rust-vmm.

@codecov
Copy link

codecov bot commented Sep 13, 2023

Codecov Report

Patch coverage has no change and project coverage change: -0.09% ⚠️

Comparison is base (eb38da7) 82.34% compared to head (0be2b96) 82.26%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4107      +/-   ##
==========================================
- Coverage   82.34%   82.26%   -0.09%     
==========================================
  Files         225      225              
  Lines       28445    28474      +29     
==========================================
  Hits        23424    23424              
- Misses       5021     5050      +29     
Flag Coverage Δ
4.14-c7g.metal 77.66% <0.00%> (-0.10%) ⬇️
4.14-m5d.metal 79.56% <0.00%> (-0.10%) ⬇️
4.14-m6a.metal 78.66% <0.00%> (-0.10%) ⬇️
4.14-m6g.metal 77.66% <0.00%> (-0.10%) ⬇️
4.14-m6i.metal 79.54% <0.00%> (-0.10%) ⬇️
5.10-c7g.metal 80.58% <0.00%> (-0.10%) ⬇️
5.10-m5d.metal 82.22% <0.00%> (-0.10%) ⬇️
5.10-m6a.metal 81.44% <0.00%> (-0.11%) ⬇️
5.10-m6g.metal 80.58% <0.00%> (-0.10%) ⬇️
5.10-m6i.metal 82.21% <0.00%> (-0.10%) ⬇️
6.1-c7g.metal 80.58% <0.00%> (-0.10%) ⬇️
6.1-m5d.metal 82.23% <0.00%> (-0.10%) ⬇️
6.1-m6a.metal 81.44% <0.00%> (-0.11%) ⬇️
6.1-m6g.metal 80.58% <0.00%> (?)
6.1-m6i.metal 82.21% <0.00%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
src/firecracker/src/main.rs 0.21% <0.00%> (-0.02%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@roypat roypat force-pushed the prealloc-fdtable branch 2 times, most recently from b61953e to 9063c9d Compare September 13, 2023 15:18
bchalios
bchalios previously approved these changes Sep 14, 2023
ShadowCurse
ShadowCurse previously approved these changes Sep 14, 2023
@roypat roypat dismissed stale reviews from ShadowCurse and bchalios via bc0b7d1 September 14, 2023 10:04
@roypat roypat force-pushed the prealloc-fdtable branch 4 times, most recently from abdf7fe to c0398f7 Compare September 14, 2023 10:34
src/firecracker/src/main.rs Outdated Show resolved Hide resolved
src/firecracker/src/main.rs Outdated Show resolved Hide resolved
@roypat roypat force-pushed the prealloc-fdtable branch 3 times, most recently from 2ae09be to 65cfa5c Compare September 14, 2023 10:59
CHANGELOG.md Outdated Show resolved Hide resolved
Linux maintains a per-process file descriptor table. By default, this
table has size 64. Whenever space in this table runs out, the file
descriptor table gets reallocated, with its size increased to the next
power of two.

Firecracker creates a lot of eventfds and timerfds for its devices.
These are created in the hot path of snapshot restore. For medium to
large microVMs, firecracker uses more than 64 file descriptors, meaning
we reallocate the file descriptor table on the hot path. However, this
reallocation uses a RCU, meaning the kernel needs to hold a lock. To
acquire this lock, it needs to wait for all other accesses to the file
descriptors to cease, which introduces a severe latency to
snapshot-restore times (between 30ms and 70ms).

This commit avoids that latency by ensuring the file descriptor table is
large enough to hold the jailer-defined limit of file descriptors at
process start already. This avoids reallocating the file descriptor
table at runtime (and the memory overhead is negligable, as each entry
in the fdtable is simply a pointer).

Signed-off-by: Patrick Roy <[email protected]>
@roypat roypat requested a review from bchalios September 15, 2023 08:43
Copy link
Contributor

@bchalios bchalios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

@bchalios bchalios merged commit e2a9ba4 into firecracker-microvm:main Sep 15, 2023
4 of 5 checks passed
@roypat roypat deleted the prealloc-fdtable branch April 15, 2024 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants