fix: Avoid reallocating fd table during snapshot restore #4107
Codecov Report
Patch coverage has no change; project coverage changes by -0.09%.
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #4107      +/-   ##
==========================================
- Coverage   82.34%   82.26%   -0.09%
==========================================
  Files         225      225
  Lines       28445    28474      +29
==========================================
  Hits        23424    23424
- Misses       5021     5050      +29
Signed-off-by: Patrick Roy <[email protected]>
🚢
Linux maintains a per-process file descriptor table. By default, this table has 64 slots. Whenever it runs out of space, the kernel reallocates it, growing its size to the next power of two.
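To make the growth policy concrete, here is a minimal Rust sketch that models a table starting at a given size and doubling whenever a sequentially allocated descriptor does not fit. It is a simplified model, not kernel code: the function `fdtable_growth` is hypothetical, and the real logic in the kernel has additional details (for example the `fs.nr_open` cap) that are ignored here.

```rust
/// Hypothetical model of the fd table growth policy described above.
/// Descriptors are allocated one at a time; whenever the next one does not
/// fit, the table is reallocated at twice its size (next power of two).
/// Returns (final table size, number of reallocations).
fn fdtable_growth(initial_slots: usize, open_fds: usize) -> (usize, u32) {
    // A zero-sized table would never grow under doubling.
    assert!(initial_slots > 0, "initial table size must be non-zero");
    let mut slots = initial_slots;
    let mut reallocations = 0;
    while slots < open_fds {
        slots *= 2; // each step is one reallocation of the whole table
        reallocations += 1;
    }
    (slots, reallocations)
}
```

With the default 64 slots, `fdtable_growth(64, 300)` returns `(512, 3)`: three reallocations, each of which would land on the restore hot path in the scenario this PR describes. A table preallocated to 2048 slots (an illustrative value, not the jailer's actual limit) gives `fdtable_growth(2048, 300) == (2048, 0)`.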
Firecracker creates many eventfds and timerfds for its devices, and these are created on the hot path of snapshot restore. For medium to large microVMs, Firecracker uses more than 64 file descriptors, so the file descriptor table gets reallocated on that hot path. However, this reallocation is synchronized via RCU (read-copy-update): before it can complete, the kernel has to wait for all other in-flight accesses to the file descriptor table to finish. This wait adds severe latency to snapshot restore (between 30 ms and 70 ms).
This commit avoids that latency by making the file descriptor table large enough, already at process start, to hold the jailer-defined limit of file descriptors. The table then never has to be reallocated at runtime, and the memory overhead is negligible because each entry in the fdtable is simply a pointer (8 bytes on a 64-bit host).
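One way to grow the table up front from userspace is to duplicate a descriptor onto a high slot and close it again: the kernel expands the table to cover that index and does not shrink it afterwards. The Rust sketch below does this with `fcntl(F_DUPFD)` and reads the soft `RLIMIT_NOFILE` limit with `getrlimit`. It is an illustration under stated assumptions, not the code from this PR: the helper names are hypothetical, the raw `extern` declarations stand in for the `libc` crate, and the constants (`RLIMIT_NOFILE = 7`, `F_DUPFD = 0`) and the `struct rlimit` layout assume Linux on x86_64 or aarch64. The `FDSize:` field of `/proc/self/status` reports the number of allocated table slots, which the sketch uses to observe the effect.

```rust
use std::fs::{self, File};
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::io::AsRawFd;

// Assumed Linux x86_64/aarch64 values; check your platform's headers.
const RLIMIT_NOFILE: c_int = 7;
const F_DUPFD: c_int = 0;

// Assumed layout of `struct rlimit`: two `rlim_t` (unsigned long) fields.
#[repr(C)]
struct RLimit {
    rlim_cur: c_ulong,
    rlim_max: c_ulong,
}

unsafe extern "C" {
    fn getrlimit(resource: c_int, rlim: *mut RLimit) -> c_int;
    fn fcntl(fd: c_int, cmd: c_int, ...) -> c_int;
    fn close(fd: c_int) -> c_int;
}

/// Hypothetical helper: soft RLIMIT_NOFILE limit of the current process.
fn nofile_soft_limit() -> io::Result<u64> {
    let mut rlim = RLimit { rlim_cur: 0, rlim_max: 0 };
    // SAFETY: `rlim` is a valid, writable struct with the assumed layout.
    if unsafe { getrlimit(RLIMIT_NOFILE, &mut rlim) } != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(rlim.rlim_cur as u64)
}

/// Hypothetical helper: grows the fd table to at least `slots` entries.
/// `slots` must not exceed the soft RLIMIT_NOFILE limit.
fn preallocate_fd_table(slots: c_int) -> io::Result<()> {
    if slots <= 0 {
        return Ok(());
    }
    let probe = File::open("/dev/null")?;
    // SAFETY: `probe` is open. F_DUPFD returns the lowest free descriptor
    // >= `slots - 1`, which forces the kernel to expand the table to cover it.
    let fd = unsafe { fcntl(probe.as_raw_fd(), F_DUPFD, slots - 1) };
    if fd < 0 {
        return Err(io::Error::last_os_error());
    }
    // SAFETY: `fd` was just returned by fcntl and is owned here.
    let _ = unsafe { close(fd) };
    // `probe` is closed when it goes out of scope; the table stays large.
    Ok(())
}

/// Hypothetical helper: allocated fd table slots, from /proc/self/status.
fn fd_table_slots() -> io::Result<u64> {
    let status = fs::read_to_string("/proc/self/status")?;
    status
        .lines()
        .find_map(|line| line.strip_prefix("FDSize:"))
        .and_then(|value| value.trim().parse().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "FDSize not found"))
}
```

`F_DUPFD` is used here rather than `dup2` because `dup2(old, n)` silently closes whatever descriptor already occupies slot `n`, whereas `F_DUPFD` only picks a free slot. In a setup like the one described, such a call would run once at process start with the jailer-defined limit as `slots`.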
License Acceptance
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.