-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DRAFT] vhost-user-blk: initial implementation #4069
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #4069 +/- ##
==========================================
- Coverage 82.29% 80.74% -1.55%
==========================================
Files 225 231 +6
Lines 28470 29238 +768
==========================================
+ Hits 23429 23609 +180
- Misses 5041 5629 +588
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
78bb4df
to
0a0baf3
Compare
9dcbff1
to
df25512
Compare
ddc0313
to
bcabb3e
Compare
This is to simplify error chaining when propagating error causes when using ActivateError. Signed-off-by: Nikita Kalyazin <[email protected]>
This is to simplify error chaining when using PersistError with device-specific errors. Signed-off-by: Nikita Kalyazin <[email protected]>
The change is needed to be able to share the guest memory with another process that needs access to it (eg a vhost-user backend). The change includes opening the memory snapshot file with read and write permissions as the guest memory changes due to the guest execution will now be visible as changes of the memory snapshot file. A consiquence of that is multiple microVMs can no longer be restored from the same physical memory snapshot file. Signed-off-by: Nikita Kalyazin <[email protected]>
When sharing guest memory with another process (eg a vhost-user backend), a file descriptor pointing at the guest memory is needed that can be passed to the other process via a sendmsg syscall. This is achieved via backing guest memory with memfd when launching a microVM and via making use of the memory snapshot fd when restoring from a snapshot. A memfd is sealed after creation to prevent further changes of the guest memory size (both growing and shrinking). This change introduces a new crate dependency: memfd. Signed-off-by: Nikita Kalyazin <[email protected]>
The test sets RLIMIT_FSIZE to 2M and expects for the jailer to kill Firecracker when the latter tries to create a big memory snapshot file. This test is no longer relevant, because Firecracker now creates guest memory with memfd, which makes minimum viable RLIMIT_FSIZE equal to the guest memory size. Signed-off-by: Nikita Kalyazin <[email protected]>
After each interrupt injection, Virtio MMIO drivers read interrupt status from register 0x60 and then acknowledge the interrupt via register 0x64. Some devices, like vhost-user-backed devices, cannot meaningfully update the interrupt status to the guest. In such cases, we keep the interrupt status always active. In order to distinguish devices that are able to update the interrupt status, we introduce a new method in the VirtioDevice trait: can_update_interrupt_status() that is used by the MMIO when handling interrupt status read operations from the guest. Signed-off-by: Nikita Kalyazin <[email protected]>
In order to be able to have multiple subtypes under the same API endpoint (eg /drives), along with the existing device type, we introduce a device subtype. This change creates a 1:1 mapping between device type and device subtype for now. This will be extended later. Signed-off-by: Nikita Kalyazin <[email protected]>
In order to allow creation of other block device subtypes, we move the existing host-file-backed block device implementation from block to block/file directory. Signed-off-by: Nikita Kalyazin <[email protected]>
In order to distinguish host-file-backed block device, this renames: - Block to BlockFile - ConnectedBlockState to ConnectedBlockFileState - BlockState to BlockFileState - BlockConstructorArgs to BlockFileConstructorArgs - BlockError to BlockFileError - CreateBlockDevice to CreateBlockFileDevice - SUBTYPE_BLOCK to SUBTYPE_BLOCK_FILE - SharedDeviceType enum's variant Block to BlockFile - DevicePersistError enum's variant Block to BlockFile - add_device() to add_file_device() - create_block() to create_block_file() Signed-off-by: Nikita Kalyazin <[email protected]>
Since all block devices share some of properties, to avoid duplication, they are extracted into a common struct DiskAttributes. Also a trait Disk is introduces to reexpose DiskAttributes functionality at the block device level. Signed-off-by: Nikita Kalyazin <[email protected]>
In order to allow creation of other block device subtypes, we wrap former Block structure into an enum. At the moment it contains a single variant FileBacked. Further on it will be extended with other device subtypes. Signed-off-by: Nikita Kalyazin <[email protected]>
In order to separate properties specific to different block device subtypes, we introduce `file` object within BlockDeviceConfig that contains parts specific to the host-file-backed block device. If provided by the user, `file` configuration has priority over top-level properties. Also this change makes `path_on_host` top-level property optional to allow configuration only via `file` object. This is not a breaking API change. Signed-off-by: Nikita Kalyazin <[email protected]>
In order to separate properties specific to different block device subtypes, we introduce `file` object within BlockDeviceUpdateConfig that contains parts specific to the host-file-backed block device. If provided by the user, `file` configuration has priority over top-level properties. Signed-off-by: Nikita Kalyazin <[email protected]>
This rearranges Drive object's properties to maintain an alphabetical order. Signed-off-by: Nikita Kalyazin <[email protected]>
This updates swagger configuration for PUT and PATCH requests for /drives endpoint to reflect introduction of `file` object. Signed-off-by: Nikita Kalyazin <[email protected]>
In order to be able to restore different block device subtypes from a snapshot, this introduces an additional list block_device_subtypes in the DeviceStates struct. This approach was chosen to minimise disturbance of the versionized DeviceStates struct, compared to an alternative solution of converting ConnectedBlockState into an enum. On save, if the snapshot version does not support device subtypes, we only proceed with taking the snapshot if only SUBTYPE_BLOCK_FILE devices are attached to the microVM. On restore, if the snapshot version does not support device subtypes, we populate the list with SUBTYPE_BLOCK_FILE entries on restore. Signed-off-by: Nikita Kalyazin <[email protected]>
This is to be able to query queue's avail index to share it with a vhost-user backend. Signed-off-by: Nikita Kalyazin <[email protected]>
This vhost-user protocol implementation that is shared between vhost-user devices. Signed-off-by: Nikita Kalyazin <[email protected]> Co-authored-by: Diana Popa <[email protected]>
This adds vhost-user block device implementation. In order to create a vhost-user block device, a PUT /drives request needs to be issued that contains `vhost-user` object. At the moment, the object has the only property `socket` that is a socket path to communicate with a vhost-user backend. Example: ``` curl --unix-socket ${fc_socket} -i \ -X PUT "http://localhost/drives/vhost" \ -H "accept: application/json" \ -H "Content-Type: application/json" \ -d "{ \"drive_id\": \"vhost\", \"is_root_device\": false, \"cache_type\": \"Unsafe\", \"vhost_user\": { \"socket\": \"${vhost_socket}\" } }" ``` Signed-off-by: Nikita Kalyazin <[email protected]> Co-authored-by: Diana Popa <[email protected]>
This addes vhost-user optional object in the Drive object to provide vhost-user-specific configuration. Signed-off-by: Nikita Kalyazin <[email protected]>
This work has been superseded by other PRs: #4138, #4170, #4223, #4226 |
Changes
This change introduces a vhost-user block device.
A vhost-user drive can be added to a microVM via
where
vhost_socket
is the socket offered by a vhost-user backend.I used a backend from Cloud Hypervisor in my testing:
where
img
is the image file. The backend needs to start first.In order for snapshot restore to work, I had to patch the Cloud Hypervisor backend: https://github.com/kalyazin/cloud-hypervisor/tree/vhost_user_block_restore .
The patchset can be logically split into major parts:
memfd
for backing guest memory to make it shareable with another processI kept moving and renaming commits fine grained. It helped me, but I would not object to squashing them if needed.
The following is out of scope of this PR and will be addressed separately:
Reason
Better performance and flexibility for block devices.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md
.PR Checklist
[ ] If a specific issue led to this PR, this PR closes the issue.[ ] Any required documentation changes (code and docs) are included in this PR.[ ] User-facing changes are mentioned inCHANGELOG.md
.[ ] NewTODO
s link to an issue.rust-vmm
.