Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] vhost-user-blk: initial implementation #4069

Closed
wants to merge 20 commits into from

Conversation

kalyazin
Copy link
Contributor

@kalyazin kalyazin commented Aug 22, 2023

Changes

This change introduces a vhost-user block device.

A vhost-user drive can be added to a microVM via

curl --unix-socket ${fc_socket} -i \
      -X PUT "http://localhost/drives/vhost" \
      -H "accept: application/json" \
      -H "Content-Type: application/json" \
      -d "{
                \"drive_id\": \"vhost\",
                \"is_root_device\": false,
                \"cache_type\": \"Unsafe\",
                \"vhost_user\": {
                  \"socket\": \"${vhost_socket}\"
                }
          }"

where vhost_socket is the socket offered by a vhost-user backend.
I used a backend from Cloud Hypervisor in my testing:

cargo build -p vhost_user_block
./target/debug/vhost_user_block/vhost_user_block \
  --block-backend path=${img},socket=${vhost_socket},num_queues=1,readonly=true,direct=false

where img is the image file. The backend needs to start first.

In order for snapshot restore to work, I had to patch the Cloud Hypervisor backend: https://github.com/kalyazin/cloud-hypervisor/tree/vhost_user_block_restore .

The patchset can be logically split into major parts:

  • using memfd for backing guest memory to make it shareable with another process
  • moving and renaming the existing block device implementation to give space to the new one
  • adding new API for the existing block device to separate configurations specific to each device type
  • adding vhost-user protocol that is generic for all vhost-user devices (eg block, net, vsock)
  • adding vhost-user block device implementation.

I kept moving and renaming commits fine grained. It helped me, but I would not object to squashing them if needed.

The following is out of scope of this PR and will be addressed separately:

  • vhost-user-specific unit testing
  • functional integration testing
  • performance testing
  • user documentation

Reason

Better performance and flexibility for block devices.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • [ ] If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • [ ] Any required documentation changes (code and docs) are included in this PR.
  • API changes follow the Runbook for Firecracker API changes.
  • [ ] User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • [ ] New TODOs link to an issue.
  • Commits meet contribution quality standards.

  • This functionality cannot be added in rust-vmm.

@codecov
Copy link

codecov bot commented Aug 22, 2023

Codecov Report

Patch coverage: 42.56% and project coverage change: -1.55% ⚠️

Comparison is base (840cbd9) 82.29% compared to head (d68871c) 80.74%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4069      +/-   ##
==========================================
- Coverage   82.29%   80.74%   -1.55%     
==========================================
  Files         225      231       +6     
  Lines       28470    29238     +768     
==========================================
+ Hits        23429    23609     +180     
- Misses       5041     5629     +588     
Flag Coverage Δ
4.14-c7g.metal 76.01% <42.56%> (-1.68%) ⬇️
4.14-m5d.metal 77.97% <42.16%> (-1.60%) ⬇️
4.14-m6a.metal 77.06% <42.16%> (-1.63%) ⬇️
4.14-m6g.metal 76.01% <42.56%> (-1.69%) ⬇️
4.14-m6i.metal 77.97% <42.16%> (-1.60%) ⬇️
5.10-c7g.metal 78.84% <42.38%> (-1.78%) ⬇️
5.10-m5d.metal 80.58% <41.98%> (-1.68%) ⬇️
5.10-m6a.metal 79.75% <41.98%> (-1.72%) ⬇️
5.10-m6g.metal 78.84% <42.38%> (-1.78%) ⬇️
5.10-m6i.metal 80.57% <41.98%> (-1.68%) ⬇️
6.1-c7g.metal 78.84% <42.38%> (-1.78%) ⬇️
6.1-m5d.metal 80.57% <41.98%> (-1.68%) ⬇️
6.1-m6a.metal 79.75% <41.98%> (-1.72%) ⬇️
6.1-m6g.metal 78.84% <42.38%> (-1.78%) ⬇️
6.1-m6i.metal 80.57% <41.98%> (-1.68%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
src/vmm/src/arch/aarch64/mod.rs 94.59% <ø> (ø)
src/vmm/src/arch/mod.rs 80.00% <ø> (ø)
...vmm/src/devices/virtio/block/file/event_handler.rs 74.24% <ø> (ø)
...c/vmm/src/devices/virtio/block/file/io/async_io.rs 95.73% <ø> (ø)
src/vmm/src/devices/virtio/block/file/io/mod.rs 95.23% <ø> (ø)
...rc/vmm/src/devices/virtio/block/file/io/sync_io.rs 92.59% <ø> (ø)
src/vmm/src/devices/virtio/block/file/mod.rs 0.00% <0.00%> (ø)
.../vmm/src/devices/virtio/block/vhost_user/device.rs 0.00% <0.00%> (ø)
...c/devices/virtio/block/vhost_user/event_handler.rs 0.00% <0.00%> (ø)
src/vmm/src/devices/virtio/block/vhost_user/mod.rs 0.00% <0.00%> (ø)
... and 26 more

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

This is to simplify error chaining when propagating
error causes when using ActivateError.

Signed-off-by: Nikita Kalyazin <[email protected]>
This is to simplify error chaining when using PersistError
with device-specific errors.

Signed-off-by: Nikita Kalyazin <[email protected]>
The change is needed to be able to share the guest memory
with another process that needs access to it
(eg a vhost-user backend).

The change includes opening the memory snapshot file with
read and write permissions as the guest memory changes
due to the guest execution will now be visible
as changes of the memory snapshot file.

A consiquence of that is multiple microVMs can no longer
be restored from the same physical memory snapshot file.

Signed-off-by: Nikita Kalyazin <[email protected]>
When sharing guest memory with another process
(eg a vhost-user backend), a file descriptor pointing
at the guest memory is needed that can be passed
to the other process via a sendmsg syscall.
This is achieved via backing guest memory with memfd
when launching a microVM and via making use of
the memory snapshot fd when restoring from a snapshot.

A memfd is sealed after creation to prevent further
changes of the guest memory size (both growing and shrinking).

This change introduces a new crate dependency: memfd.

Signed-off-by: Nikita Kalyazin <[email protected]>
The test sets RLIMIT_FSIZE to 2M and expects
for the jailer to kill Firecracker
when the latter tries to create a big memory snapshot file.

This test is no longer relevant, because Firecracker
now creates guest memory with memfd, which makes minimum
viable RLIMIT_FSIZE equal to the guest memory size.

Signed-off-by: Nikita Kalyazin <[email protected]>
After each interrupt injection, Virtio MMIO drivers
read interrupt status from register 0x60 and then
acknowledge the interrupt via register 0x64.

Some devices, like vhost-user-backed devices, cannot
meaningfully update the interrupt status to the guest.
In such cases, we keep the interrupt status always active.
In order to distinguish devices that are able to update
the interrupt status, we introduce a new method
in the VirtioDevice trait: can_update_interrupt_status()
that is used by the MMIO when handling interrupt status
read operations from the guest.

Signed-off-by: Nikita Kalyazin <[email protected]>
In order to be able to have multiple subtypes
under the same API endpoint (eg /drives),
along with the existing device type, we introduce
a device subtype.

This change creates a 1:1 mapping between
device type and device subtype for now.
This will be extended later.

Signed-off-by: Nikita Kalyazin <[email protected]>
In order to allow creation of other block device subtypes,
we move the existing host-file-backed block device
implementation from block to block/file directory.

Signed-off-by: Nikita Kalyazin <[email protected]>
kalyazin and others added 12 commits September 22, 2023 17:19
In order to distinguish host-file-backed block device,
this renames:
 - Block to BlockFile
 - ConnectedBlockState to ConnectedBlockFileState
 - BlockState to BlockFileState
 - BlockConstructorArgs to BlockFileConstructorArgs
 - BlockError to BlockFileError
 - CreateBlockDevice to CreateBlockFileDevice
 - SUBTYPE_BLOCK to SUBTYPE_BLOCK_FILE
 - SharedDeviceType enum's variant Block to BlockFile
 - DevicePersistError enum's variant Block to BlockFile
 - add_device() to add_file_device()
 - create_block() to create_block_file()

Signed-off-by: Nikita Kalyazin <[email protected]>
Since all block devices share some of properties,
to avoid duplication, they are extracted into
a common struct DiskAttributes.
Also a trait Disk is introduces to reexpose
DiskAttributes functionality at the block device level.

Signed-off-by: Nikita Kalyazin <[email protected]>
In order to allow creation of other block device subtypes,
we wrap former Block structure into an enum.
At the moment it contains a single variant FileBacked.
Further on it will be extended with other device subtypes.

Signed-off-by: Nikita Kalyazin <[email protected]>
In order to separate properties specific to
different block device subtypes, we introduce
`file` object within BlockDeviceConfig
that contains parts specific to the host-file-backed
block device.

If provided by the user, `file` configuration
has priority over top-level properties.

Also this change makes `path_on_host` top-level property
optional to allow configuration only via `file` object.
This is not a breaking API change.

Signed-off-by: Nikita Kalyazin <[email protected]>
In order to separate properties specific to
different block device subtypes, we introduce
`file` object within BlockDeviceUpdateConfig
that contains parts specific to the host-file-backed
block device.

If provided by the user, `file` configuration
has priority over top-level properties.

Signed-off-by: Nikita Kalyazin <[email protected]>
This rearranges Drive object's properties to maintain
an alphabetical order.

Signed-off-by: Nikita Kalyazin <[email protected]>
This updates swagger configuration for PUT and PATCH
requests for /drives endpoint to reflect introduction
of `file` object.

Signed-off-by: Nikita Kalyazin <[email protected]>
In order to be able to restore different block device subtypes
from a snapshot, this introduces an additional list
block_device_subtypes in the DeviceStates struct.

This approach was chosen to minimise disturbance
of the versionized DeviceStates struct, compared to
an alternative solution of converting ConnectedBlockState
into an enum.

On save, if the snapshot version does not support device subtypes,
we only proceed with taking the snapshot if only SUBTYPE_BLOCK_FILE
devices are attached to the microVM.

On restore, if the snapshot version does not support device subtypes,
we populate the list with SUBTYPE_BLOCK_FILE entries on restore.

Signed-off-by: Nikita Kalyazin <[email protected]>
This is to be able to query queue's avail index
to share it with a vhost-user backend.

Signed-off-by: Nikita Kalyazin <[email protected]>
This vhost-user protocol implementation that is shared
between vhost-user devices.

Signed-off-by: Nikita Kalyazin <[email protected]>
Co-authored-by: Diana Popa <[email protected]>
This adds vhost-user block device implementation.

In order to create a vhost-user block device,
a PUT /drives request needs to be issued that
contains `vhost-user` object.
At the moment, the object has the only property `socket`
that is a socket path to communicate with
a vhost-user backend.

Example:
```
curl --unix-socket ${fc_socket} -i \
  -X PUT "http://localhost/drives/vhost" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d "{
            \"drive_id\": \"vhost\",
            \"is_root_device\": false,
            \"cache_type\": \"Unsafe\",
            \"vhost_user\": {
              \"socket\": \"${vhost_socket}\"
            }
      }"
```

Signed-off-by: Nikita Kalyazin <[email protected]>
Co-authored-by: Diana Popa <[email protected]>
This addes vhost-user optional object in the Drive object
to provide vhost-user-specific configuration.

Signed-off-by: Nikita Kalyazin <[email protected]>
@kalyazin
Copy link
Contributor Author

This work has been superseded by other PRs: #4138, #4170, #4223, #4226
Also see https://github.com/firecracker-microvm/firecracker/blob/main/CHANGELOG.md#added

@kalyazin kalyazin closed this Nov 23, 2023
@kalyazin kalyazin deleted the vub branch November 23, 2023 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant