
Add interrupt subsystem to vm-device #8

Closed
wants to merge 9 commits

Conversation

Member

@jiangliu jiangliu commented Sep 8, 2019

This PR adds two subsystems to the vm-device crate:

  1. interrupt subsystem to manage interrupts for virtual devices.
  2. the resource allocation interfaces between the vm-device crate and VMMs.

Switch to the Rust 2018 edition and turn on deny(missing_docs).

Signed-off-by: Liu Jiang <[email protected]>
Introduce traits InterruptManager and InterruptSourceGroup to manage
interrupt sources for virtual devices.

Signed-off-by: Liu Jiang <[email protected]>
Signed-off-by: Bin Zha <[email protected]>
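As a rough sketch of the two traits this commit describes (the signatures below are guesses for illustration, not the actual PR code):

```rust
use std::io;

type InterruptIndex = u32;

trait InterruptSourceGroup {
    /// Enable generation of interrupts from this group.
    fn enable(&self) -> io::Result<()>;
    /// Disable generation of interrupts from this group.
    fn disable(&self) -> io::Result<()>;
    /// Inject the interrupt at `index` into the guest.
    fn trigger(&self, index: InterruptIndex) -> io::Result<()>;
}

trait InterruptManager {
    /// Create a group of interrupt sources for a virtual device.
    fn create_group(&self, base: InterruptIndex, count: u32)
        -> io::Result<Box<dyn InterruptSourceGroup>>;
}

// A no-op implementation just to show how a device would drive the trait.
struct NoopGroup;

impl InterruptSourceGroup for NoopGroup {
    fn enable(&self) -> io::Result<()> { Ok(()) }
    fn disable(&self) -> io::Result<()> { Ok(()) }
    fn trigger(&self, _index: InterruptIndex) -> io::Result<()> { Ok(()) }
}

fn main() {
    let group = NoopGroup;
    assert!(group.enable().is_ok());
    assert!(group.trigger(0).is_ok());
    assert!(group.disable().is_ok());
}
```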
@jiangliu force-pushed the interrupt branch 2 times, most recently from 533c85e to 4fe1c9e on September 8, 2019 06:00
src/resources.rs Outdated
use std::{u16, u32, u64};

/// Enumeration describing a device's resource requirements.
pub enum DeviceResourceRequest {

I found there is a DeviceResources in a previous commit, which is used to describe a specific device resource request. Is DeviceResourceRequest the global resources of a VM, that is to say, does it act the same as the vm-allocator crate did before? If not, what's the difference between `DeviceResources` and `DeviceResourceRequest`?

Member Author

The VMM asks each device for the resources it needs, and the device returns an array of DeviceResourceRequest describing resource types and allocation constraints.
Then the VMM passes an array of allocated resources to the device, which is of type DeviceResources.
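The described request/allocation flow might be sketched roughly like this (type names, fields, and the allocator logic are all illustrative stand-ins, not the actual crate API):

```rust
/// Hypothetical constraint describing what a device needs.
#[derive(Debug, Clone, PartialEq)]
enum DeviceResourceRequest {
    MmioAddressRange { align: u64, size: u64 },
}

/// Hypothetical concrete resource the VMM allocated.
#[derive(Debug, Clone, PartialEq)]
enum DeviceResourceEntry {
    MmioAddressRange { base: u64, size: u64 },
}

/// Toy allocator: honor alignment, hand out the next free address.
fn allocate(req: &DeviceResourceRequest, next_free: u64) -> DeviceResourceEntry {
    match req {
        DeviceResourceRequest::MmioAddressRange { align, size } => {
            // Round the free pointer up to the requested alignment.
            let base = (next_free + align - 1) & !(align - 1);
            DeviceResourceEntry::MmioAddressRange { base, size: *size }
        }
    }
}

fn main() {
    let req = DeviceResourceRequest::MmioAddressRange { align: 0x1000, size: 0x2000 };
    let entry = allocate(&req, 0x1234);
    assert_eq!(entry, DeviceResourceEntry::MmioAddressRange { base: 0x2000, size: 0x2000 });
}
```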

@liujing2 Sep 10, 2019

Thanks. I'm wondering whether we could merge DeviceResourceRequest and DeviceResources into one type? Use something like `base: Option<_>` to make a request with an unspecified base, and fill it in when passing the allocation to the device.

Member Author

They have different roles: DeviceResourceRequest focuses on constraints, and DeviceResources focuses on allocated resources.

src/resources.rs Outdated
@sameo left a comment

@jiangliu I think it would be easier for reviewers to split this PR into 2 PRs:

  1. The interrupt manager/subsystem itself
  2. The device resources and traits definitions

I think both can be reviewed independently, which is a good sign about your overall PR design quality ;-)

src/resources.rs Outdated
src/resources.rs Outdated
/// PCI MSIx IRQ numbers.
PciMsixIrq { base: u32, size: u32 },
/// Generic MSI IRQ numbers.
GenericIrq { base: u32, size: u32 },

Do we really need 3 enum types? Wouldn't:

pub enum MsiIrqType {
    PciMsi,
    PciMsix,
    GenericMsi,
}

pub enum DeviceResourceEntry {
    /// IO Port address range.
    PioAddressRange { base: u16, size: u16 },
    /// Memory Mapped IO address range.
    MmioAddressRange { base: u64, size: u64 },
    /// Legacy IRQ number.
    LegacyIrq(u32),
    /// MSI IRQ.
    MsiIrq { ty: MsiIrqType, base: u32, size: u32 },
    /// Network Interface Card MAC address.
    NicMacAddress(String),
    /// KVM memslot index.
    KvmMemSlot(u32),
}

be simpler to handle?

Member

Or is a MsiIrqType needed at all? Depends on where we will use it.

Member Author

Or is a MsiIrqType needed at all? Depends on where we will use it.

A PCI device may support both PCI MSI and MSIx, and the guest kernel chooses which one to use. So we need a way to distinguish the different cases. If we assume any device supports only one type of MSI interrupt, we could remove the MSI type info.


Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs

I agree the kvm_irq implementation is orthogonal to the trait definitions, but I also think this code could be shared across VMM implementations through the vm-device crate, as a default KVM implementation of the interrupt traits. Kind of similar to what we do with the mmap implementation of the memory model.

Member

We can (and probably should, when considering device assignment) support multiple MSI types. The device backend side knows exactly which MSI type it exposed to the guest, so why does it need the info here? What I'm seeing is that this type comes from what vm-device returns to the VMM in ResourceRequest.

src/interrupt/mod.rs
@chao-p left a comment

Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs

src/resources.rs Outdated
/// Get the legacy interrupt number(IRQ).
pub fn get_legacy_irq(&self) -> Option<u32> {
for entry in self.0.iter().as_ref() {
if let DeviceResourceEntry::LegacyIrq(base) = entry {
Member

Here (and in several routines below) you seem to assume there is only one IRQ entry, and if that is intended, then you probably need to check it in append() above.
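The suggested check in append() might look like this (a minimal sketch assuming DeviceResources is a newtype over a Vec; names are illustrative, not the PR code):

```rust
#[derive(Debug, PartialEq)]
enum DeviceResourceEntry {
    LegacyIrq(u32),
    MmioAddressRange { base: u64, size: u64 },
}

struct DeviceResources(Vec<DeviceResourceEntry>);

impl DeviceResources {
    fn new() -> Self {
        DeviceResources(Vec::new())
    }

    /// Reject a second LegacyIrq entry so get_legacy_irq() stays unambiguous.
    fn append(&mut self, entry: DeviceResourceEntry) -> Result<(), &'static str> {
        let is_irq = |e: &DeviceResourceEntry| matches!(e, DeviceResourceEntry::LegacyIrq(_));
        if is_irq(&entry) && self.0.iter().any(is_irq) {
            return Err("duplicate legacy IRQ entry");
        }
        self.0.push(entry);
        Ok(())
    }

    /// Get the legacy interrupt number (IRQ), if any.
    fn get_legacy_irq(&self) -> Option<u32> {
        self.0.iter().find_map(|e| match e {
            DeviceResourceEntry::LegacyIrq(base) => Some(*base),
            _ => None,
        })
    }
}

fn main() {
    let mut res = DeviceResources::new();
    assert!(res.append(DeviceResourceEntry::MmioAddressRange { base: 0x1000, size: 0x100 }).is_ok());
    assert!(res.append(DeviceResourceEntry::LegacyIrq(5)).is_ok());
    assert!(res.append(DeviceResourceEntry::LegacyIrq(6)).is_err());
    assert_eq!(res.get_legacy_irq(), Some(5));
}
```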

@jiangliu
Member Author

Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs

It depends.
The irq manager implementation is generic enough; it just mimics hardware behaviors.
If we move the irq manager implementation into the VMM, each VMM needs to repeat the same code.

src/resources.rs Outdated
/// Alignment for the allocated address.
align: u64,
/// Size for the allocated address range.
size: u64,
Member

Can we use the definition below to satisfy both fixed/unfixed and single/range addresses?

MmioAddress {
    base: Option<u64>,
    size: u64,
    align: u64
}

Same for PIO.

Member Author

It depends on the semantics the vm-allocator plans to support. The current definition follows the Linux kernel behavior. If we decide to simplify the vm-allocator semantics, the definition may be simplified too.


@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something. Also, this could be used for both the device request for resources and the allocated device resources as well. It simplifies a device resource description down to one enum.

Member Author

Seems it can't express address range constraints for a 32-bit PCI BAR. Or am I missing some point?
A tuple (min, max, size, align) should satisfy our requirements, but I'm not sure whether it could be reduced to a three-element tuple (base, size, align).
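The (min, max, size, align) tuple mentioned above might look like this as a struct (the names and methods here are illustrative only):

```rust
/// Illustrative constraint carrying the (min, max, size, align) tuple.
struct AddressConstraint {
    min: u64,
    max: u64,
    size: u64,
    align: u64,
}

impl AddressConstraint {
    /// A 32-bit PCI BAR request: anywhere below 4G.
    fn bar_32bit(size: u64, align: u64) -> Self {
        AddressConstraint { min: 0, max: u32::MAX as u64, size, align }
    }

    /// Check whether an allocated base address satisfies the constraint.
    fn is_satisfied_by(&self, base: u64) -> bool {
        base % self.align == 0
            && base >= self.min
            // The whole range [base, base + size - 1] must stay below max.
            && base.checked_add(self.size - 1).map_or(false, |end| end <= self.max)
    }
}

fn main() {
    let c = AddressConstraint::bar_32bit(0x1000, 0x1000);
    assert!(c.is_satisfied_by(0x8000_0000)); // below 4G, aligned
    assert!(!c.is_satisfied_by(0x1_0000_0000)); // above 4G
}
```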


@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.

FWIW, I think having a 32-bit base field implicitly means we want a 32-bit region. Same as the max field being set to u32::MAX in your current proposal.


@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.

FWIW, I think having a 32-bit base field implicitly means we want a 32-bit region. Same as the max field being set to u32::MAX in your current proposal.

With the above proposal we would not be able to express a request for a region that would cross the 32-bit limit, but I don't think this is something that exists today.

Member Author

@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.

FWIW, I think having a 32-bit base field implicitly means we want a 32-bit region. Same as the max field being set to u32::MAX in your current proposal.

I got the point: you suggest allocating MMIO addresses below 4G for 32-bit BARs and above 4G for 64-bit BARs. That means we need to maintain at least two MMIO address pools.
Currently firecracker maintains only one MMIO address pool below 4G, which serves both 32-bit and 64-bit BARs.

So I'm proposing a policy:

  1. allocate MMIO addresses below 4G for 32-bit BARs,
  2. allocate MMIO addresses above 4G for 64-bit BARs, and fall back to MMIO addresses below 4G.

The difference is whether to support the fallback case, which is the current firecracker design.
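The proposed two-step policy could be sketched as follows (the Pool type, the bump allocation, and the addresses are made up for illustration):

```rust
/// A trivial bump allocator over a fixed address range (illustrative only).
struct Pool {
    next: u64,
    end: u64,
}

impl Pool {
    fn alloc(&mut self, size: u64) -> Option<u64> {
        if self.next + size <= self.end {
            let base = self.next;
            self.next += size;
            Some(base)
        } else {
            None
        }
    }
}

/// Sketch of the policy: 32-bit BARs come from below 4G; 64-bit BARs
/// prefer the pool above 4G and fall back to the 32-bit hole.
fn alloc_bar(is_64bit: bool, size: u64, low: &mut Pool, high: &mut Pool) -> Option<u64> {
    if is_64bit {
        high.alloc(size).or_else(|| low.alloc(size))
    } else {
        low.alloc(size)
    }
}

fn main() {
    let mut low = Pool { next: 0xc000_0000, end: 0x1_0000_0000 };
    let mut high = Pool { next: 0x1_0000_0000, end: 0x1_0000_1000 };
    assert_eq!(alloc_bar(true, 0x1000, &mut low, &mut high), Some(0x1_0000_0000));
    // High pool exhausted: the next 64-bit BAR falls back below 4G.
    assert_eq!(alloc_bar(true, 0x1000, &mut low, &mut high), Some(0xc000_0000));
}
```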


@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.

FWIW, I think having a 32-bit base field implicitly means we want a 32-bit region. Same as the max field being set to u32::MAX in your current proposal.

I got the point: you suggest allocating MMIO addresses below 4G for 32-bit BARs and above 4G for 64-bit BARs. That means we need to maintain at least two MMIO address pools.

With cloud-hypervisor we had to support directly assigned devices that had both 32-bit and 64-bit BARs. Initially we were planning to have only 64-bit BARs, but that does not work when you have more than 3 BARs (due to the PCI config space limits). The lessons we learned from that are:

  1. We must put 32-bit BARs below 4G, in the 32-bit hole.
  2. Assuming the 32-bit hole will be spacious enough for dynamic environments (with hotplugged and VFIO devices) and can fit all device allocations is risky.

So we do maintain 2 separate MMIO pools, yes. While I'm writing this, I realize the following definition proposed by @chao-p:

MmioAddress {
    base: Option<u64>,
    size: u64,
    align: u64
}

Would not be able to express "Give me any 32-bit MMIO region" but only "Give me a 32-bit MMIO region starting at 0xf00ba4". So I think it should be expanded to include e.g. a bitflags field to describe further requirements (e.g. the 32-bit one).

Currently firecracker maintains only one MMIO address pool below 4G, which serves both 32-bit and 64-bit BARs.

So I'm proposing a policy:

  1. allocate MMIO addresses below 4G for 32-bit BARs,
  2. allocate MMIO addresses above 4G for 64-bit BARs, and fall back to MMIO addresses below 4G.

The difference is whether to support the fallback case, which is the current firecracker design.

By fallback you mean using the 32-bit address space when there is no 64-bit one?

Member Author

@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.

FWIW, I think having a 32-bit base field implicitly means we want a 32-bit region. Same as the max field being set to u32::MAX in your current proposal.

I got the point: you suggest allocating MMIO addresses below 4G for 32-bit BARs and above 4G for 64-bit BARs. That means we need to maintain at least two MMIO address pools.

With cloud-hypervisor we had to support directly assigned devices that had both 32-bit and 64-bit BARs. Initially we were planning to have only 64-bit BARs, but that does not work when you have more than 3 BARs (due to the PCI config space limits). The lessons we learned from that are:

  1. We must put 32-bit BARs below 4G, in the 32-bit hole.
  2. Assuming the 32-bit hole will be spacious enough for dynamic environments (with hotplugged and VFIO devices) and can fit all device allocations is risky.

So we do maintain 2 separate MMIO pools, yes. While I'm writing this, I realize the following definition proposed by @chao-p:

MmioAddress {
    base: Option<u64>,
    size: u64,
    align: u64
}

Would not be able to express "Give me any 32-bit MMIO region" but only "Give me a 32-bit MMIO region starting at 0xf00ba4". So I think it should be expanded to include e.g. a bitflags field to describe further requirements (e.g. the 32-bit one).

Adding flags essentially makes it a four-element tuple, just a different encoding method.

Currently firecracker maintains only one MMIO address pool below 4G, which serves both 32-bit and 64-bit BARs.
So I'm proposing a policy:

  1. allocate MMIO addresses below 4G for 32-bit BARs,
  2. allocate MMIO addresses above 4G for 64-bit BARs, and fall back to MMIO addresses below 4G.

The difference is whether to support the fallback case, which is the current firecracker design.

By fallback you mean using the 32-bit address space when there is no 64-bit one?
Yes, fallback means using 32-bit addresses for 64-bit BARs.

@sameo Sep 10, 2019

So we do maintain 2 separate MMIO pools, yes. While I'm writing this, I realize the following definition proposed by @chao-p:

MmioAddress {
    base: Option<u64>,
    size: u64,
    align: u64
}

Would not be able to express "Give me any 32-bit MMIO region" but only "Give me a 32-bit MMIO region starting at 0xf00ba4". So I think it should be expanded to include e.g. a bitflags field to describe further requirements (e.g. the 32-bit one).

Adding flags essentially makes it a four-element tuple, just a different encoding method.

That's correct. I think the flags option gives us more flexibility in how we could describe constraints, though.
I'm trying to see what would be the most convenient way of using the same resource definition for both requested and reserved resources, for simplicity's sake.
Using a definition like:

MmioAddress {
    base: Option<u64>,
    flags: u64,
    size: u64,
    align: u64
}

seems, at least to me, a more natural way of expressing both a requested MMIO resource and a reserved one.

Currently firecracker maintains only one MMIO address pool below 4G, which serves both 32-bit and 64-bit BARs.
So I'm proposing a policy:

  1. allocate MMIO addresses below 4G for 32-bit BARs,
  2. allocate MMIO addresses above 4G for 64-bit BARs, and fall back to MMIO addresses below 4G.

The difference is whether to support the fallback case, which is the current firecracker design.

By fallback you mean using the 32-bit address space when there is no 64-bit one?
Yes, fallback means using 32-bit addresses for 64-bit BARs.

I believe we should keep that as an allocator strategy. What the selected VMM allocator does with the following MMIO request:

let flags = 1u64 << MMIO_64;
let requested_region = MmioAddress {
    base: None,
    flags,
    size: 0x10000,
    align: 0x1000,
};

is up to the allocator itself. It could very well decide to return a 32-bit address from the 32-bit hole.

@chao-p
Member

chao-p commented Sep 10, 2019


Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs

It depends.
The irq manager implementation is generic enough, it just mimics hardware behaviors.
If we move the irq manager implementation into vmm, it means each vmm needs to repeat the same code.

Yes, definitely. For code sharing across VMMs you can move the code to another crate. I don't have a strong opinion that we have to do it that way, since it's already under the build-time control of 'kvm_irq'. But what I'm thinking is that once another hypervisor (like Hyper-V) is added, it will add its implementation to vm-device as well.

src/resources.rs Outdated
use std::{u16, u32, u64};

/// Enumeration describing a device's resource requirements.
pub enum DeviceResourceRequest {

nit: I would remove the Device prefix here. It's generic enough.

Member Author

sure:)

src/resources.rs Outdated
/// Request for a Memory Mapped IO address range.
MmioAddress {
/// Reserve the preallocated range [`fixed`, `fixed` + `size`] if it's not u64::MAX.
fixed: u64,

Can we get rid of fixed and define it as the equivalent of min == max?

Member Author

Good suggestion:)

src/resources.rs Outdated
align: u64,
/// Size for the allocated address range.
size: u64,
},

How would a device ask for e.g. a BAR from the 32-bit hole? Implicitly, with max set to u32::MAX?

Member Author

The min/max fields are designed to support 32-bit/64-bit PCI BARs. For 32-bit PCI BARs, the max field should be set to u32::MAX, and the vm-allocator should respect all these constraints when allocating resources.

src/resources.rs Outdated
GenericIrq {
/// Number of Irqs to allocate.
size: u32,
},

Here as well, I would reduce all Irq requests to:

Irq {
    ty: IrqType,
    size: u32,
}

@jiangliu
Member Author


Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs

It depends.
The irq manager implementation is generic enough, it just mimics hardware behaviors.
If we move the irq manager implementation into vmm, it means each vmm needs to repeat the same code.

Yes, definitely. For code sharing across VMMs you can move the code to another crate. I don't have a strong opinion that we have to do it that way, since it's already under the build-time control of 'kvm_irq'. But what I'm thinking is that once another hypervisor (like Hyper-V) is added, it will add its implementation to vm-device as well.

The idea is to let vm-device provide both the interfaces and some default implementations.
Other hypervisors may provide their own implementations, which may or may not be merged into the vm-device crate.

@chao-p
Member

chao-p commented Sep 10, 2019


Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs

It depends.
The irq manager implementation is generic enough, it just mimics hardware behaviors.
If we move the irq manager implementation into vmm, it means each vmm needs to repeat the same code.

Yes, definitely. For code sharing across VMMs you can move the code to another crate. I don't have a strong opinion that we have to do it that way, since it's already under the build-time control of 'kvm_irq'. But what I'm thinking is that once another hypervisor (like Hyper-V) is added, it will add its implementation to vm-device as well.

The idea is to let vm-device provide both the interfaces and some default implementations.
Other hypervisors may provide their own implementations, which may or may not be merged into the vm-device crate.

sure.

jiangliu and others added 7 commits September 10, 2019 17:20
Implement infrastructure to manage interrupt sources based on Linux KVM
kernel module.

Signed-off-by: Liu Jiang <[email protected]>
Signed-off-by: Bin Zha <[email protected]>
Implement the InterruptSourceGroup trait to manage x86 legacy interrupts.
On x86 platforms, pin-based device interrupts connected to the master
PIC, the slave PIC and the IOAPICs are called legacy interrupts. For
legacy interrupts, the interrupt routing logic is managed by the
PICs/IOAPICs, and the interrupt group logic is only responsible
for enabling/disabling the interrupts.

Signed-off-by: Liu Jiang <[email protected]>
Signed-off-by: Bin Zha <[email protected]>
Introduce a generic mechanism to support message signalled interrupts,
including PCI MSI/PCI MSIx/Generic interrupts.

Signed-off-by: Liu Jiang <[email protected]>
Signed-off-by: Bin Zha <[email protected]>
The current hashval algorithm used to generate keys for kvm_irq_routing_entry
generates duplicated keys for legacy IRQs connected to the PIC and IOAPIC.
So refine the algorithm to generate unique keys for every interrupt
source.

Signed-off-by: Zha Bin <[email protected]>
With some KVM versions, setting irq_routing for non-existing legacy
IRQs may cause a system crash. So limit the number to the available
legacy interrupts.

Signed-off-by: 守情 <[email protected]>
@jiangliu
Member Author

@jiangliu I think it would be easier for reviewers to split this PR into 2 PRs:

  1. The interrupt manager/subsystem itself
  2. The device resources and traits definitions

I think both can be reviewed independently, which is a good sign about your overall PR design quality ;-)

Seems most discussions are resource related, so I have split the PR into two as you suggested.

@sameo

sameo commented Oct 29, 2019

This PR is replaced by #9 and #11

@sameo sameo closed this Oct 29, 2019