
Add interrupt subsystem to vm-device #8

Closed
wants to merge 9 commits

Conversation

Member

@jiangliu jiangliu commented Sep 8, 2019

This PR adds two subsystems to the vm-device crate:

  1. interrupt subsystem to manage interrupts for virtual devices.
  2. the resource allocation interfaces between the vm-device crate and VMMs.

Switch to the Rust 2018 edition and turn on deny(missing_docs).

Signed-off-by: Liu Jiang <[email protected]>
Introduce traits InterruptManager and InterruptSourceGroup to manage
interrupt sources for virtual devices.

Signed-off-by: Liu Jiang <[email protected]>
Signed-off-by: Bin Zha <[email protected]>
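As a rough sketch of the two traits this commit describes (the signatures below are guesses for illustration, not the actual PR code):

```rust
use std::io;

type InterruptIndex = u32;

trait InterruptSourceGroup {
    /// Enable generation of interrupts from this group.
    fn enable(&self) -> io::Result<()>;
    /// Disable generation of interrupts from this group.
    fn disable(&self) -> io::Result<()>;
    /// Inject the interrupt at `index` into the guest.
    fn trigger(&self, index: InterruptIndex) -> io::Result<()>;
}

trait InterruptManager {
    /// Create a group of interrupt sources for a virtual device.
    fn create_group(&self, base: InterruptIndex, count: u32)
        -> io::Result<Box<dyn InterruptSourceGroup>>;
}

// A no-op implementation just to show how a device would drive the trait.
struct NoopGroup;

impl InterruptSourceGroup for NoopGroup {
    fn enable(&self) -> io::Result<()> { Ok(()) }
    fn disable(&self) -> io::Result<()> { Ok(()) }
    fn trigger(&self, _index: InterruptIndex) -> io::Result<()> { Ok(()) }
}

fn main() {
    let group = NoopGroup;
    assert!(group.enable().is_ok());
    assert!(group.trigger(0).is_ok());
    assert!(group.disable().is_ok());
}
```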
@jiangliu force-pushed the interrupt branch 2 times, most recently from 533c85e to 4fe1c9e on September 8, 2019 06:00
src/resources.rs Outdated
use std::{u16, u32, u64};

/// Enumeration describing a device's resource requirements.
pub enum DeviceResourceRequest {

I found there is a DeviceResources in a previous commit, which is used to describe a specific device resource request. Is DeviceResourceRequest the global resources of a VM, that is to say, does it act the same as the vm-allocator crate did before? If not, what's the difference between `DeviceResources` and `DeviceResourceRequest`?

Member Author

The VMM asks each device for the resources it needs, and the device returns an array of DeviceResourceRequest describing resource types and allocation constraints.
Then the VMM passes an array of allocated resources to the device, which is of type DeviceResources.
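The described request/allocation flow might be sketched roughly like this (type names, fields, and the allocator logic are all illustrative stand-ins, not the actual crate API):

```rust
/// Hypothetical constraint describing what a device needs.
#[derive(Debug, Clone, PartialEq)]
enum DeviceResourceRequest {
    MmioAddressRange { align: u64, size: u64 },
}

/// Hypothetical concrete resource the VMM allocated.
#[derive(Debug, Clone, PartialEq)]
enum DeviceResourceEntry {
    MmioAddressRange { base: u64, size: u64 },
}

/// Toy allocator: honor alignment, hand out the next free address.
fn allocate(req: &DeviceResourceRequest, next_free: u64) -> DeviceResourceEntry {
    match req {
        DeviceResourceRequest::MmioAddressRange { align, size } => {
            // Round the free pointer up to the requested alignment.
            let base = (next_free + align - 1) & !(align - 1);
            DeviceResourceEntry::MmioAddressRange { base, size: *size }
        }
    }
}

fn main() {
    let req = DeviceResourceRequest::MmioAddressRange { align: 0x1000, size: 0x2000 };
    let entry = allocate(&req, 0x1234);
    assert_eq!(entry, DeviceResourceEntry::MmioAddressRange { base: 0x2000, size: 0x2000 });
}
```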

@liujing2 Sep 10, 2019

Thanks. I'm wondering whether we could merge DeviceResourceRequest and DeviceResources into one type? Use something like `base: Option<_>` to make a request with an unspecified base, and fill it in when passing the allocation to the device.

Member Author

They have different roles: DeviceResourceRequest focuses on constraints, and DeviceResources focuses on allocated resources.

src/resources.rs Outdated
@sameo left a comment

@jiangliu I think it would be easier for reviewers to split this PR into 2 PRs:

  1. The interrupt manager/subsystem itself
  2. The device resources and traits definitions

I think both can be reviewed independently, which is a good sign about your overall PR design quality ;-)

src/resources.rs Outdated
src/resources.rs Outdated
/// PCI MSIx IRQ numbers.
PciMsixIrq { base: u32, size: u32 },
/// Generic MSI IRQ numbers.
GenericIrq { base: u32, size: u32 },

Do we really need 3 enum types? Wouldn't:

pub enum MsiIrqType {
    PciMsi,
    PciMsix,
    GenericMsi,
}

pub enum DeviceResourceEntry {
    /// IO Port address range.
    PioAddressRange { base: u16, size: u16 },
    /// Memory Mapped IO address range.
    MmioAddressRange { base: u64, size: u64 },
    /// Legacy IRQ number.
    LegacyIrq(u32),
    /// MSI IRQ.
    MsiIrq { ty: MsiIrqType, base: u32, size: u32 },
    /// Network Interface Card MAC address.
    NicMacAddress(String),
    /// KVM memslot index.
    KvmMemSlot(u32),
}

be simpler to handle?

Member

Or is a MsiIrqType needed at all? Depends on where we will use it.

Member Author

Or is a MsiIrqType needed at all? Depends on where we will use it.

A PCI device may support both PCI MSI and MSIx, and the guest kernel chooses which one to use. So we need a way to distinguish the different cases. If we assume any device supports only one type of MSI interrupt, we could remove the MSI type info.


Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs

I agree the kvm_irq implementation is orthogonal to the trait definitions, but I also think this code could be shared across VMM implementations through the vm-device crate, as a default KVM implementation of the interrupt traits. Kind of similar to what we do with the mmap implementation of the memory model.

Member

We can (and probably should, when considering device assignment) support multiple MSI types. The device backend side knows exactly which MSI type it exposed to the guest, so why does it need the info here? What I'm seeing is that this type comes from what vm-device returns to the VMM in ResourceRequest.

src/interrupt/mod.rs
@chao-p left a comment

Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs

src/resources.rs Outdated
/// Get the legacy interrupt number(IRQ).
pub fn get_legacy_irq(&self) -> Option<u32> {
for entry in self.0.iter().as_ref() {
if let DeviceResourceEntry::LegacyIrq(base) = entry {
Member

Here (and in several routines below) you seem to assume there is only one IRQ entry, and if that is intended, then you probably need to check it in append() above.
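The suggested check in append() might look like this (a minimal sketch assuming DeviceResources is a newtype over a Vec; names are illustrative, not the PR code):

```rust
#[derive(Debug, PartialEq)]
enum DeviceResourceEntry {
    LegacyIrq(u32),
    MmioAddressRange { base: u64, size: u64 },
}

struct DeviceResources(Vec<DeviceResourceEntry>);

impl DeviceResources {
    fn new() -> Self {
        DeviceResources(Vec::new())
    }

    /// Reject a second LegacyIrq entry so get_legacy_irq() stays unambiguous.
    fn append(&mut self, entry: DeviceResourceEntry) -> Result<(), &'static str> {
        let is_irq = |e: &DeviceResourceEntry| matches!(e, DeviceResourceEntry::LegacyIrq(_));
        if is_irq(&entry) && self.0.iter().any(is_irq) {
            return Err("duplicate legacy IRQ entry");
        }
        self.0.push(entry);
        Ok(())
    }

    /// Get the legacy interrupt number (IRQ), if any.
    fn get_legacy_irq(&self) -> Option<u32> {
        self.0.iter().find_map(|e| match e {
            DeviceResourceEntry::LegacyIrq(base) => Some(*base),
            _ => None,
        })
    }
}

fn main() {
    let mut res = DeviceResources::new();
    assert!(res.append(DeviceResourceEntry::MmioAddressRange { base: 0x1000, size: 0x100 }).is_ok());
    assert!(res.append(DeviceResourceEntry::LegacyIrq(5)).is_ok());
    assert!(res.append(DeviceResourceEntry::LegacyIrq(6)).is_err());
    assert_eq!(res.get_legacy_irq(), Some(5));
}
```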

@jiangliu
Member Author

Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs

It depends.
The irq manager implementation is generic enough; it just mimics hardware behaviors.
If we move the irq manager implementation into the VMM, each VMM needs to repeat the same code.

src/resources.rs Outdated
/// Alignment for the allocated address.
align: u64,
/// Size for the allocated address range.
size: u64,
Member

Can we use the definition below to satisfy both fixed/unfixed and single/range addresses?

MmioAddress {
    base: Option<u64>,
    size: u64,
    align: u64
}

Same for PIO.

Member Author

It depends on the semantics the vm-allocator plans to support. The current definition follows the Linux kernel behavior. If we decide to simplify the vm-allocator semantics, the definition may be simplified too.


@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something. Also, this could be used for both the device request for resources and the allocated device resources as well. It simplifies a device resource description down to one enum.

Member Author

Seems it can't express address range constraints for a 32-bit PCI BAR. Or am I missing some point?
A tuple (min, max, size, align) should satisfy our requirements, but I'm not sure whether it could be reduced to a three-element tuple (base, size, align).
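The (min, max, size, align) tuple mentioned above might look like this as a struct (the names and methods here are illustrative only):

```rust
/// Illustrative constraint carrying the (min, max, size, align) tuple.
struct AddressConstraint {
    min: u64,
    max: u64,
    size: u64,
    align: u64,
}

impl AddressConstraint {
    /// A 32-bit PCI BAR request: anywhere below 4G.
    fn bar_32bit(size: u64, align: u64) -> Self {
        AddressConstraint { min: 0, max: u32::MAX as u64, size, align }
    }

    /// Check whether an allocated base address satisfies the constraint.
    fn is_satisfied_by(&self, base: u64) -> bool {
        base % self.align == 0
            && base >= self.min
            // The whole range [base, base + size - 1] must stay below max.
            && base.checked_add(self.size - 1).map_or(false, |end| end <= self.max)
    }
}

fn main() {
    let c = AddressConstraint::bar_32bit(0x1000, 0x1000);
    assert!(c.is_satisfied_by(0x8000_0000)); // below 4G, aligned
    assert!(!c.is_satisfied_by(0x1_0000_0000)); // above 4G
}
```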


@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.

FWIW, I think having a 32-bit base field implicitly means we want a 32-bit region. Same as the max field being set to u32::MAX in your current proposal.


@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.

FWIW, I think having a 32-bit base field implicitly means we want a 32-bit region. Same as the max field being set to u32::MAX in your current proposal.

With the above proposal we would not be able to express a request for a region that would cross the 32-bit limit, but I don't think this is something that exists today.

Member Author

@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.

FWIW, I think having a 32-bit base field implicitly means we want a 32-bit region. Same as the max field being set to u32::MAX in your current proposal.

I got the point: you suggest allocating MMIO addresses below 4G for 32-bit BARs and above 4G for 64-bit BARs. That means we need to maintain at least two MMIO address pools.
Currently firecracker maintains only one MMIO address pool below 4G, which serves both 32-bit and 64-bit BARs.

So I'm proposing a policy:

  1. allocate MMIO addresses below 4G for 32-bit BARs,
  2. allocate MMIO addresses above 4G for 64-bit BARs, and fall back to MMIO addresses below 4G.

The difference is whether to support the fallback case, which is the current firecracker design.
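The proposed two-step policy could be sketched as follows (the Pool type, the bump allocation, and the addresses are made up for illustration):

```rust
/// A trivial bump allocator over a fixed address range (illustrative only).
struct Pool {
    next: u64,
    end: u64,
}

impl Pool {
    fn alloc(&mut self, size: u64) -> Option<u64> {
        if self.next + size <= self.end {
            let base = self.next;
            self.next += size;
            Some(base)
        } else {
            None
        }
    }
}

/// Sketch of the policy: 32-bit BARs come from below 4G; 64-bit BARs
/// prefer the pool above 4G and fall back to the 32-bit hole.
fn alloc_bar(is_64bit: bool, size: u64, low: &mut Pool, high: &mut Pool) -> Option<u64> {
    if is_64bit {
        high.alloc(size).or_else(|| low.alloc(size))
    } else {
        low.alloc(size)
    }
}

fn main() {
    let mut low = Pool { next: 0xc000_0000, end: 0x1_0000_0000 };
    let mut high = Pool { next: 0x1_0000_0000, end: 0x1_0000_1000 };
    assert_eq!(alloc_bar(true, 0x1000, &mut low, &mut high), Some(0x1_0000_0000));
    // High pool exhausted: the next 64-bit BAR falls back below 4G.
    assert_eq!(alloc_bar(true, 0x1000, &mut low, &mut high), Some(0xc000_0000));
}
```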


@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.

FWIW, I think having a 32-bit base field implicitly means we want a 32-bit region. Same as the max field being set to u32::MAX in your current proposal.

I got the point: you suggest allocating MMIO addresses below 4G for 32-bit BARs and above 4G for 64-bit BARs. That means we need to maintain at least two MMIO address pools.

With cloud-hypervisor we had to support directly assigned devices that had both 32-bit and 64-bit BARs. Initially we were planning to have only 64-bit BARs, but that does not work when you have more than 3 BARs (due to the PCI config space limits). The lessons we learned from that are:

  1. We must put 32-bit BARs below 4G, in the 32-bit hole.
  2. Assuming the 32-bit hole will be spacious enough for dynamic environments (with hotplugged and VFIO devices) and can fit all device allocations is risky.

So we do maintain 2 separate MMIO pools, yes. While I'm writing this, I realize the following definition proposed by @chao-p:

MmioAddress {
    base: Option<u64>,
    size: u64,
    align: u64
}

Would not be able to express "Give me any 32-bit MMIO region" but only "Give me a 32-bit MMIO region starting at 0xf00ba4". So I think it should be expanded to include e.g. a bitflags field to describe further requirements (e.g. the 32-bit one).

Currently firecracker maintains only one MMIO address pool below 4G, which serves both 32-bit and 64-bit BARs.

So I'm proposing a policy:

  1. allocate MMIO addresses below 4G for 32-bit BARs,
  2. allocate MMIO addresses above 4G for 64-bit BARs, and fall back to MMIO addresses below 4G.

The difference is whether to support the fallback case, which is the current firecracker design.

By fallback you mean using the 32-bit address space when there is no 64-bit one?

Member Author

@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.

FWIW, I think having a 32-bit base field implicitly means we want a 32-bit region. Same as the max field being set to u32::MAX in your current proposal.

I got the point: you suggest allocating MMIO addresses below 4G for 32-bit BARs and above 4G for 64-bit BARs. That means we need to maintain at least two MMIO address pools.

With cloud-hypervisor we had to support directly assigned devices that had both 32-bit and 64-bit BARs. Initially we were planning to have only 64-bit BARs, but that does not work when you have more than 3 BARs (due to the PCI config space limits). The lessons we learned from that are:

  1. We must put 32-bit BARs below 4G, in the 32-bit hole.
  2. Assuming the 32-bit hole will be spacious enough for dynamic environments (with hotplugged and VFIO devices) and can fit all device allocations is risky.

So we do maintain 2 separate MMIO pools, yes. While I'm writing this, I realize the following definition proposed by @chao-p:

MmioAddress {
    base: Option<u64>,
    size: u64,
    align: u64
}

Would not be able to express "Give me any 32-bit MMIO region" but only "Give me a 32-bit MMIO region starting at 0xf00ba4". So I think it should be expanded to include e.g. a bitflags field to describe further requirements (e.g. the 32-bit one).

Adding flags essentially makes it a four-element tuple, just a different encoding method.

Currently firecracker maintains only one MMIO address pool below 4G, which serves both 32-bit and 64-bit BARs.
So I'm proposing a policy:

  1. allocate MMIO addresses below 4G for 32-bit BARs,
  2. allocate MMIO addresses above 4G for 64-bit BARs, and fall back to MMIO addresses below 4G.

The difference is whether to support the fallback case, which is the current firecracker design.

By fallback you mean using the 32-bit address space when there is no 64-bit one?
Yes, fallback means using 32-bit addresses for 64-bit BARs.

@sameo Sep 10, 2019

So we do maintain 2 separate MMIO pools, yes. While I'm writing this, I realize the following definition proposed by @chao-p:

MmioAddress {
    base: Option<u64>,
    size: u64,
    align: u64
}

Would not be able to express "Give me any 32-bit MMIO region" but only "Give me a 32-bit MMIO region starting at 0xf00ba4". So I think it should be expanded to include e.g. a bitflags field to describe further requirements (e.g. the 32-bit one).

Adding flags essentially makes it a four-element tuple, just a different encoding method.

That's correct. I think the flags option gives us more flexibility in how we could describe constraints, though.
I'm trying to see what would be the most convenient way of using the same resource definition for both requested and reserved resources, for simplicity's sake.
Using a definition like:

MmioAddress {
    base: Option<u64>,
    flags: u64,
    size: u64,
    align: u64
}

seems, at least to me, a more natural way of expressing both a requested MMIO resource and a reserved one.

Currently firecracker maintains only one MMIO address pool below 4G, which serves both 32-bit and 64-bit BARs.
So I'm proposing a policy:

  1. allocate MMIO addresses below 4G for 32-bit BARs,
  2. allocate MMIO addresses above 4G for 64-bit BARs, and fall back to MMIO addresses below 4G.

The difference is whether to support the fallback case, which is the current firecracker design.

By fallback you mean using the 32-bit address space when there is no 64-bit one?
Yes, fallback means using 32-bit addresses for 64-bit BARs.

I believe we should keep that as an allocator strategy. What the selected VMM allocator does with the following MMIO request:

let flags = 1u64 << MMIO_64;
let requested_region = MmioAddress {
    base: None,
    flags,
    size: 0x10000,
    align: 0x1000,
};

is up to the allocator itself. It could very well decide to return a 32-bit address from the 32-bit hole.

@chao-p
Member

chao-p commented Sep 10, 2019


Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs

It depends.
The irq manager implementation is generic enough, it just mimics hardware behaviors.
If we move the irq manager implementation into vmm, it means each vmm needs to repeat the same code.

Yes, definitely. For code sharing across VMMs you can move the code to another crate. I don't have a strong opinion that we have to do it that way, since it's already under the build-time control of 'kvm_irq'. But what I'm thinking is that once another hypervisor (like Hyper-V) is added, it will add its implementation to vm-device as well.

src/resources.rs Outdated
use std::{u16, u32, u64};

/// Enumeration describing a device's resource requirements.
pub enum DeviceResourceRequest {

nit: I would remove the Device prefix here. It's generic enough.

Member Author

sure:)

src/resources.rs Outdated
/// Request for a Memory Mapped IO address range.
MmioAddress {
/// Reserve the preallocated range [`fixed`, `fixed` + `size`] if it's not u64::MAX.
fixed: u64,

Can we get rid of fixed and define it as the equivalent of min == max?

Member Author

Good suggestion:)

src/resources.rs Outdated
align: u64,
/// Size for the allocated address range.
size: u64,
},

How would a device ask for e.g. a BAR from the 32-bit hole? Implicitly, with max set to u32::MAX?

Member Author

The min/max fields are designed to support 32-bit/64-bit PCI BARs. For 32-bit PCI BARs, the max field should be set to u32::MAX, and the vm-allocator should respect all these constraints when allocating resources.

src/resources.rs Outdated
GenericIrq {
/// Number of Irqs to allocate.
size: u32,
},

Here as well, I would reduce all Irq requests to:

Irq {
    ty: IrqType,
    size: u32,
}

@jiangliu
Member Author


Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs

It depends.
The irq manager implementation is generic enough, it just mimics hardware behaviors.
If we move the irq manager implementation into vmm, it means each vmm needs to repeat the same code.

Yes, definitely. For code sharing across VMMs you can move the code to another crate. I don't have a strong opinion that we have to do it that way, since it's already under the build-time control of 'kvm_irq'. But what I'm thinking is that once another hypervisor (like Hyper-V) is added, it will add its implementation to vm-device as well.

The idea is to let vm-device provide both the interfaces and some default implementations.
Other hypervisors may provide their own implementations, which may or may not be merged into the vm-device crate.

@chao-p
Member

chao-p commented Sep 10, 2019


Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs

It depends.
The irq manager implementation is generic enough, it just mimics hardware behaviors.
If we move the irq manager implementation into vmm, it means each vmm needs to repeat the same code.

Yes, definitely. For code sharing across VMMs you can move the code to another crate. I don't have a strong opinion that we have to do it that way, since it's already under the build-time control of 'kvm_irq'. But what I'm thinking is that once another hypervisor (like Hyper-V) is added, it will add its implementation to vm-device as well.

The idea is to let vm-device provide both the interfaces and some default implementations.
Other hypervisors may provide their own implementations, which may or may not be merged into the vm-device crate.

sure.

jiangliu and others added 7 commits September 10, 2019 17:20
Implement infrastructure to manage interrupt sources based on Linux KVM
kernel module.

Signed-off-by: Liu Jiang <[email protected]>
Signed-off-by: Bin Zha <[email protected]>
Implement the InterruptSourceGroup trait to manage x86 legacy interrupts.
On x86 platforms, pin-based device interrupts connected to the master
PIC, the slave PIC and the IOAPICs are called legacy interrupts. For
legacy interrupts, the interrupt routing logic is managed by the
PICs/IOAPICs, and the interrupt group logic is only responsible
for enabling/disabling the interrupts.

Signed-off-by: Liu Jiang <[email protected]>
Signed-off-by: Bin Zha <[email protected]>
Introduce a generic mechanism to support message signalled interrupts,
including PCI MSI/PCI MSIx/Generic interrupts.

Signed-off-by: Liu Jiang <[email protected]>
Signed-off-by: Bin Zha <[email protected]>
The current hashval algorithm used to generate keys for kvm_irq_routing_entry
generates duplicated keys for legacy IRQs connected to the PIC and IOAPIC.
So refine the algorithm to generate unique keys for every interrupt
source.

Signed-off-by: Zha Bin <[email protected]>
With some KVM versions, setting irq_routing for non-existing legacy
IRQs may cause a system crash. So limit the number to the available
legacy interrupts.

Signed-off-by: 守情 <[email protected]>
@jiangliu
Member Author

@jiangliu I think it would be easier for reviewers to split this PR into 2 PRs:

  1. The interrupt manager/subsystem itself
  2. The device resources and traits definitions

I think both can be reviewed independently, which is a good sign about your overall PR design quality ;-)

Seems most discussions are resource related, so I have split the PR into two as you suggested.

@sameo

sameo commented Oct 29, 2019

This PR is replaced by #9 and #11

@sameo sameo closed this Oct 29, 2019