-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add interrupt subsystem to vm-device #8
Conversation
Switch to rust 2018 edition and turn on deny(missing_docs). Signed-off-by: Liu Jiang <[email protected]>
Introduce traits InterruptManager and InterruptSourceGroup to manage interrupt sources for virtual devices. Signed-off-by: Liu Jiang <[email protected]> Signed-off-by: Bin Zha <[email protected]>
533c85e
to
4fe1c9e
Compare
src/resources.rs
Outdated
use std::{u16, u32, u64}; | ||
|
||
/// Enumeration describing a device's resource requirements. | ||
pub enum DeviceResourceRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found there is a DeviceResources
in previous commit, which is used to describe a specific device resource request. Is DeviceResourceRequest
global resources of a VM, that is to say, it acts the same as vm-allocator crate before? If not, what's the difference between DeviceResourcesand
DeviceResourceRequest` ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vmm asks each device for resource it needs, and the devices returns an array of DeviceResourceRequest to describe resource type and allocation constraints.
Then the VMM passes an array of allocated resource to the device, which is type of DeviceResources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'm thinking whether we could merge DeviceResourceRequest
and DeviceResources
into one type? Use something like base: Option<>
to make a request of non-specified and fill it to pass the allocation to device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The have different roles. The DeviceResourceRequest focuses on constraints, and DeviceResource focuses on resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiangliu I think it would be easier for reviewers to split this PR into 2 PRs:
- The interrupt manager/subsystem itself
- The device resources and traits definitions
I think both can be reviewed independently, which is a good sign about your overall PR design quality ;-)
src/resources.rs
Outdated
/// PCI MSIx IRQ numbers. | ||
PciMsixIrq { base: u32, size: u32 }, | ||
/// Generic MSI IRQ numbers. | ||
GenericIrq { base: u32, size: u32 }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need 3 enum types? Wouldn't:
pub enum MsiIrqType {
PciMsi,
PciMsix,
GenericMsi,
}
pub enum DeviceResourceEntry {
/// IO Port address range.
PioAddressRange { base: u16, size: u16 },
/// Memory Mapped IO address range.
MmioAddressRange { base: u64, size: u64 },
/// Legacy IRQ number.
LegacyIrq(u32),
/// MSI IRQ
MsiIrq(type: MsiIrqType, base: u32, size: u32),
/// Network Interface Card MAC address.
NicMacAddresss(String),
/// KVM memslot index.
KvmMemSlot(u32),
}
be simpler to handle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even a MsiIrqType is needed at all?Depends on where we will use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even a MsiIrqType is needed at all?Depends on where we will use it.
A PCI device may support both PCI MSI and MSIx, and the guest kernel choose which one to use. So we need a way to distinguish different cases. If we assume only one type of MSI interrupt is supported by any device, we could remove the MSI type info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs
I agree the kvm_irq
implementation is orthogonal to the traits definitions, but I also think that this code could be shared across VMM implementations through the vm-device crate, as a default KVM implementation of the interrupt traits. Kind of similar to what we do with the mmap
implemenation of the memory model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can (and probably should when considering device assignment) support multiple MSI types. From device backend side, it knows exactly which MSI type it exposed to guest so why it need the info here. What I'm seeing is this type is from what vm-device returned to VMM in ResourceRequest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I believe all the source code under interrupt/kvm_irq should be part of VMM code instead of vm-device, in vm-device we don't need it at all. The only thing we do need is the trait definitions in interrupt/mod.rs
src/resources.rs
Outdated
/// Get the legacy interrupt number(IRQ). | ||
pub fn get_legacy_irq(&self) -> Option<u32> { | ||
for entry in self.0.iter().as_ref() { | ||
if let DeviceResourceEntry::LegacyIrq(base) = entry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here (and below several routines) you seem to assume there is only one irq entry existed, which if it is true, then you probably need check that in append() above.
It depends. |
src/resources.rs
Outdated
/// Alignment for the allocated address. | ||
align: u64, | ||
/// Size for the allocated address range. | ||
size: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use below definition to satisfy both fixed/unfixed and single/range address?
MmioAddress {
base: Option<u64>,
size: u64,
align: u64
}
Same for PIO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on the semantics the vm-allocator plans to support. Current definition follows the linux kernel behavior. If we decide to simplify the vm-allocator semantics, the definition may be simplified too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something. Also, this could be used for both the device request for resources and the allocated device resources as well. It simplifies a device resource description down to one enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems it can't allocate address range constraints for 32-bit PCI BAR. Or am I missing some points?
A tuple (min, max, size, align) should satisfy our requirements, but not sure whether it could be reduced as three-ele tuple (base, size, align).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.
FWIW, I think having a 32-bit base
field implicitly means we want a 32-bit region. Same as the max
field being set to u32:MAX
in your current proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.
FWIW, I think having a 32-bit
base
field implicitly means we want a 32-bit region. Same as themax
field being set tou32:MAX
in your current proposal.
With the above proposal we would not be able to express a request for a region that would cross the 32-bit limit, but I don't think this is something that exists today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.
FWIW, I think having a 32-bit
base
field implicitly means we want a 32-bit region. Same as themax
field being set tou32:MAX
in your current proposal.
I got the point, you suggest to allocate MMIO address below 4G for 32-bit BAR and MMIO address above 4G for 64-bit BARs. That means we need to maintain at least two MMIO address pools.
Currently firecracker maintains only one MMIO address pools below 4G, which serves both 32-bit and 64-bit BARs.
So I'm proposing a policy as:
- allocate MMIO address below 4G for 32-bit BARs,
- allocate MMIO address above 4G for 64-bit BARs, and fallback to MMIO address below 4G.
The difference is whether to support fallback
case, which is the current firecracker design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.
FWIW, I think having a 32-bit
base
field implicitly means we want a 32-bit region. Same as themax
field being set tou32:MAX
in your current proposal.I got the point, you suggest to allocate MMIO address below 4G for 32-bit BAR and MMIO address above 4G for 64-bit BARs. That means we need to maintain at least two MMIO address pools.
With cloud-hypervisor
we had to support directly assigned devices that had both 32-bit and 64-bit BARs. Initially we were planning to actually only have 64-bit BARs but that does not work when you have more than 3 BARs (Due to the PCI config space limits). The lessons we learned from that is that:
- We must put 32-bit BARs below 4G, on the 32-bit hole.
- Assuming the 32-bit hole will be spacious enough for dynamic environments (with hotplugged and VFIO devices) and can fit all devices allocations is risky.
So we do maintain 2 separate MMIO pools, yes. While I'm writing this, I realize the following proposed definition by @chao-p:
MmioAddress {
base: Option<u64>,
size: u64,
align: u64
}
Would not be able to express the "Give me any 32-bit MMIO region" but only "Give me a 32-bit MMIO region starting at 0xf00ba4". So I think it should expanded to include e.g. a bitflags to describe further requirements (e.g. the 32-bit one).
Currently firecracker maintains only one MMIO address pools below 4G, which serves both 32-bit and 64-bit BARs.
So I'm proposing a policy as:
- allocate MMIO address below 4G for 32-bit BARs,
- allocate MMIO address above 4G for 64-bit BARs, and fallback to MMIO address below 4G.
The difference is whether to support
fallback
case, which is the current firecracker design.
By fallback you mean using the 32-bit address space when there is no 64-bit one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiangliu I think @chao-p proposal would allow for representing all cases that we want to support, regardless of the vm-allocator semantics, unless I'm missing something.
FWIW, I think having a 32-bit
base
field implicitly means we want a 32-bit region. Same as themax
field being set tou32:MAX
in your current proposal.I got the point, you suggest to allocate MMIO address below 4G for 32-bit BAR and MMIO address above 4G for 64-bit BARs. That means we need to maintain at least two MMIO address pools.
With
cloud-hypervisor
we had to support directly assigned devices that had both 32-bit and 64-bit BARs. Initially we were planning to actually only have 64-bit BARs but that does not work when you have more than 3 BARs (Due to the PCI config space limits). The lessons we learned from that is that:
- We must put 32-bit BARs below 4G, on the 32-bit hole.
- Assuming the 32-bit hole will be spacious enough for dynamic environments (with hotplugged and VFIO devices) and can fit all devices allocations is risky.
So we do maintain 2 separate MMIO pools, yes. While I'm writing this, I realize the following proposed definition by @chao-p:
MmioAddress { base: Option<u64>, size: u64, align: u64 }Would not be able to express the "Give me any 32-bit MMIO region" but only "Give me a 32-bit MMIO region starting at 0xf00ba4". So I think it should expanded to include e.g. a bitflags to describe further requirements (e.g. the 32-bit one).
Adding flags essentially makes it a four-elements tuples, just different encoding method.
Currently firecracker maintains only one MMIO address pools below 4G, which serves both 32-bit and 64-bit BARs.
So I'm proposing a policy as:
- allocate MMIO address below 4G for 32-bit BARs,
- allocate MMIO address above 4G for 64-bit BARs, and fallback to MMIO address below 4G.
The difference is whether to support
fallback
case, which is the current firecracker design.By fallback you mean using the 32-bit address space when there is no 64-bit one?
Yes, fallback means using 32-bit addresses for 64-bit BARs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we do maintain 2 separate MMIO pools, yes. While I'm writing this, I realize the following proposed definition by @chao-p:
MmioAddress { base: Option<u64>, size: u64, align: u64 }Would not be able to express the "Give me any 32-bit MMIO region" but only "Give me a 32-bit MMIO region starting at 0xf00ba4". So I think it should expanded to include e.g. a bitflags to describe further requirements (e.g. the 32-bit one).
Adding flags essentially makes it a four-elements tuples, just different encoding method.
That's correct. I think the flags option gives us more flexibility on how we could describe constraints, though.
I'm trying to see what would be the more convenient way of using the same resource definition for both requested and reserved resources, for simplicity sake.
Using a definition like:
MmioAddress {
base: Option<u64>,
flags: u64,
size: u64,
align: u64
}
seems, at least to me, a more natural way of expressing both a requested MMIO resource and a reserved one.
Currently firecracker maintains only one MMIO address pools below 4G, which serves both 32-bit and 64-bit BARs.
So I'm proposing a policy as:
- allocate MMIO address below 4G for 32-bit BARs,
- allocate MMIO address above 4G for 64-bit BARs, and fallback to MMIO address below 4G.
The difference is whether to support
fallback
case, which is the current firecracker design.By fallback you mean using the 32-bit address space when there is no 64-bit one?
Yes, fallback means using 32-bit addresses for 64-bit BARs.
I believe we should keep that as an allocator strategy. What the selected VMM allocator does with the following MMIO request:
let mut flags = 1u64 << MMIO_64;
let requested_region = MMIOAddress {
base: None,
flags,
size: 0x10000,
align: 0x1000,
};
is up to the allocator itself. It could very well decide to return a 32-bit address from the 32-bit hole.
Yes, definitely. For code sharing across VMMs you can move the code to another crate. I don't have strong opinion that we have to do like that way since it's already under the build-time control of 'kvm_irq'. But what I'm thinking is once other hypervisor(like Hyper-V) is added then it will add its implementation to vm-device as well. |
src/resources.rs
Outdated
use std::{u16, u32, u64}; | ||
|
||
/// Enumeration describing a device's resource requirements. | ||
pub enum DeviceResourceRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would remove the Device
prefix here. It's generic enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure:)
src/resources.rs
Outdated
/// Request for a Memory Mapped IO address range. | ||
MmioAddress { | ||
/// Reserve the preallocated range [`fixed`, `fixed` + `size`] if it's not u64::MAX. | ||
fixed: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get rid of fixed
and define it as an equivalent of min
=max
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion:)
src/resources.rs
Outdated
align: u64, | ||
/// Size for the allocated address range. | ||
size: u64, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would a device ask for e.g. a BAR from the 32-bit hole? Implictly with max
set to u32::MAX
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the min/max fields are designed to support 32-bit/64-bit PCI bars. For 32-bit PCI BARs, the max field should be set as u32::MAX. And the vm-allocator should respect all these constraints when allocating resources.
src/resources.rs
Outdated
GenericIrq { | ||
/// Number of Irqs to allocate. | ||
size: u32, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here as well, I would reduce all Irq requests to:
Irq {
type: IrqType,
size: u32,
}
The idea is to let the vm-device provide both interfaces and some default implementations. |
sure. |
Implement infrastructure to manage interrupt sources based on Linux KVM kernel module. Signed-off-by: Liu Jiang <[email protected]> Signed-off-by: Bin Zha <[email protected]>
Implement InterruptSourceGroup trait to manage x86 legacy interruts. On x86 platforms, pin-based device interrupts connecting to the master PIC, the slave PIC and IOAPICs are named as legacy interrupts. For legacy interrupts, the interrupt routing logic are manged by the PICs/IOAPICs and the interrupt group logic only takes responsibility to enable/disable the interrupts. Signed-off-by: Liu Jiang <[email protected]> Signed-off-by: Bin Zha <[email protected]>
Introduce generic mechanism to support message signalled interrupts, including PCI MSI/PCI MSIx/Generic interrupts. Signed-off-by: Liu Jiang <[email protected]> Signed-off-by: Bin Zha <[email protected]>
Current hashval algorithm to generate key for kvm_irq_routing_entry will generate duplicated keys for legacy IRQ connecting to PIC and IOAPIC. So refine the algorithm to generate unique keys for every interrupt source. Signed-off-by: Zha Bin <[email protected]>
With some kvm version, setting irq_routing for non-existing legaccy IRQs may cause system crash. So limit the number to available legacy interrupts. Signed-off-by: 守情 <[email protected]>
Signed-off-by: 守情 <[email protected]>
Signed-off-by: Liu Jiang <[email protected]>
Seems most discussion are resource related, so have split the PR into two as you suggested. |
This PR adds two subsystem to the vm-device crate: