...or "How I learned to shut my x270 laptop and not worry about my VMs."
This is an implementation of vmmci(4) for
Linux using a customized version of the virtio_pci
driver from the
mainline kernel. It currently supports the following:
-
Clean Shutdowns on Request When requested by
vmctl(8)
...you can safely usevmctl stop <you linux guest>
and it'll nicely stop services and sync disks! -
System Time Synchronization When the host
vmd(8)
emulation of the hardware clock detects a clock drift (most likely due to the host being suspended/resumed), it fires aSYNCRTC
message that the Linuxvmmci
driver responds to by synchronizing system time to the hardware clock time. (This currently only happens during certain host events like resuming from a suspended state.) -
Tracking Clock Drift At regular intervals (currently 20s),
vmmci
will measure current clock drift, recording the current drift amount in seconds and nanoseconds parts readable viasysctl vmmci
NOTE: if you're here to deal with constant, excessive clock drift, see the FAQ!
Above is a screenshot of the clock sync in practice. Tmux pane 0
is
my instance of vmd(8)
running in the foreground with verbose
logging. The other panes:
- Alpine 3.8.4 (virt) with kernel 4.14.104-0-virt
- Debian Buster (9.8) with kernel 4.9.0-8-amd64 (yeah, something
is jacked up with dmesg's time...but it IS correct in
journalctl(1)
and when checkingtimedatectl(1)
) - Ubuntu 18.04 with my custom kernel 4.20.13-obsd+
Take note of the rtc_fire1
log events from vmd(8)
. That's where my
laptop comes out of hibernation and the virtual rtc detects a drift
and sends sync requests to the guests. Each Linux guest receives the
request, performs the clock step, and ack's.
Before you dive in, a few things to note:
-
I test and develop using OpenBSD snapshots, so relatively in sync with -current. (This should work with OpenBSD 6.7 and later.)
-
I lean heavily on the simplification that OpenBSD virtualization guests are single CPU currently.
-
This currenly won't solve larger clock issues, such as major drift.
-
I primarily focus on supporting the newest long-term support kernels picked up by major distros, which means Linux 5.4 at the moment.
-
I focus my testing on Alpine Linux guests using their
-virt
releases since it's simple to install and manage without a lot of ancillary stuff. Plus, I personally like Alpine.
This Linux VMMCI currently comes in two parts:
virtio_pci_obsd.ko
-- handles the quirks of getting Linux's virtio pci framework to properly work with the VMM Control Interface device fromvmd(8)
virtio_vmmci.ko
-- virtio device driver that replicates the behavior of OpenBSD'svmmci(4)
driver
You will need both modules installed!
Assuming you've got a recent Linux distro running as a guest already under OpenBSD, it shouldn't be more than a few minutes to get things up and running.
Install the tools required to build kernel modules using your package manager or whatever you normally use to install stuff.
For Alpine systems running the -virt
flavored kernel:
# apk add gcc make linux-virt-dev
Basically you need your kernel headers and some GCC tooling.
This should be easy and expose issues with your lack of prerequisites or an incompatability with your kernel version:
$ make
A common source of compiler warnings are from variations in kernel versions. Please share your kernel version and the compiler output if you have issues!
As root, simply run:
# make install
You'll probably see some SSL errors and complaints about missing key files. This is expected as you're building an out-of-tree kernel module that isn't being signed. If you'd like to sign the module, you're on your own at the moment, but maybe read the Linux kernel documentation on it here:
At this point, you'll have 2 new kernel modules. You should see them if you run:
$ ls -l /lib/modules/$(uname -r)/extra
total 36
-rw-r--r-- 1 root root 15272 May 9 20:42 virtio_pci_obsd.ko
-rw-r--r-- 1 root root 19872 May 9 20:42 virtio_vmmci.ko
This also should be easy now since their properly installed. Simply run:
# modprobe virtio_vmmci
It should load both the virtio_vmmci.ko
and virtio_pci_obsd.ko
modules. They'll be visible when running lsmod(8)
, but you won't see
a "depends on" entry due to it being a "soft" dependency.
After you load virtio_pci_obsd.ko
you should see your system match
and enable the vmmci PCI device. Check dmesg(1)
and you should see
something like:
[ 825.819945] virtio_pci_obsd: loading out-of-tree module taints kernel.
[ 825.819945] virtio_pci_obsd: module verification failed: signature and/or required key missing - tainting kernel
[ 825.819945] virtio-pci-obsd 0000:00:05.0: runtime IRQ mapping not provided by arch
[ 825.819945] virtio_pci_obsd_match: matching 0x0777
[ 825.819945] virtio_pci_obsd_match: found OpenBSD device
[ 825.819945] virtio-pci-obsd 0000:00:05.0: enabling bus mastering
If you check with lspci(8)
in verbose mode (lspci -v
) you should
see the device and the fact it's using our virti_pci_obsd
driver:
00:05.0 Communication controller: Device 0b5d:0777
Subsystem: Device 0b5d:ffff
Flags: bus master, fast devsel, latency 0, IRQ 9
I/O ports at 5000 [size=4K]
Kernel driver in use: virtio-pci-obsd
When you load virtio_vmmci.ko
, you should see a confirmation the
module is loaded:
[ 256.030878] virtio_vmmci: started VMM Control Interface driver
You can enable debug mode either by passing a debug=1
argument when
loading the virtio_vmmci.ko
module or toggle it afterwards by
writing either a 0
(off) or 1
/any positive integer (on) to
/sys/modules/virtio_vmmci/parameters/debug
as the root user. When
debug mode is on, you'll get extra dmesg noise like:
[17769.012388] virtio_vmmci: [vmmci_validate] not implemented
[17769.012388] virtio_vmmci: [vmmci_probe] initializing vmmci device
[17769.012388] virtio_vmmci: [vmmci_probe] ...found feature TIMESYNC
[17769.012388] virtio_vmmci: [vmmci_probe] ...found feature ACK
[17769.012388] virtio_vmmci: [vmmci_probe] ...found feature SYNCRTC
[17769.012388] virtio_vmmci: started VMM Control Interface driver
[17769.034540] virtio_vmmci: [clock_work_func] starting clock synchronization
[17769.034864] virtio_vmmci: [clock_work_func] guest clock: 1550959642.629898000, host clock: 1550959642.638556712
[17769.034867] virtio_vmmci: [clock_work_func] current time delta: -1.991341288
[17769.034870] virtio_vmmci: [clock_work_func] clock synchronization routine finished
Lastly, check the sysctl tables. The driver registers 2 particular values that contain the seconds and nanoseconds portion of the last measured drift amount:
you@guest:~/virtio_vmmci$ sudo sysctl vmmci
vmmci.drift_nsec = 199647574
vmmci.drift_sec = 1
In the above example, the total drift is 1.199647574 seconds
.
In the future I may expose the last measured time as well
This is pretty simple in modern distros that use
/etc/modules-load.d
. As root, create a file
/etc/modules-load.d/virtio_vmmci.conf
with the contents:
virtio_vmmci
At boot, you should see the modules loaded automatically.
There are a few things you can do to validate your installation.
You can easily test the clock synchronization by suspending your
OpenBSD host by triggering zzz
manually or by something like closing
your laptop lid. Wait at least 10 seconds or so and resume your
OpenBSD system. In the Linux guest, your dmesg(1)
output will tell
you (in less than 30 seconds) that it's detected a clock drift and
it's sync'ing the clock:
[15670.027879] virtio_vmmci: [clock_work_func] current time delta: 91.482370612
[15670.027879] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[15670.027879] virtio_vmmci: [clock_work_func] clock synchronization routine finished
If you check date
or timedatectl
on the Linux guest you should see
the system time is very close to our host time.
How can we test a clean shutdown? It's not too hard, but it might not work the same between distros and versions. Here's what I've done on Alpine 3.11.6.
Assuming your vm is up and running:
- Use
tmux(1)
or another means of getting 2 terminal sessions going at once. - In one session,
vmctl console <vm name or id>
to connect to the VM over the serial console. (This obviously assumes your guest is configured to work that way.) - In another session, issue
vmctl stop <vm name or id>
. - Back in the serial console session, you should see your init
system...probably
systemd
...start running through the shutdown process.
There may be some variations. The Linux vmmci driver calls a kernel helper function that handles orchestrating the shutdown via userspace. (The question of how to shutdown a Linux system from kernelspace is quite fascinating to explore.)
Some questions that people...mainly myself...have had...
My initial release would constantly adjust the guest clock when detecting drift. I since removed the functionality and will not re-add it no matter how much it's requested.
Some reasons I removed it:
- It's a bandaid on a bigger issue, not a real solution.
- You can apply a bandaid already using something like
hwclock -us
, but since it usessettimeofday(2)
it may not trigger pending timers properly!
Constant, excessive drift shouldn't be the norm. Using refined-jiffies will cause this.
If you or a loved one experience excessive clock drift in your Linux guests under OpenBSD's vmm(4)/vmd(8) hypervisor framework, please try the following:
- Build and install my other Linux kernel: vmm-clock
- Use OpenBSD-current as of 1 July 2020 or so when my vmd(8) patch[6] was merged into the tree
You will need BOTH...vmm-clock will crash your guest if you don't have a vmd(8) instance with the stability improvements.
This isn't using the userland settimeofday(2)
system call and
instead using a particular kernel function (do_settimeofday64
[3])
that appears to be pretty analagous to OpenBSD's kernel's
tc_setclock
function[4] in that it steps the system clock while
triggering any alarms or timeouts that would fire.
Looking at how VirtualBox handles this with their userland guest
additions services, they look for large clock drifts where "large"
is currently > 30 minutes. If it's large, it just uses
settimeofday(2)
. Otherwise, it tries to use something like
adjtimex(2)
to accelerate the clock up to the correct time. (This is
something I may consider for vmmci after some more usage/testing.)
See their source for VBoxServiceTimeSync.cpp
[5].
Maybe for small clock disturbances/drifts, but it's not ideal for major stepping and only solves the clock problem.
There are two reasons I'd consider using virtio_vmmci
either in
addition to or in place of relying on an NTP daemon:
-
Not every guest has network access. This precludes NTP as an option. Even if the guest has limited network access, it still needs access to an NTP server, ideally multiple. This isn't always the case.
-
Large clock drifts like when you suspend your laptop for an evening make most NTP daemons sad. I've never seen an NTP daemon that is cool with just jumping the system time ahead (i.e. stepping) like that. Some require special config to even do. Yes,
ntpd(8)
supports a-s
flag to do an actual set of the time and not just an adjustment, but even as the man page says it's for startup. (Useful for embedded, clock-less systems like a Raspberry Pi.)
A lot of modern Linux distros install and enable an NTP daemon by default these days. That's fine. But don't forget vmmci gives you clean shutdowns as well as properly stepping the clock after a long suspend/hibernation!
Few reasons, but for more background see my email to [email protected]: https://marc.info/?t=155102953000002
In short:
- OpenBSD purposely uses self-asigned PCI and Virtio device identifiers to "hide" the VMM Control Interface device
- Linux's virtio pci code is a LOT more complex and is trying to handle a variety of virtio devices...but can't handle a particular quirk with how the VMM Control Interface deals with config register i/o.
See the issues page for my ideas on future enhancements. Feel free to add some yourself, but keep in mind this is:
- Not my job...it's a hobby
- It's for my personal use first and foremost
- My current job is in software but has nothing to do with kernels, virtualization, etc. so this is truly an after-hours thing.
-
Thanks to the OpenBSD
vmm(4)
/vmd(8)
hackers...especially those that put together OpenBSD'svmmci(4)
driver which acted as my reference point. -
The bootlin cross-referencer because holy hell is that thing 10x more useful than poking around Torvald's mirror of the official Linux Git repo.
-
This page from "The kernel development community" was very helpful in figuring out how to schedule "deferred work" in the kernel: https://linux-kernel-labs.github.io/master/labs/deferred_work.html
-
The
virtio_balloon.c
driver in the Linux kernel tree is a relatively simple virtio example to understand Linux virtio drivers. -
The wireguard kernel module source tree for showing how to properly build out of tree modules: https://git.zx2c4.com/wireguard-linux-compat/tree/src
-
Folks that have helped test on different distros with different kernel versions :-)
GitHub might not render these...but believe me they're here :-)
[1] Linux Kernel documentation on generating a private key for signing kernel modules: https://www.kernel.org/doc/html/v4.15/admin-guide/module-signing.html#generating-signing-keys
[2] See this write-up on time-sync in vm's: http://archive.is/ndiy3
[3] See the time/timekeeping.c
source file:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/time/timekeeping.c?h=v4.20#n1220
[4] OpenBSD's sys/kern/kern_tc.c
:
https://github.com/openbsd/src/blob/e12a049bd4bbd1e8315c373a739e08972ed6dd1d/sys/kern/kern_tc.c#L382
[5] VirtualBox's VBoxServiceTimeSync.cpp
:
https://www.virtualbox.org/browser/vbox/trunk/src/VBox/Additions/common/VBoxService/VBoxServiceTimeSync.cpp?rev=76553#L683
[6] My vmd(8) stability fixes: https://github.com/openbsd/src/commit/08fd0ce3179b426bc00beaee67fffdfa71997830