test: add helper functions for debugging
Some utility functions to help when interactively debugging tests,
including one to run gdb on the microvm.

Signed-off-by: Pablo Barbáchano <[email protected]>
pb8o committed Sep 15, 2023
1 parent 6cb45ce commit 7321629
Showing 4 changed files with 279 additions and 19 deletions.
119 changes: 101 additions & 18 deletions tests/README.md
@@ -192,31 +192,23 @@ source tree. This directory is bind-mounted in the container and used as a
local image cache.

`Q5:`
-*Is there a way to speed up integration tests execution time?*
-`A5:`
-You can speed up tests execution time with any of these:
-
-`Q6:`
*How can I get live logger output from the tests?*
-`A6:`
+`A5:`
Accessing **pytest.ini** will allow you to modify logger settings.

-1. Run the tests from inside the container and set the environment variable
-   `KEEP_TEST_SESSION` to a non-empty value.
-
-   Each **Testrun** begins by building the firecracker and unit test binaries,
-   and ends by deleting all the built artifacts. If you run the tests
-   [from inside the container](#running), you can prevent the binaries from
-   being deleted by exporting the `KEEP_TEST_SESSION` variable. This way, all
-   the following **Testruns** will be significantly faster, as they will not
-   need to rebuild everything. If any Rust source file is changed, the build
-   is done incrementally.
+`Q6:`
+*Is there a way to speed up integration tests execution time?*
+
+`A6:`
+You can narrow down the test selection as described in the **Running**
+section, or in the **Troubleshooting tests** section. For example:

1. Pass the `-k substring` option to Pytest to only run a subset of tests by
   specifying a part of their name.

-1. Only run the tests contained in a file or directory, as specified in the
-   **Running** section.
+1. Only run the tests contained in a file or directory.
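
For `Q5` above, here is a rough sketch of the kind of **pytest.ini** logging
tweak that answer refers to; the options are standard pytest settings and the
values are only an example, not taken from this commit:

```ini
# Illustrative only: enable live log output at INFO level
[pytest]
log_cli = true
log_cli_level = INFO
log_cli_format = %(asctime)s %(levelname)s %(message)s
```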


## Implementation Goals

@@ -241,7 +233,6 @@ Pytest was chosen because:

### Features

- A fixture for interacting with microvms via SSH.
- Use the Firecracker Open API spec to populate Microvm API resource URLs.
- Do the testrun in a container for better isolation.
- Event-based monitoring of microvm socket file creation to avoid busy-waiting in `while` loops.
@@ -261,3 +252,95 @@ Pytest was chosen because:
## Further Reading

Contributing to this testing system requires a deep dive into `pytest`.

## Troubleshooting tests

### How to select tests

When troubleshooting tests, it is important to narrow the run down to just the
tests of interest. `pytest` offers several features to do that:

#### Single file

```sh
./tools/devtool -y test -- integration_tests/performance/test_boottime.py
```

#### Single test

```sh
./tools/devtool -y test -- integration_tests/performance/test_boottime.py::test_boottime
```

#### Single test + parameter(s)

Use the `-k` option to match part of the test ID (including the parameters!):

```sh
./tools/devtool -y test -- -k 1024 integration_tests/performance/test_boottime.py::test_boottime
```

#### `--last-failed`

One can use the `--last-failed` option to only run the tests that failed in the
previous run. This is useful when several tests fail after making large changes.
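
For example, to re-run just the previous failures (reusing the same `devtool`
invocation as the examples above):

```sh
./tools/devtool -y test -- --last-failed
```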

### Run tests from within the container

To avoid having to enter/exit Docker every test run, you can run the tests
directly within a Docker session:

```sh
./tools/devtool -y shell --privileged
./tools/test.sh integration_tests/functional/test_api.py
```

### How to use the Python debugger (pdb) for debugging

Just append `--pdb`, and when a test fails it will drop you into pdb, where you
can examine local variables and the stack and use the normal Python REPL.

```sh
./tools/devtool -y test -- -k 1024 integration_tests/performance/test_boottime.py::test_boottime --pdb
```
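
Once in pdb, the helper functions added by this commit can be called directly.
As a sketch, assuming the failing test has its microvm bound to a local
variable named `uvm` (the variable name is just an assumption):

```python
# Inside the pdb session (hypothetical local `uvm` holding the Microvm):
uvm.help.print_log()          # dump Firecracker's log collected so far
print(uvm.help.how_to_ssh())  # print an SSH command that reaches the guest
uvm.help.gdbserver()          # attach gdbserver to the Firecracker process
```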

### How to use IPython's ipdb instead of pdb

```sh
./tools/devtool -y shell --privileged
pip3 install ipython
export PYTEST_ADDOPTS=--pdbcls=IPython.terminal.debugger:TerminalPdb
./tools/test.sh -k 1024 integration_tests/performance/test_boottime.py::test_boottime
```

### How to debug tests

The `test_debug` devtool command runs the test session inside `tmux` with
`--pdb` enabled (see `cmd_test_debug` in the `tools/devtool` diff below):
```sh
./tools/devtool -y test_debug -k 1024 integration_tests/performance/test_boottime.py::test_boottime
```

### How to reproduce intermittent (aka flaky) tests

Just run the test in a loop, and make it drop you into pdb when it fails.

```sh
while true; do
./tools/devtool -y test -- integration_tests/functional/test_balloon.py::test_deflate_on_oom -k False --pdb
done
```

### How to run tests in parallel with `-n`

We can run the tests in parallel via `pytest-xdist`. Not all tests can run in
parallel; the ones in `build` and `performance` are not supposed to run in
parallel.

By default, the tests run sequentially. One can use the `-n` option to control
the parallelism. A bare `-n` will run as many workers as there are CPUs, which
may be too many. As a rough heuristic, use half the available CPUs (for
example, `-n4` on an 8-CPU, HT-enabled laptop). On metal instances 8 is a good
number; more than that just gives diminishing returns.

```sh
./tools/devtool -y test -- integration_tests/functional -n$(expr $(nproc) / 2) --dist worksteal
```
8 changes: 7 additions & 1 deletion tests/framework/microvm.py
@@ -35,6 +35,7 @@
from framework.defs import FC_PID_FILE_NAME, MAX_API_CALL_DURATION_MS
from framework.http_api import Api
from framework.jailer import JailerContext
from framework.microvm_helpers import MicrovmHelpers
from framework.properties import global_props
from host_tools.memory import MemoryMonitor

@@ -169,6 +170,7 @@ def __init__(
        self.rootfs_file = None
        self.ssh_key = None
        self.initrd_file = None
        self.boot_args = None

        # The binaries this microvm will use to start.
        if fc_binary_path is None:
@@ -221,6 +223,8 @@ def __init__(
        # MMDS content from file
        self.metadata_file = None

        self.help = MicrovmHelpers(self)

    def __repr__(self):
        return f"<Microvm id={self.id}>"

@@ -576,9 +580,11 @@ def basic_config(
        if self.memory_monitor:
            self.memory_monitor.start()

+        if boot_args is not None:
+            self.boot_args = boot_args
        boot_source_args = {
            "kernel_image_path": self.create_jailed_resource(self.kernel_file),
-            "boot_args": boot_args,
+            "boot_args": self.boot_args,
        }

        if use_initrd and self.initrd_file is not None:
162 changes: 162 additions & 0 deletions tests/framework/microvm_helpers.py
@@ -0,0 +1,162 @@
# Copyright 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

"""Microvm helper functions for interactive use"""

import os
import platform
import subprocess


class DockerInfo:
    """Class to extract information from the Docker environment"""

    @property
    def ip(self):
        """Return this container's IP address"""
        return (
            subprocess.check_output(
                "ip -j address show eth0 |jq -r '.[].addr_info[].local'",
                shell=True,
            )
            .decode("ascii")
            .strip()
        )

    @property
    def id(self):
        """Return this container's id"""
        return platform.node()


DOCKER = DockerInfo()


class MicrovmHelpers:
    """Microvm helper functions for interactive use"""

    def __init__(self, vm):
        self.vm = vm

    def print_log(self):
        """Print Firecracker's log"""
        print(self.vm.log_data)

    def resize_disk(self, disk, size: int = 2**30):
        """Resize a filesystem
        The filesystem should be unmounted for this to work
        """
        os.truncate(disk, size)
        subprocess.check_output(["resize2fs", disk])

    def gdbserver(self, port=2000):
        """Attach gdbserver to the FC process
        See https://sourceware.org/gdb/current/onlinedocs/gdb.html/Remote-Debugging.html#Remote-Debugging
        """
        comm = f"localhost:{port}"
        subprocess.Popen(["gdbserver", "--attach", comm, str(self.vm.jailer_clone_pid)])
        print(f"Connect gdb with:\n\tgdb --ex 'target remote {DOCKER.ip}:{port}'")

    def lldbserver(self, port=2001):
        """Attach lldb-server to the FC process
        See https://lldb.llvm.org/use/remote.html
        TBD does not work. Fails with
            error: attach failed: lost connection
        """
        # Unlike gdbserver, lldb-server is not a separate package, but is part
        # of lldb and it's about ~400MB to install, so we don't include it in
        # the devctr
        subprocess.run("apt update && apt install lldb", shell=True, check=True)
        subprocess.Popen(["lldb-server", "p", "--listen", f"*:{port}", "--server"])
        print(
            f"Connect lldb with\n\tlldb -o 'platform select remote-linux' -o 'platform connect connect://{DOCKER.ip}:{port}' -o 'attach {self.vm.jailer_clone_pid}'"
        )

    def tmux_neww(self, cmd: str):
        """Open a window in the local tmux"""
        return subprocess.run(["tmux", "neww", cmd], check=True)

    def how_to_ssh(self):
        """Print how to SSH to the microvm
        This may be useful for example to get a terminal
        """
        ip = self.vm.iface["eth0"]["iface"].guest_ip
        return f"ip netns exec {self.vm.jailer.netns} ssh -o StrictHostKeyChecking=no -i {self.vm.ssh_key} root@{ip}"

    def tmux_ssh(self):
        """Open a tmux window with an SSH session to the VM"""
        return self.tmux_neww(self.how_to_ssh())

    def enable_console(self):
        """Helper method to attach a console, before the machine boots"""
        if self.vm.api is not None:
            raise RuntimeError(".spawn already called, too late to enable the console")
        if self.vm.boot_args is None:
            self.vm.boot_args = ""
        self.vm.boot_args += "console=ttyS0 reboot=k panic=1"
        self.vm.jailer.daemonize = False

    def how_to_console(self):
        """Print how to connect to the VM console"""
        return f"screen -dR {self.vm.screen_session}"

    def tmux_console(self):
        """Open a tmux window with the console"""
        return self.tmux_neww(self.how_to_console())

    def how_to_docker(self):
        """How to get into this container from outside"""
        return f"docker exec -it {DOCKER.id}"

    def enable_ip_forwarding(self):
        """
        Enables IP forwarding
        TBD this only works for a single microvm. allow several microvms.
        we need to make the veth network smaller and **allocate** them
        accordingly
        """
        netns = self.vm.jailer.netns
        vethhost = "vethhost0"
        vethhost_ip = "10.0.0.1"
        veth_net = "10.0.0.0/255.255.255.0"
        tap_net = "192.168.0.0/255.255.255.0"
        tap_host_ip = self.vm.iface["eth0"]["iface"].host_ip

        def run(cmd):
            return subprocess.run(cmd, shell=True, check=True)

        def run_in_netns(cmd):
            return run(f"ip netns exec {netns} " + cmd)

        # outside netns
        # iptables -L -v -n
        run(f"ip link add name {vethhost} type veth peer name vethvpn0 netns {netns}")
        run(f"ip addr add {vethhost_ip}/24 dev {vethhost}")
        run_in_netns("ip addr add 10.0.0.2/24 dev vethvpn0")
        run(f"ip link set {vethhost} up")
        run_in_netns("ip link set vethvpn0 up")

        run("iptables -P FORWARD DROP")
        # iptables -L FORWARD
        # iptables -t nat -L
        run(f"iptables -t nat -A POSTROUTING -s {veth_net} -o eth0 -j MASQUERADE")
        run("iptables -A FORWARD -i eth0 -o vethhost0 -j ACCEPT")
        run("iptables -A FORWARD -i vethhost0 -o eth0 -j ACCEPT")

        # in the netns
        run_in_netns(f"ip route add default via {vethhost_ip}")
        # tap_ip = ipaddress.ip_network("192.168.0.1/30", False)
        run_in_netns("iptables -A FORWARD -i tap0 -o vethvpn0 -j ACCEPT")
        run_in_netns("iptables -A FORWARD -i vethvpn0 -o tap0 -j ACCEPT")
        run_in_netns(
            f"iptables -t nat -A POSTROUTING -s {tap_net} -o vethvpn0 -j MASQUERADE"
        )

        self.vm.ssh.run(f"ip route add default via {tap_host_ip}")
        self.vm.ssh.run("echo nameserver 8.8.8.8 >/etc/resolv.conf")
9 changes: 9 additions & 0 deletions tools/devtool
@@ -410,6 +410,11 @@ cmd_help() {
echo " -c, --cpuset-cpus cpulist Set a dedicated cpulist to be used by the tests."
echo " -m, --cpuset-mems memlist Set a dedicated memlist to be used by the tests."
echo ""

    cat <<EOF
test_debug [-- [<pytest args>]]
    Run tests in a debugging environment
EOF
}


@@ -689,6 +694,10 @@ cmd_sh() {
    bash --norc -c "$*"
}

cmd_test_debug() {
    cmd_sh "tmux new ./tools/test.sh --pdb $@"
}

# Auto-format all source code, to match the Firecracker requirements. For the
# moment, this is just a wrapper over `cargo fmt --all`
# Example: `devtool fmt`
