A/B-ification of tests that can fail due to external influences #4149

Merged
roypat merged 10 commits into firecracker-microvm:main from the-big-ab-ification on Oct 23, 2023

Conversation

@roypat roypat commented Oct 5, 2023

Changes

This PR introduces an A/B approach for tests that can fail due to external factors. Such failures would otherwise block all PRs until the underlying cause is resolved. Using the cargo audit test as an example, we want to differentiate between the following two cases:

  • A PR introduces a new dependency for which a rustsec advisory exists
  • A pre-existing dependency has a new rustsec advisory filed

We want to fail PR CI in the first scenario, but be able to deal with the second scenario out-of-band. A/B-Testing allows us to do exactly that.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet contribution quality standards.

  • This functionality cannot be added in rust-vmm.


codecov bot commented Oct 5, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (6141c2c) 82.94% compared to head (e45a681) 82.92%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4149      +/-   ##
==========================================
- Coverage   82.94%   82.92%   -0.03%     
==========================================
  Files         221      221              
  Lines       28419    28419              
==========================================
- Hits        23572    23566       -6     
- Misses       4847     4853       +6     
Flag Coverage Δ
4.14-c7g.metal ?
4.14-m5d.metal ?
4.14-m6a.metal ?
4.14-m6g.metal 78.46% <ø> (ø)
4.14-m6i.metal ?
5.10-c7g.metal 81.39% <ø> (ø)
5.10-m5d.metal 82.96% <ø> (ø)
5.10-m6a.metal 82.19% <ø> (ø)
5.10-m6g.metal 81.39% <ø> (ø)
5.10-m6i.metal 82.95% <ø> (+<0.01%) ⬆️
6.1-c7g.metal 81.39% <ø> (ø)
6.1-m5d.metal 82.96% <ø> (+0.01%) ⬆️
6.1-m6a.metal 82.19% <ø> (ø)
6.1-m6g.metal 81.39% <ø> (ø)
6.1-m6i.metal 82.95% <ø> (+<0.01%) ⬆️

see 1 file with indirect coverage changes


@roypat roypat force-pushed the the-big-ab-ification branch 7 times, most recently from 129afb8 to 6809cd6 Compare October 5, 2023 13:51
@roypat roypat marked this pull request as ready for review October 5, 2023 15:11
@roypat roypat force-pushed the the-big-ab-ification branch 3 times, most recently from 2abca38 to e9e9e80 Compare October 5, 2023 15:36
roypat commented Oct 5, 2023

Example output from the new cargo audit test if someone adds a new dependency that has a rustsec advisory on it: https://buildkite.com/firecracker/firecracker-pr/builds/6549#018b007b-e507-42d7-9f2f-fbe79ec6ac80

@roypat roypat force-pushed the the-big-ab-ification branch from e9e9e80 to 79843ec Compare October 5, 2023 15:56
@roypat roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Oct 5, 2023
@roypat roypat force-pushed the the-big-ab-ification branch 2 times, most recently from 9faf894 to 536f234 Compare October 9, 2023 13:16
@JonathanWoollett-Light JonathanWoollett-Light left a comment

Nit:


Change:

test: Propage ignore_return_code from utils.run_cmd to host_tools.cargo

to:

test: Propagate `ignore_return_code` from `utils.run_cmd` to `host_tools.cargo`

Change:

test: A/B-ify test_cargo_audit

to:

test: A/B-ify `test_cargo_audit`

The same applies to all commit messages that include code identifiers.

@roypat roypat force-pushed the the-big-ab-ification branch 3 times, most recently from 3f9b08a to 85c3061 Compare October 13, 2023 15:40
zulinx86 previously approved these changes Oct 16, 2023
Running a bash command as an A/B-Test means testing that the output of
the command is the same both before and after the PR.

Signed-off-by: Patrick Roy <[email protected]>
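
As a rough illustration of the idea in the commit message above, comparing a command's output before and after a PR could look like the sketch below. The function names (`run_shell_command`, `ab_test_shell_command`) and the use of two checkout directories are assumptions made for this example; they are not the helpers this PR adds to `tests/framework/ab_test.py`.

```python
import subprocess


def run_shell_command(command, cwd):
    """Run `command` through the shell inside `cwd` and return its stdout."""
    result = subprocess.run(
        command,
        shell=True,
        cwd=cwd,
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is not treated as an error here
    )
    return result.stdout


def ab_test_shell_command(command, dir_a, dir_b, comparator=lambda a, b: a == b):
    """Run `command` in checkout A (before the PR) and checkout B (after the PR).

    Fails if `comparator` rejects the pair of outputs. The default comparator
    requires both outputs to be identical.
    """
    output_a = run_shell_command(command, dir_a)
    output_b = run_shell_command(command, dir_b)
    assert comparator(output_a, output_b), (
        f"Output of {command!r} changed:\nA: {output_a}\nB: {output_b}"
    )
    return output_a, output_b
```

Passing a custom `comparator` leaves room for weaker conditions than strict equality, for example "B contains no lines that A did not contain".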
The test_cargo_audit test can fail due to new advisories being published
on crates that firecracker already consumes. We already get notified of
this because we do a nightly run that alerts us whenever any advisory
for one of our dependencies exists. There is no need to do the same on
PRs, as that means all development activities are blocked until the
advisory is resolved (which might be non-trivial). Thus use A/B-testing
in CI to only fail a PR if a newly introduced dependency has a
pre-existing advisory (in which case we do not want to introduce the
dependency). This is done by computing the set of
warnings/vulnerabilities that currently affect the repository, and
ensuring that there are no additions to this set (while subtractions,
for example due to a PR removing a vulnerable dependency, are allowed).

Signed-off-by: Patrick Roy <[email protected]>
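
The "no additions to the set" rule described in the commit message above can be sketched as a plain set difference. This is a simplified sketch: it assumes the advisory identifiers have already been extracted from the audit output for both revisions. The function names and the placeholder IDs are hypothetical and do not come from the PR.

```python
def newly_introduced(advisories_before, advisories_after):
    """Return the advisories present after the PR that were absent before it."""
    return set(advisories_after) - set(advisories_before)


def assert_no_new_advisories(advisories_before, advisories_after):
    """Fail only if the PR adds advisories; removals are allowed."""
    added = newly_introduced(advisories_before, advisories_after)
    assert not added, f"PR introduces dependencies with advisories: {sorted(added)}"


# Placeholder identifiers, for illustration only.
main_branch = {"ADVISORY-A", "ADVISORY-B"}

# A PR that removes a vulnerable dependency passes.
assert_no_new_advisories(main_branch, {"ADVISORY-A"})

# An advisory that exists on both sides (for example one newly published
# against a pre-existing dependency) does not fail the PR either.
assert_no_new_advisories(main_branch, main_branch)
```

A PR whose set contains an extra element, such as `{"ADVISORY-A", "ADVISORY-B", "ADVISORY-C"}`, would trip the assertion, which corresponds to the first scenario in the PR description.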
Similar to test_cargo_audit, it can fail due to actions outside of our
control.

Signed-off-by: Patrick Roy <[email protected]>
The vulnerabilities A/B-Tests will need to compile firecracker from an
old revision, meaning we need to grab binaries from a different cargo
workspace.

Signed-off-by: Patrick Roy <[email protected]>
Needed for the vulnerability tests, as they need to run commands inside
of microvms.

Signed-off-by: Patrick Roy <[email protected]>
These tests can fail due to external factors (microcode updates, AMI
updates, etc), which would then block our PR CI until those get
resolved. By using A/B-testing for our PR CI we avoid this, and get
alerted to these changes out-of-band.

Since A/B-Testing needs microvms compiled from different revisions, we
need to change our fixture approach a bit. Instead of building microvms,
it now provides factory methods that can be consumed by the A/B-test
functions for building microvms from compiled firecracker binaries.
These factory methods can then be composed to make them perform
additional actions such as "restore from snapshot" or "make sure checker
script is there".

The condition that the A/B-Tests verify is "PR did not introduce a
vulnerability". This is different from the "Result of vulnerability test
did not change across PR" that might be more obviously associated with
A/B-testing. However, this latter approach would not allow us to fix
vulnerabilities (as it would block such PRs).

Signed-off-by: Patrick Roy <[email protected]>
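
To make the "composable factory methods" idea from the commit message above more concrete, here is a minimal sketch. Everything in it is hypothetical: `FakeMicrovm` is a stand-in for a real microvm object, and the factory and wrapper names are invented for this example rather than taken from the PR's fixtures.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeMicrovm:
    """Stand-in for a microvm; records which setup actions were applied."""

    firecracker_binary: str
    actions: List[str] = field(default_factory=list)


# A factory takes the path of a compiled firecracker binary and builds a microvm.
MicrovmFactory = Callable[[str], FakeMicrovm]


def plain_microvm(firecracker_binary: str) -> FakeMicrovm:
    """Base factory: build and boot a microvm from the given binary."""
    return FakeMicrovm(firecracker_binary, ["boot"])


def with_snapshot_restore(factory: MicrovmFactory) -> MicrovmFactory:
    """Wrap `factory` so the resulting microvm is restored from a snapshot."""

    def wrapped(firecracker_binary: str) -> FakeMicrovm:
        vm = factory(firecracker_binary)
        vm.actions.append("restore from snapshot")
        return vm

    return wrapped


def with_checker_script(factory: MicrovmFactory) -> MicrovmFactory:
    """Wrap `factory` so the checker script is made available in the guest."""

    def wrapped(firecracker_binary: str) -> FakeMicrovm:
        vm = factory(firecracker_binary)
        vm.actions.append("install checker script")
        return vm

    return wrapped


# The A/B test can call the same composed factory once per revision.
composed = with_checker_script(with_snapshot_restore(plain_microvm))
vm_a = composed("revision_a/firecracker")
vm_b = composed("revision_b/firecracker")
```

Because the fixture hands out a factory instead of a finished microvm, the A/B function decides which binary each microvm is built from, while the wrappers keep the extra setup steps identical on both sides.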
This lint unilaterally forbids shadowing. However, this clashes with
pytest fixtures in a great many places: To use a fixture a test function
needs to have an argument that matches the fixture's name. However, if
the fixture is defined in the same file as its usage, then this will
trigger pylint's redefined-outer-name. Since this is a very common
pattern, disable this lint.

Signed-off-by: Patrick Roy <[email protected]>
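
The clash described in the commit message above shows up in a file as small as the following sketch. The fixture and test names are made up for illustration. The inline `disable` comment is only there to mark the spot where pylint would report `redefined-outer-name`; the commit itself disables the lint globally.

```python
import pytest


@pytest.fixture
def greeting():
    """A fixture defined in the same module as the test that uses it."""
    return "hello from the guest"


# pytest injects the fixture by matching the argument name to the fixture
# name, so the argument necessarily shadows the module-level function above.
def test_greeting_mentions_guest(greeting):  # pylint: disable=redefined-outer-name
    assert "guest" in greeting
```

Renaming the argument is not an option, because pytest would then no longer find the fixture, which is why the shadowing cannot be avoided when fixture and test share a module.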
Previously, test_ab.py was explicitly calling release.sh to build
firecracker binaries. This was done because the first version of the
script called out to `tools/devtool build`, to ensure maximal
compatibility with older versions of firecracker (which might no longer
compile in new dev containers). However, nowadays the entire ab_test.py
script is run inside of docker, meaning we no longer get this benefit.
There is therefore no difference between calling release.sh and using
get_firecracker_binaries(), so just use the git_ab_test_with_binaries
facilities introduced in this patch series.

Signed-off-by: Patrick Roy <[email protected]>
@roypat roypat merged commit 27fb303 into firecracker-microvm:main Oct 23, 2023
5 checks passed
@pb8o pb8o mentioned this pull request Oct 29, 2023
9 tasks
@roypat roypat deleted the the-big-ab-ification branch April 15, 2024 14:27