A/B-based snapshot restore latency test #4089

Merged
roypat merged 5 commits into firecracker-microvm:main from the snapshot-ab branch on Sep 22, 2023


@roypat roypat commented Sep 6, 2023

Adds a snapshot restore latency test that is based on A/B-testing. Currently, these tests are not run automatically, but this PR includes a script for manually running A/B-tests across arbitrary commit ranges, which can be executed in a buildkite step. For the transition period in which we still use the old performance test for alarming, this can be a valuable tool for investigating whether a reported regression can be traced back to a firecracker change and, if so, which specific commit caused it.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet contribution quality standards.

  • This functionality cannot be added in rust-vmm.


codecov bot commented Sep 6, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (42f986e) 82.29% compared to head (c3654b0) 82.29%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4089   +/-   ##
=======================================
  Coverage   82.29%   82.29%           
=======================================
  Files         225      225           
  Lines       28470    28470           
=======================================
  Hits        23429    23429           
  Misses       5041     5041           
Flag Coverage Δ
4.14-c7g.metal 77.70% <ø> (ø)
4.14-m5d.metal 79.59% <ø> (+0.01%) ⬆️
4.14-m6a.metal 78.69% <ø> (ø)
4.14-m6g.metal 77.70% <ø> (ø)
4.14-m6i.metal 79.57% <ø> (ø)
5.10-c7g.metal 80.61% <ø> (ø)
5.10-m5d.metal 82.25% <ø> (ø)
5.10-m6a.metal 81.46% <ø> (ø)
5.10-m6g.metal 80.61% <ø> (ø)
5.10-m6i.metal 82.24% <ø> (+<0.01%) ⬆️
6.1-c7g.metal 80.61% <ø> (?)
6.1-m5d.metal 82.25% <ø> (-0.02%) ⬇️
6.1-m6a.metal 81.46% <ø> (ø)
6.1-m6g.metal 80.61% <ø> (ø)
6.1-m6i.metal 82.24% <ø> (ø)


@roypat roypat force-pushed the snapshot-ab branch 10 times, most recently from f95f6a1 to dbcd524 Compare September 7, 2023 09:47
@roypat roypat marked this pull request as ready for review September 7, 2023 09:50
@roypat roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Sep 7, 2023
@roypat roypat force-pushed the snapshot-ab branch 2 times, most recently from e663f21 to 0526442 Compare September 11, 2023 08:34

@pb8o pb8o left a comment

OK I went in reverse order so some comments apply to earlier changes. My bad 😁

@roypat roypat force-pushed the snapshot-ab branch 4 times, most recently from 288d086 to 875f652 Compare September 13, 2023 10:07
@zulinx86 zulinx86 self-requested a review September 13, 2023 12:42
zulinx86 previously approved these changes Sep 13, 2023
@roypat roypat force-pushed the snapshot-ab branch 2 times, most recently from 8fb9826 to 7bafccd Compare September 14, 2023 14:38
zulinx86 previously approved these changes Sep 14, 2023
@roypat roypat force-pushed the snapshot-ab branch 4 times, most recently from a84ce43 to 2cdbf18 Compare September 21, 2023 15:20
zulinx86 previously approved these changes Sep 21, 2023
pb8o previously approved these changes Sep 22, 2023
@roypat roypat force-pushed the snapshot-ab branch 5 times, most recently from d1dda22 to 539a60c Compare September 22, 2023 13:12
roypat and others added 5 commits September 22, 2023 14:21
Add an option to pytest to run our integration test suite for an
arbitrary firecracker binary, instead of compiling it from source.

Signed-off-by: Patrick Roy <[email protected]>
Co-authored-by: Pablo Barbáchano <[email protected]>
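
A minimal sketch of how such an option could be wired into pytest, assuming it is exposed as `--binary-dir` (the name used in a later commit message of this PR). The `resolve_binaries` helper, the `DEFAULT_BUILD_DIR` path, and the binary names are hypothetical and only illustrate the fallback to a locally compiled build; the actual implementation may differ.

```python
from pathlib import Path
from typing import Optional

# Hypothetical location of binaries compiled from the current checkout.
DEFAULT_BUILD_DIR = Path("build/release")


def pytest_addoption(parser):
    """pytest hook (would live in conftest.py): register --binary-dir."""
    parser.addoption(
        "--binary-dir",
        action="store",
        default=None,
        help="use the binaries in this directory instead of compiling from source",
    )


def resolve_binaries(binary_dir: Optional[str]) -> dict:
    """Return the binary paths to test (hypothetical helper).

    Falls back to the locally compiled build when no directory is given.
    """
    base = Path(binary_dir) if binary_dir else DEFAULT_BUILD_DIR
    return {name: base / name for name in ("firecracker", "jailer")}
```

In a fixture, the value would typically be read with `request.config.getoption("--binary-dir")` and passed to the helper.
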
Instead, pass the path to the temporary checkout to the test_runner
closure. When the framework automatically chdirs into the temporary
directory, it is difficult for the test_runner to access anything
outside of it (such paths have to be passed in explicitly via the
closure). Without the automatic chdir, the test runner knows where the
permanent firecracker checkout is (via Path.cwd()) and can easily chdir
to the temporary one via the ab_test.chdir() context manager.

Signed-off-by: Patrick Roy <[email protected]>
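
The pattern described in this commit can be sketched as follows. This is an illustration only: `chdir` here is a stand-in written for this example and is assumed to behave like the `ab_test.chdir()` context manager mentioned above, while `run_in_checkout` and `example_runner` are hypothetical names.

```python
import contextlib
import os
from pathlib import Path


@contextlib.contextmanager
def chdir(directory):
    """Temporarily switch the working directory, restoring it on exit."""
    previous = Path.cwd()
    os.chdir(directory)
    try:
        yield Path(directory)
    finally:
        os.chdir(previous)


def run_in_checkout(checkout_dir, test_runner):
    """Hand the checkout path to the runner without changing the cwd."""
    return test_runner(Path(checkout_dir))


def example_runner(checkout: Path):
    """Hypothetical runner that needs both checkouts."""
    permanent = Path.cwd()  # still the permanent checkout
    with chdir(checkout):
        inside = Path.cwd()  # now inside the temporary checkout
    return permanent, inside
```

Because the runner decides when to change directories, it keeps access to the permanent checkout without any extra arguments.
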
Added scipy (for statistical tests) and ipython.

We pin numpy to version 1.24.2, as versions newer than this cause python
to crash with SIGILL on AL2 4.14 c7g.metal instances. I expect this
will be patched in the upcoming 1.25.3 release.

See numpy/numpy#24028

Also drop boto3, which has been unnecessary since
d1015be.

Signed-off-by: Patrick Roy <[email protected]>
Co-authored-by: Pablo Barbáchano <[email protected]>
We add a version of the nightly snapshot restore test that does not
interface with the current baseline-based testing framework. Instead, it
will only write the latency samples to test_results. It respects the
--binary-dir option, meaning this test can be used to collect latency
samples for firecracker binaries compiled from old revisions.

We choose this approach of doing a "data production only" test, instead
of having the actual A/B-test be done inside of pytest for multiple
reasons:
- We cannot compile old firecracker versions from inside pytest, as this
  would require us to nest docker (or rely on the old firecracker
  revision being compilable with the current docker container).
- Doing the A/B-orchestration outside of the test means the test does
  not need to support "metrics only" and "A/B" modes (with the former
  being required for nightly data collection runs).

Signed-off-by: Patrick Roy <[email protected]>
This script is intended to be executed by our CI to perform an A/B-test
across two commits (for instance, a buildkite pipeline would get
triggered on PR merge, and the pipeline will call this script with the
commit range of the just merged PR).

It compiles two git revisions of firecracker, using each revision's
devtool, and then passes these binaries to the relevant A/B-test.

After collecting data for both the A and B revisions, it scans the
produced EMF logs for raw time series (i.e. EMF properties/metrics that
are assigned lists of values). Before comparing, it asserts that both
revisions produce the same EMF messages (based on dimensions), and that
for each unique set of dimensions the same data series are emitted. For
every data series found, it then performs a statistical test to assert
that there is no regression in that series.
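
The extraction and consistency check can be sketched as below. This is a hedged illustration, not the script's code: it assumes one EMF JSON document per log line with the standard `_aws.CloudWatchMetrics` metadata, only looks at declared metrics (not arbitrary properties), and the names `extract_series` and `check_comparable` are hypothetical.

```python
import json


def extract_series(emf_lines):
    """Map each dimension set to the metrics that hold lists of values."""
    result = {}
    for line in emf_lines:
        message = json.loads(line)
        for directive in message["_aws"]["CloudWatchMetrics"]:
            names = [metric["Name"] for metric in directive["Metrics"]]
            for dimension_keys in directive["Dimensions"]:
                # A dimension set is identified by its (key, value) pairs.
                dims = frozenset((key, message[key]) for key in dimension_keys)
                series = {
                    name: message[name]
                    for name in names
                    if isinstance(message.get(name), list)
                }
                if series:
                    result.setdefault(dims, {}).update(series)
    return result


def check_comparable(series_a, series_b):
    """Assert that A and B emitted the same dimension sets and data series."""
    assert series_a.keys() == series_b.keys(), "dimension sets differ"
    for dims in series_a:
        assert series_a[dims].keys() == series_b[dims].keys(), (
            f"data series differ for {dict(dims)}"
        )
```

Only after this check passes does it make sense to pair each A series with its B counterpart for the statistical test.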

We choose a Permutation Test as it is non-parametric (which we need
since we cannot make normality assumptions about arbitrary performance
data). Non-parametric here means it compares two arbitrary sets of
samples, and then gives us a p-value for the null hypothesis H_0 "both
sets of samples were drawn from the same (unknown) distribution".
The p-value is easy to interpret, as it tells us the probability of
observing a result at least as bad as the actually measured one, given
that performance did not change.

Signed-off-by: Patrick Roy <[email protected]>
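
To make the statistical test concrete, here is a standard-library sketch of a one-sided permutation test on the difference of means. It is not the script's implementation (SciPy, which this PR adds as a dependency, ships `scipy.stats.permutation_test`); the choice of statistic, the number of resamples, and the assumption that larger values (higher latency) are worse are all made for this example.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=2000, seed=0):
    """Estimate the p-value for H_0: a and b come from the same distribution.

    One-sided: counts relabelings in which B looks at least as bad
    (at least as much slower than A) as in the observed data.
    """
    rng = random.Random(seed)
    observed = mean(b) - mean(a)
    pooled = list(a) + list(b)
    at_least_as_bad = 0
    for _ in range(n_resamples):
        # Under H_0 the A/B labels are exchangeable, so shuffle them.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[: len(a)], pooled[len(a):]
        if mean(perm_b) - mean(perm_a) >= observed:
            at_least_as_bad += 1
    # Count the observed labeling itself so the estimate is never zero.
    return (at_least_as_bad + 1) / (n_resamples + 1)
```

A small p-value means that a difference this large would rarely arise from relabeling alone, which is evidence of a regression in B.
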
@roypat roypat merged commit 5c37410 into firecracker-microvm:main Sep 22, 2023
5 checks passed
wearyzen pushed a commit that referenced this pull request Sep 25, 2023
rearrange changes added in #4089 so that the testing
framework is available and the tools scripts work as expected.

Signed-off-by: Sudan Landge <[email protected]>