A/B-based snapshot restore latency test #4089
Conversation
Codecov Report: Patch and project coverage have no change.

@@           Coverage Diff           @@
##             main    #4089   +/-  ##
=======================================
  Coverage   82.29%   82.29%
  Files         225      225
  Lines       28470    28470
  Hits        23429    23429
  Misses       5041     5041
Force-pushed from f95f6a1 to dbcd524.
Force-pushed from e663f21 to 0526442.
OK I went in reverse order so some comments apply to earlier changes. My bad 😁
Force-pushed from 288d086 to 875f652.
Force-pushed from 8fb9826 to 7bafccd.
Force-pushed from a84ce43 to 2cdbf18.
Force-pushed from d1dda22 to 539a60c.
Add an option to pytest to run our integration test suite for an arbitrary firecracker binary, instead of compiling it from source.

Signed-off-by: Patrick Roy <[email protected]>
Co-authored-by: Pablo Barbáchano <[email protected]>
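As a rough illustration, such an option could be wired up in conftest.py along these lines. The option name --binary-dir appears later in this PR, but the fixture wiring below is an assumption, not the actual implementation:

```python
# Hypothetical conftest.py sketch of the --binary-dir option; only the option
# name comes from this PR, the fixture wiring below is an illustrative guess.
from pathlib import Path

import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--binary-dir",
        action="store",
        default=None,
        help="Use firecracker binaries from this directory instead of "
        "compiling them from source.",
    )


def compile_firecracker_from_source() -> Path:
    # Placeholder for the pre-existing build logic; not part of this sketch.
    raise NotImplementedError


@pytest.fixture(scope="session")
def firecracker_binary(pytestconfig):
    binary_dir = pytestconfig.getoption("--binary-dir")
    if binary_dir is not None:
        # Pre-built binary supplied by the caller (e.g. an old revision).
        return Path(binary_dir) / "firecracker"
    # Fall back to the usual compile-from-source path.
    return compile_firecracker_from_source()
```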
Instead, pass the path to the temporary checkout to the test_runner closure. This is because, by automatically chdir-ing into the temporary directory, it becomes difficult for the test_runner to access anything outside of the temp dir (it would have to be explicitly passed via the closure). Without the automatic chdir, the test runner knows where the permanent firecracker checkout is (via Path.cwd()), and can still easily chdir into the temporary one via the ab_test.chdir() context manager.

Signed-off-by: Patrick Roy <[email protected]>
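The ab_test.chdir() context manager referenced above likely follows the standard pattern; here is a minimal sketch of that pattern (the real helper may differ in naming and error handling):

```python
# Sketch of a chdir() context manager as described above; the actual helper
# in the test framework may differ.
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def chdir(path):
    """Temporarily change the working directory, restoring it on exit."""
    old_cwd = Path.cwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old_cwd)
```

With this, a test_runner that receives the temporary checkout path can still resolve files relative to the permanent checkout via Path.cwd(), and only enters the temporary directory for the operations that need it.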
Added scipy (for statistical tests) and ipython. We pin numpy to version 1.24.2, as newer versions cause Python to crash with SIGILL on AL2 4.14 c7g.metal instances. I expect this will be patched in the upcoming 1.25.3 release; see numpy/numpy#24028. Also drop boto3, which has been unnecessary since d1015be.

Signed-off-by: Patrick Roy <[email protected]>
Co-authored-by: Pablo Barbáchano <[email protected]>
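In dependency-file terms, the change amounts to roughly the following hypothetical excerpt; the repository's actual requirements layout may differ:

```
# Hypothetical excerpt of the dev dependency list; actual file layout may differ.
scipy            # statistical tests (permutation test)
ipython
numpy==1.24.2    # newer versions SIGILL on AL2 4.14 c7g.metal (numpy/numpy#24028)
# boto3 dropped: unnecessary since d1015be
```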
We add a version of the nightly snapshot restore test that does not interface with the current baseline-based testing framework. Instead, it only writes the latency samples to test_results. It respects the --binary-dir option, meaning this test can be used to collect latency samples for firecracker binaries compiled from old revisions.

We choose this approach of a "data production only" test, instead of having the actual A/B-test be done inside of pytest, for multiple reasons:

- We cannot compile old firecracker versions from inside pytest, as this would require us to nest docker (or rely on the old firecracker revision being compilable with the current docker container).
- Doing the A/B-orchestration outside of the test means the test does not need to support separate "metrics only" and "A/B" modes (with the former being required for nightly data collection runs).

Signed-off-by: Patrick Roy <[email protected]>
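A "data production only" test of this shape might boil down to something like the sketch below; all fixture and metric names here are illustrative assumptions, not the actual test framework API:

```python
# Illustrative sketch only: fixture names (microvm_factory, metrics_logger)
# and metric keys are assumptions, not the actual test framework API.
import time


def test_snapshot_restore_latency(microvm_factory, metrics_logger):
    """Measure restore latency repeatedly and emit the raw samples."""
    samples_ms = []
    for _ in range(30):
        vm = microvm_factory.build()   # assumed fixture API
        start = time.monotonic()
        vm.restore_from_snapshot()     # assumed microvm API
        samples_ms.append((time.monotonic() - start) * 1000)
        vm.kill()

    # Emit a *raw time series*: an EMF property/metric assigned a list of
    # values, which the external A/B orchestrator later extracts and compares.
    metrics_logger.emit(
        dimensions={"performance_test": "test_snapshot_restore_latency"},
        metrics={"restore_latency_ms": samples_ms},
    )
```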
This script is intended to be executed by our CI to perform an A/B-test across two commits (for instance, a buildkite pipeline would be triggered on PR merge, and the pipeline would call this script with the commit range of the just-merged PR). It compiles the two git revisions of firecracker using each revision's devtool, and then passes these binaries to the relevant A/B-test.

After collecting data for both the A and B revisions, it analyzes the produced EMF logs for raw time series (e.g. EMF properties/metrics that are assigned lists of values). For any such data series found, it then performs a statistical test to assert that there is no regression in this data series. For this, it asserts that both the A and B revisions produce the same EMF messages (based on dimensions), and that for each unique dimension, the same data series are emitted.

We choose a permutation test because it is non-parametric, which we need since we cannot make normality assumptions about arbitrary performance data. Non-parametric here means it compares two arbitrary sets of samples and gives us a p-value for the H_0 hypothesis "both sets of samples were drawn from the same (unknown) distribution". The p-value is easy to interpret, as it tells us the probability of observing a result at least as extreme as the one actually measured, given that performance did not change.

Signed-off-by: Patrick Roy <[email protected]>
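The permutation test described above maps directly onto scipy.stats.permutation_test (scipy was added as a dependency earlier in this PR). A minimal sketch follows, with the mean-difference statistic and the significance threshold as assumed choices rather than what the script necessarily uses:

```python
# Minimal sketch of the statistical check described above, using
# scipy.stats.permutation_test; the mean-difference statistic and the alpha
# threshold are illustrative choices.
import numpy as np
from scipy.stats import permutation_test


def mean_difference(a, b, axis=-1):
    return np.mean(a, axis=axis) - np.mean(b, axis=axis)


def same_distribution(samples_a, samples_b, alpha=0.01):
    """Permutation test of H_0: both sample sets were drawn from the same
    (unknown) distribution, i.e. performance did not change."""
    result = permutation_test(
        (samples_a, samples_b),
        mean_difference,
        permutation_type="independent",  # shuffle samples between A and B
        vectorized=True,
        n_resamples=9999,
    )
    # A small p-value means a difference this extreme is unlikely under H_0,
    # which we would flag as a potential performance regression.
    return result.pvalue >= alpha
```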
Rearrange changes added in #4089 so that the testing framework is available and the tools scripts work as expected.

Signed-off-by: Sudan Landge <[email protected]>
Adds a snapshot restore latency test that is based on A/B-testing. Currently, these tests are not run automatically, but this PR includes a script for manually running A/B-tests across arbitrary commit ranges, which can be executed in a buildkite step. For the transition period where we still use the old performance test for alarming, this can be a valuable tool for investigating whether a reported regression can be traced back to a firecracker change, and if so, which specific commit caused it.
License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.