A/B-based snapshot restore latency test #4089
Conversation
Codecov Report: Patch and project coverage have no change.

@@           Coverage Diff           @@
##             main    #4089   +/-  ##
=======================================
  Coverage   82.29%   82.29%
  Files         225      225
  Lines       28470    28470
  Hits        23429    23429
  Misses       5041     5041
Force-pushed from f95f6a1 to dbcd524.
Force-pushed from e663f21 to 0526442.
OK I went in reverse order so some comments apply to earlier changes. My bad 😁
Force-pushed from 288d086 to 875f652.
Force-pushed from 8fb9826 to 7bafccd.
Force-pushed from a84ce43 to 2cdbf18.
Force-pushed from d1dda22 to 539a60c.
Add an option to pytest to run our integration test suite for an arbitrary firecracker binary, instead of compiling it from source.

Signed-off-by: Patrick Roy <[email protected]>
Co-authored-by: Pablo Barbáchano <[email protected]>
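As a rough illustration, such an option could be wired up in conftest.py along these lines. The option name --binary-dir appears later in this PR, but the fixture wiring below is an assumption, not the actual implementation:

```python
# Hypothetical conftest.py sketch of the --binary-dir option; only the option
# name comes from this PR, the fixture wiring below is an illustrative guess.
from pathlib import Path

import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--binary-dir",
        action="store",
        default=None,
        help="Use firecracker binaries from this directory instead of "
        "compiling them from source.",
    )


def compile_firecracker_from_source() -> Path:
    # Placeholder for the pre-existing build logic; not part of this sketch.
    raise NotImplementedError


@pytest.fixture(scope="session")
def firecracker_binary(pytestconfig):
    binary_dir = pytestconfig.getoption("--binary-dir")
    if binary_dir is not None:
        # Pre-built binary supplied by the caller (e.g. an old revision).
        return Path(binary_dir) / "firecracker"
    # Fall back to the usual compile-from-source path.
    return compile_firecracker_from_source()
```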
Instead, pass the path to the temporary checkout to the test_runner closure. This is because, by automatically chdir-ing into the temporary directory, it becomes difficult for the test_runner to access anything outside of the temp dir (it would have to be explicitly passed via the closure). Without the automatic chdir, the test runner knows where the permanent firecracker checkout is (via Path.cwd()), and can still easily chdir into the temporary one via the ab_test.chdir() context manager.

Signed-off-by: Patrick Roy <[email protected]>
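The ab_test.chdir() context manager referenced above likely follows the standard pattern; here is a minimal sketch of that pattern (the real helper may differ in naming and error handling):

```python
# Sketch of a chdir() context manager as described above; the actual helper
# in the test framework may differ.
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def chdir(path):
    """Temporarily change the working directory, restoring it on exit."""
    old_cwd = Path.cwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old_cwd)
```

With this, a test_runner that receives the temporary checkout path can still resolve files relative to the permanent checkout via Path.cwd(), and only enters the temporary directory for the operations that need it.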
Added scipy (for statistical tests) and ipython. We pin numpy to version 1.24.2, as newer versions cause Python to crash with SIGILL on AL2 4.14 c7g.metal instances. I expect this will be patched in the upcoming 1.25.3 release; see numpy/numpy#24028. Also drop boto3, which has been unnecessary since d1015be.

Signed-off-by: Patrick Roy <[email protected]>
Co-authored-by: Pablo Barbáchano <[email protected]>
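In dependency-file terms, the change amounts to roughly the following hypothetical excerpt; the repository's actual requirements layout may differ:

```
# Hypothetical excerpt of the dev dependency list; actual file layout may differ.
scipy            # statistical tests (permutation test)
ipython
numpy==1.24.2    # newer versions SIGILL on AL2 4.14 c7g.metal (numpy/numpy#24028)
# boto3 dropped: unnecessary since d1015be
```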
We add a version of the nightly snapshot restore test that does not interface with the current baseline-based testing framework. Instead, it only writes the latency samples to test_results. It respects the --binary-dir option, meaning this test can be used to collect latency samples for firecracker binaries compiled from old revisions.

We choose this approach of a "data production only" test, instead of having the actual A/B-test be done inside of pytest, for multiple reasons:

- We cannot compile old firecracker versions from inside pytest, as this would require us to nest docker (or rely on the old firecracker revision being compilable with the current docker container).
- Doing the A/B-orchestration outside of the test means the test does not need to support separate "metrics only" and "A/B" modes (with the former being required for nightly data collection runs).

Signed-off-by: Patrick Roy <[email protected]>
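A "data production only" test of this shape might boil down to something like the sketch below; all fixture and metric names here are illustrative assumptions, not the actual test framework API:

```python
# Illustrative sketch only: fixture names (microvm_factory, metrics_logger)
# and metric keys are assumptions, not the actual test framework API.
import time


def test_snapshot_restore_latency(microvm_factory, metrics_logger):
    """Measure restore latency repeatedly and emit the raw samples."""
    samples_ms = []
    for _ in range(30):
        vm = microvm_factory.build()   # assumed fixture API
        start = time.monotonic()
        vm.restore_from_snapshot()     # assumed microvm API
        samples_ms.append((time.monotonic() - start) * 1000)
        vm.kill()

    # Emit a *raw time series*: an EMF property/metric assigned a list of
    # values, which the external A/B orchestrator later extracts and compares.
    metrics_logger.emit(
        dimensions={"performance_test": "test_snapshot_restore_latency"},
        metrics={"restore_latency_ms": samples_ms},
    )
```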
This script is intended to be executed by our CI to perform an A/B-test across two commits (for instance, a buildkite pipeline would be triggered on PR merge, and the pipeline would call this script with the commit range of the just-merged PR). It compiles the two git revisions of firecracker using each revision's devtool, and then passes these binaries to the relevant A/B-test.

After collecting data for both the A and B revisions, it analyzes the produced EMF logs for raw time series (e.g. EMF properties/metrics that are assigned lists of values). For any such data series found, it then performs a statistical test to assert that there is no regression in this data series. For this, it asserts that both the A and B revisions produce the same EMF messages (based on dimensions), and that for each unique dimension, the same data series are emitted.

We choose a permutation test because it is non-parametric, which we need since we cannot make normality assumptions about arbitrary performance data. Non-parametric here means it compares two arbitrary sets of samples and gives us a p-value for the H_0 hypothesis "both sets of samples were drawn from the same (unknown) distribution". The p-value is easy to interpret, as it tells us the probability of observing a result at least as extreme as the one actually measured, given that performance did not change.

Signed-off-by: Patrick Roy <[email protected]>
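The permutation test described above maps directly onto scipy.stats.permutation_test (scipy was added as a dependency earlier in this PR). A minimal sketch follows, with the mean-difference statistic and the significance threshold as assumed choices rather than what the script necessarily uses:

```python
# Minimal sketch of the statistical check described above, using
# scipy.stats.permutation_test; the mean-difference statistic and the alpha
# threshold are illustrative choices.
import numpy as np
from scipy.stats import permutation_test


def mean_difference(a, b, axis=-1):
    return np.mean(a, axis=axis) - np.mean(b, axis=axis)


def same_distribution(samples_a, samples_b, alpha=0.01):
    """Permutation test of H_0: both sample sets were drawn from the same
    (unknown) distribution, i.e. performance did not change."""
    result = permutation_test(
        (samples_a, samples_b),
        mean_difference,
        permutation_type="independent",  # shuffle samples between A and B
        vectorized=True,
        n_resamples=9999,
    )
    # A small p-value means a difference this extreme is unlikely under H_0,
    # which we would flag as a potential performance regression.
    return result.pvalue >= alpha
```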
Rearrange changes added in #4089 so that the testing framework is available and the tools scripts work as expected.

Signed-off-by: Sudan Landge <[email protected]>
Adds a snapshot restore latency test that is based on A/B-testing. Currently, these tests are not run automatically, but this PR includes a script for manually running A/B-tests across arbitrary commit ranges, which can be executed in a buildkite step. For the transition period where we still use the old performance test for alarming, this can be a valuable tool for investigating whether a reported regression can be traced back to a firecracker change, and if so, which specific commit caused it.
License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.