Add workflow to run tests on real hardware #2545
Looks really good already. I think we should start using the JSON output for the new code and transform the old code afterwards when the CI is up and running. But if you would rather get it running as it is first and then change the tests accordingly, that's also fine with me.
This fixes "SyntaxWarning: invalid escape sequence" when `get_error_log` is being called. Signed-off-by: Dennis Maisenbacher <[email protected]>
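For context, a minimal sketch of the kind of warning this commit addresses. It is not the actual `get_error_log` code: it only shows that Python warns when a normal string literal contains a backslash sequence it does not recognise (such as `\d` in a regex), and that a raw string avoids the warning. The helper name `escape_warnings` and the sample sources are made up for illustration.

```python
import warnings


def escape_warnings(source: str) -> int:
    """Compile `source` and count the invalid-escape warnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    # Newer Python versions report SyntaxWarning, older ones DeprecationWarning.
    return sum(
        issubclass(w.category, (SyntaxWarning, DeprecationWarning))
        for w in caught
    )


plain = 'pattern = "\\d+"'   # source text: pattern = "\d+"
raw = 'pattern = r"\\d+"'    # source text: pattern = r"\d+"

print(escape_warnings(plain))  # the non-raw literal triggers a warning
print(escape_warnings(raw))    # the raw literal compiles without one
```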
Thanks for the review :) I changed the sections you commented on. Letting the tests run on my branch now.
Calculate ncap with the configured flbas instead of setting a hard-coded value that might be too big for the test device. Also, `nvme list-ctrl` outputs a summary of the ctrls present, which must be skipped when parsing for the ctrl-id; otherwise this test fails. Signed-off-by: Dennis Maisenbacher <[email protected]>
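The calculation described in this commit can be sketched roughly as follows. This is an illustration rather than the code from the PR: the function names are hypothetical, the list of LBADS exponents is assumed to have been read from `nvme id-ns` beforehand, and the sketch ignores the extended format index bits that newer spec revisions define.

```python
from typing import Sequence


def lba_data_size(flbas: int, lbads: Sequence[int]) -> int:
    """Return the logical block size in bytes selected by FLBAS."""
    fmt_index = flbas & 0xF        # bits 3:0 select the LBA format in use
    return 1 << lbads[fmt_index]   # LBADS is a power-of-two exponent


def capacity_in_blocks(size_bytes: int, flbas: int, lbads: Sequence[int]) -> int:
    """Convert a byte capacity into logical blocks for the active format."""
    return size_bytes // lba_data_size(flbas, lbads)


# Hypothetical device with two LBA formats: 512-byte and 4096-byte blocks.
formats = [9, 12]
one_gib = 1 << 30
print(capacity_in_blocks(one_gib, 0, formats))
print(capacity_in_blocks(one_gib, 1, formats))
```

Deriving the block count this way keeps the requested capacity tied to the format the device is actually configured with, instead of a constant that only fits some drives.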
Rework the `get_ocfs` function to correctly parse the ocfs field. At the same time, refactor the `nvme id-ctrl` handling that extracts the ocfs field into separate functions. This is in preparation for following commits that reuse the same functionality. Furthermore, nvme_copy_test fails on devices that do not support any copy formats. For this to pass, we need to declare `self.host_behavior_data` in any case. Signed-off-by: Dennis Maisenbacher <[email protected]>
Fix off-by-one errors and capacity allocation. Small speedup for nvme_create_max_ns_test by reducing the I/O done by `run_ns_io`. For this we introduce a new count parameter which can override the default value of 10. Signed-off-by: Dennis Maisenbacher <[email protected]>
Run I/O after a successful ctrl reset. This tests if the sqs and cqs are constructed after reset. Signed-off-by: Dennis Maisenbacher <[email protected]>
Set ncap and nsze to the lowest possible value, such that this test can be run on different device capacities. Signed-off-by: Dennis Maisenbacher <[email protected]>
Specify mandatory namespace for Error Recovery (Feature Identifier 05h) get features command. Signed-off-by: Dennis Maisenbacher <[email protected]>
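As an illustration of what passing the namespace means in practice, here is a small sketch that builds such a command line. The helper name and the device path are hypothetical, and the test suite may assemble the command differently; only the `get-feature` subcommand and its `--feature-id` and `--namespace-id` options are taken from nvme-cli.

```python
from typing import List

ERROR_RECOVERY_FID = 0x05  # Feature Identifier 05h, as named in the commit


def error_recovery_cmd(ctrl: str, nsid: int) -> List[str]:
    """Build an `nvme get-feature` argv that names the namespace explicitly."""
    return [
        "nvme",
        "get-feature",
        ctrl,
        f"--feature-id={ERROR_RECOVERY_FID}",
        f"--namespace-id={nsid}",
    ]


# "/dev/nvme0" is only an example device path.
print(" ".join(error_recovery_cmd("/dev/nvme0", 1)))
```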
Use the long option for a vendor specific id-ctrl instead of the verbose flag 'v'. Signed-off-by: Dennis Maisenbacher <[email protected]>
Don't parse the smart log output only to print the (unsuccessfully) parsed values. Instead, print the whole smart log output. Signed-off-by: Dennis Maisenbacher <[email protected]>
The checkpatch is failing with the following:
I am not sure if this is applicable since I can't spot a MAINTAINERS file 🤔
Check if the NVM 'compare' command is supported before running the nvme_compare_test, which would otherwise fail with an 'Invalid Command Opcode'. Signed-off-by: Dennis Maisenbacher <[email protected]>
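A check like the one described here can be sketched as a bit test on the ONCS field of Identify Controller, where bit 0 reports Compare support. The function name and the sample values are illustrative assumptions, not the code from this commit.

```python
ONCS_COMPARE = 1 << 0  # ONCS bit 0: Compare command supported


def supports_compare(oncs: int) -> bool:
    """Return True if the controller advertises the Compare command."""
    return bool(oncs & ONCS_COMPARE)


# Example ONCS values, chosen only for illustration.
print(supports_compare(0x5F))
print(supports_compare(0x5E))
```

A test case could call such a helper in `setUp` and skip itself when the result is false, rather than issuing the command and failing.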
Look up if the drive supports the `Get LBA Status` optional admin command before executing a `nvme get-lba-status` or `nvme lba-status-log` command. Furthermore use the correct action value on `get-lba-status`. Signed-off-by: Dennis Maisenbacher <[email protected]>
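One way such a lookup could be written, using the JSON output mentioned earlier in this conversation, is sketched below. It assumes the `oacs` value has been obtained from `nvme id-ctrl --output-format=json` and that bit 9 of OACS reports the Get LBA Status capability; the function name and the sample JSON strings are made up for illustration.

```python
import json

OACS_GET_LBA_STATUS = 1 << 9  # OACS bit 9: Get LBA Status capability


def supports_get_lba_status(id_ctrl_json: str) -> bool:
    """Check the OACS field of an id-ctrl JSON document for Get LBA Status."""
    oacs = json.loads(id_ctrl_json)["oacs"]
    return bool(oacs & OACS_GET_LBA_STATUS)


# Trimmed-down sample documents; real id-ctrl output has many more fields.
with_support = '{"oacs": 543}'     # 0x21f, bit 9 set
without_support = '{"oacs": 31}'   # 0x01f, bit 9 clear

print(supports_get_lba_status(with_support))
print(supports_get_lba_status(without_support))
```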
After a test has run, the `tearDown` function is called, which creates and attaches a single ns with the full NVM capacity. This is done so the caller or the next test case receives a reasonably formatted drive. Signed-off-by: Dennis Maisenbacher <[email protected]>
Introduce a GitHub workflow which runs all test cases under the `tests` directory on real hardware through a self-hosted runner. This workflow is triggered nightly or on demand, because the tests take about an hour to run. Signed-off-by: Dennis Maisenbacher <[email protected]>
You can ignore that checkpatch failure. We don't do the MAINTAINERS file thing, so any new file added to the project will trigger this message.
BTW, I've just remembered that we also have a Python binding for the library which provides the constants. Maybe we could use these in the future as well. Just an idea.
Now, I have to figure out how to enable it :)
Thanks for the review and merge :)
This PR introduces a GitHub workflow that runs all test cases under the `tests` directory on real hardware. In preparation, the existing test cases are fixed, such that the new `run-tests` workflow completes successfully (see actions on my fork's master branch).
The infrastructure that provides the self-hosted GitHub runner with real nvme
devices attached to it is provided by Western Digital Corporation.
On a high-level overview, this infrastructure contains multiple storage nodes
that form a Kubernetes cluster. Within this Kubernetes cluster, KubeVirt
virtual machines can be spawned on demand with different hardware
configurations. Those VMs are owned by different users. One user group that
is separated in its own namespace is self-hosted GitHub runners.
The overall goal of this infrastructure is to provide device access to open
source projects for testing purposes. The first two pioneering candidates to
make use of this service are nvme-cli and ZenFS (see westerndigitalcorporation/zenfs#294).
The new self-hosted GitHub runner IaC is to be open-sourced.
If people want to see other nvme devices for testing in GitHub workflows, we
are happy to discuss details on how to move forward with integrating new
test devices.
Initial requirements for this self-hosted runner:
Ultrastar® SN640 and Western Digital Ultrastar® ZN540 are provided)
necessary ports.
after each workflow run.
ssh is only allowed from local IPs, https is only allowed to and from
external addresses (no access to internal cluster services), and access to the
VM image repository is only allowed from local IPs.
only allow sshd access from cluster local IPs into the VM.
IMPORTANT: Configuration required by the GitHub repo that uses the self-hosted runner:
one repository
Repo -> Settings -> Actions -> General -> Actions permissions
Repo -> Settings -> Actions -> General -> 'Require approval for all outside
collaborators' -> Save
Repo -> Settings -> Actions -> General -> 'Read repository content and packages
permissions' -> Save
GITHUB_TOKEN:
Repo -> Settings -> Actions -> General -> DISABLE 'Allow GitHub Actions to
create and approve pull requests' -> Save
(https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions)
In a GitHub workflow that uses a container on the self-hosted instance, the
block device has to be passed into the container:
Before this PR gets merged we should make sure that the self-hosted runner is
added to the repository, which requires a token exchange on a side channel. :)