rdma: Add support for running the rdma tests over hardware #86
base: master
Conversation
I don't have RDMA hardware, but this change looks valuable to me. I suggest adding more description to the commit message explaining why this change is good. A note about use_hw_rdma in Documentation/running-test.md would also be helpful.
Overall, this change looks like a good improvement to me. A richer commit message and an explanation in Documentation/running-test.md will be required. @bvanassche @sagigrimberg @ChaitanayaKulkarni
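For illustration, a minimal config snippet along these lines could go into the blktests config file. The variable names use_hw_rdma and use_rxe are taken from this PR's discussion, and nvme_trtype is the existing blktests option for the nvme transport; treat the exact syntax as an assumption, not the final documentation.

```sh
# Hypothetical blktests ./config snippet, assuming the use_hw_rdma option
# proposed in this PR.
nvme_trtype=rdma   # run the nvme group over the RDMA transport
use_hw_rdma=1      # use the real RDMA device instead of rdma_rxe/siw
#use_rxe=1         # must not be enabled together with use_hw_rdma
```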
As I commented, a richer commit message and an explanation in Documentation/running-test.md would be helpful. Other than that, this change looks good to me. Other NVMe experts may have more comments.
Hi @Kamalheib, thanks for updating the patch. Could you share the results of test runs with this change and real RDMA hardware? If blktests runs with real RDMA hardware, that is great :) Do you see any failures? As for the nvme test group, I'm curious which transport you use, and whether you set def_traddr (Sagi mentioned that the default address is local, so I'm not sure the local address really works with real hardware). Also, I think the commit needs a couple of brush-ups: the RDMA hardware configuration check that Sagi mentioned, and the commit message. I may be able to help with them.
Hi @kawasaki, for proof of concept I tested SRP only :-), but after I saw that you are interested in nvme, I tested nvme too. I was using a ConnectX-6 Dx in RoCE mode, and I saw some failures with both test groups (please see below). Regarding the commit brush-up, please let me know what needs to be changed and I will work on fixing it.
Thanks! The tests look to be working :) It is good to see the failures; sooner or later they must be addressed. If they are kernel bugs, it's great that we catch them, or they might be bugs on the test side. I think this PR can be applied to blktests before resolving the failures, but before that, let's do some more brush-up.
Here are the improvement points I found:
Also, it would be better to check that use_rxe and use_hw_rdma are not both set, probably in start_rdma()? A rough sketch follows below.
I hope these will help. If any action is needed from me, please let me know :)
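A minimal sketch of that check, assuming the variable names used in this discussion; whether it belongs in start_rdma() or elsewhere is decided later in the thread.

```sh
# Hypothetical check, not the actual blktests helper code.
if [ "${use_rxe:-0}" = 1 ] && [ "${use_hw_rdma:-0}" = 1 ]; then
	echo "use_rxe and use_hw_rdma must not be enabled together" >&2
	return 1
fi
```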
blktests uses soft-RoCE (rdma_rxe) and soft-iWARP (siw) to run RDMA-related tests. This change adds support for running the nvme-rdma and SRP tests with real RDMA hardware, which is needed to make sure that we don't have issues when using real RDMA hardware. Signed-off-by: Kamal Heib <[email protected]>
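As a rough sketch of the idea (not the actual patch), the RDMA setup helper could skip loading the soft-RDMA drivers when use_hw_rdma is set and rely on the pre-configured hardware device instead. The function name, variable names, and the eth0 netdev are assumptions taken from this discussion and from common iproute2 usage.

```sh
# Hypothetical sketch of an RDMA setup helper; not the real blktests code.
start_rdma() {
	if [ "${use_hw_rdma:-0}" = 1 ]; then
		# Real RDMA hardware: nothing to load; the device and its IP
		# address are assumed to be configured beforehand.
		return 0
	fi
	if [ "${use_rxe:-0}" = 1 ]; then
		modprobe rdma_rxe || return $?
		rdma link add rxe_eth0 type rxe netdev eth0
	else
		modprobe siw || return $?
		rdma link add siw_eth0 type siw netdev eth0
	fi
}
```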
Hi @kawasaki,
Done, please review the changes.
Seems like the tests are using the interface IP address "1.1.1.5" to do loopback: [339197.231226] run blktests nvme/003 at 2023-10-23 10:21:28
Done, please review the changes.
Done, please review the changes.
Thank you very much for your feedback.
Thanks for sharing the kernel messages. I wonder where the address 1.1.1.5 comes from. Is it set up for the real RDMA hardware driver? In _create_nvmet_port, def_traddr (127.0.0.1) is written to /sys/kernel/config/nvmet/ports/${port}/addr_traddr. If the hardware driver does not refer to it, seeing the address 1.1.1.5 looks ok.
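For context, a rough illustration of the configfs writes behind _create_nvmet_port; the port number and values are examples, not the actual helper code.

```sh
# Illustrative nvmet port setup through configfs.
port=1
mkdir /sys/kernel/config/nvmet/ports/${port}
echo rdma      > /sys/kernel/config/nvmet/ports/${port}/addr_trtype
echo ipv4      > /sys/kernel/config/nvmet/ports/${port}/addr_adrfam
echo 127.0.0.1 > /sys/kernel/config/nvmet/ports/${port}/addr_traddr   # def_traddr
echo 4420      > /sys/kernel/config/nvmet/ports/${port}/addr_trsvcid
```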
Your change looks to be working. I ran with both use_rxe and use_hw_rdma enabled and saw the message printed in the failure log. But I have rethought this, and now I think the check should be done in _nvme_requires() instead, so that the test cases are skipped when both use_rxe and use_hw_rdma are set.
Sorry for changing my mind. If you don't mind, I can make this change.
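A minimal sketch of that alternative, assuming the SKIP_REASONS mechanism used by blktests requires() helpers; this is not the actual patch.

```sh
# Hypothetical addition to _nvme_requires(); names follow this discussion.
_nvme_requires() {
	# ... existing requirement checks ...
	if [ "${use_rxe:-0}" = 1 ] && [ "${use_hw_rdma:-0}" = 1 ]; then
		SKIP_REASONS+=("use_rxe and use_hw_rdma are both set; enable only one")
	fi
}
```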
Thanks @Kamalheib for updating the patch. Other than my comment above, your change looks good to me. As the next step, I posted a notice to the relevant mailing lists to get the attention of key kernel developers on this PR. Let's see the feedback from them.
When using 'real' hardware you typically don't get to control the target; one calls 'nvme connect' and then has to live with whatever controller is configured on the other end.
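For example, connecting to such a pre-configured controller with nvme-cli might look like this; the address matches the one mentioned in this thread, and the subsystem NQN is a placeholder.

```sh
# Connect to an existing NVMe-oF/RDMA controller on the remote target.
nvme connect -t rdma -a 1.1.1.5 -s 4420 -n nqn.2014-08.org.example:testsubsys
```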
@hreinecke Thanks for the comment. @Kamalheib, what do you think? Your approach might be good for your use case, but it may not be good enough for other users. I think Hannes's comment covers the nvme group; a similar discussion might be valid for srp. I wonder which test group @Kamalheib is interested in: srp, nvme RDMA, or both? FYI, here's the link to the discussion on the list.
I suggest holding off on merging the commits of this PR for the moment. As for nvme-rdma, @hreinecke suggested a different approach and opened PR #127 for it; let's see how the new PR goes. As for srp, the original author @bvanassche provided a comment. He pointed out that the srp test group was not designed to run with real RDMA hardware, and that this may be beyond the scope of blktests and increase the burden of maintaining blktests. Regarding the scope and burden discussion, I do not yet have a clear answer; I think it depends on whether running blktests with real RDMA hardware can find kernel issues or not. @Kamalheib reported a few failures with real RDMA hardware, but they do not look like kernel issues. If @Kamalheib or anyone else has experience of kernel issues found with real RDMA hardware, we can revisit this PR.
1.1.1.5 is the IP address that I've configured for the network interface.
@kawasaki The idea behind the PR is the following: 1- Avoid the instability of soft-RoCE and soft-iWARP (soft-iWARP is more stable now). I totally understand the concerns and the feedback from the community, but most of the time people will use real hardware to leverage the features of RDMA. Thanks,
I haven't encountered any stability issues with the soft-iWARP driver in the past few years. Did I perhaps miss something?
Let me repeat that testing RDMA adapters is out of scope for the SRP tests. The SRP tests have not been designed to test RDMA adapters. With the proposed changes the SRP tests only test a small subset of RDMA driver and adapter functionality, namely loopback support. These tests do not trigger any communication over the network ports of RDMA adapters.
From my point of view, blktests is the framework "for the Linux kernel block layer and storage stack", so RDMA adapters are out of blktests' scope. On the other hand, RDMA drivers could fall within the scope (Bart put them out of the design scope, though).
The next question is this: is the blktests SRP group the best framework to test RDMA drivers? Maybe not, since it covers only the loopback part of the drivers, as Bart says; we need other tests to cover RDMA driver code thoroughly. Having said that, I still think the blktests srp test group can work as an "integration test" covering the integration between RDMA drivers and the SRP driver. It does extend hardware driver coverage, and even though the extended coverage is small, I guess it still has a chance to find driver issues. It does not sound like Kamal has found such issues yet, so we are guessing about the future: will it find driver issues or not? (Which side would you bet on? :)
Bart is carrying the SRP test group maintenance cost, and I know Bart's time is precious, so maybe I or Kamal should take on the additional cost. One idea is that I or Kamal keep a separate branch for some months so that Kamal can run the SRP tests with real RDMA adapters. If it catches any issue, we can prove the value of merging the code to master; otherwise, maybe the integration part is stable enough. What do you think, Kamal? (BTW, I'll take one week of vacation starting tomorrow, so my response will be slow.)
Signed-off-by: Kamal Heib <[email protected]>