rxe_cfg network namespace support #12

Open
ziyin-dl opened this issue Oct 6, 2017 · 4 comments

Comments

ziyin-dl commented Oct 6, 2017

I recently had an issue with adding an interface in a separate network namespace. The setup is as follows:
I have 2 servers running Ubuntu 16.04 with kernel version 4.14. Each server has 4 network interfaces, all on Intel 1000BASE-T NICs (I211). I am running soft-RoCE on both machines and installed the user space libraries as instructed. I also verified that soft-RoCE works by running ibv_rc_pingpong between the 2 servers.

However, what I would like to do is create a couple of containers on each server, where each container has its own interface and soft-RoCE runs on top of every interface. I want the containers to have separate interfaces to avoid Linux internal routing and for load balancing. To do this, I created a network namespace for each container and moved the physical interfaces into their corresponding network namespaces. When I tried adding the NICs using rxe_cfg add, it failed with the following error:

$ sudo ip netns exec $(PID) rxe_cfg add p3p1
[ 3100.844015] rdma_rxe: interface p3p1 not found
sh: echo: I/O error

(p3p1 is the interface I moved to PID's namespace)

However, it seems that rxe_cfg status can correctly identify the device in the namespace:
$ sudo ip netns exec $(PID) rxe_cfg status
Name  Link  Driver  Speed  NMTU  IPv4_addr     RDEV  RMTU
p3p1  yes   igb            1500  192.168.10.1

Does soft-RoCE work in this setting, or did I miss something in the setup? If it is doable, what is the right way to have separate soft-RoCE devices for different namespaces?
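
For anyone reproducing this, the namespace part of the setup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the namespace name `rxe-ns1` and the `/24` prefix length are made up, and a named namespace is used where the commands above refer to the namespace by the container's PID. By default the script only prints the commands; set `APPLY=1` and run it as root to execute them.

```shell
#!/bin/sh
# Sketch: give one network namespace its own physical interface.
# NS and the /24 prefix are illustrative assumptions; the interface name and
# address are the ones quoted in the report above.
NS="${NS:-rxe-ns1}"               # hypothetical namespace name
IFACE="${IFACE:-p3p1}"
ADDR="${ADDR:-192.168.10.1/24}"   # prefix length is an assumption

# Print each command; execute it only when APPLY=1 (requires root).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

setup_netns() {
    run ip netns add "$NS"                                    # create the namespace
    run ip link set "$IFACE" netns "$NS"                      # move the NIC into it
    run ip netns exec "$NS" ip addr add "$ADDR" dev "$IFACE"  # address is lost on move, so re-add
    run ip netns exec "$NS" ip link set "$IFACE" up
    run ip netns exec "$NS" ip link set lo up
}

setup_netns
```

After these steps the interface is no longer visible from the default namespace, which is the situation in which `rxe_cfg add` reports "interface p3p1 not found".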

@G3orge26

Hi, I understand this issue has been left hanging for quite some time, but I am having the same problem and was wondering whether anyone has found a solution or workaround. Thank you very much in advance.

qmaldon commented Jan 17, 2021

Hi, it's 2021. Is there any update or workaround for this issue?
I'm interested in running Soft-RoCE on a virtual network of Docker containers.
Thanks

@ziyin-dl
Author

I eventually got it working, but that was back in 2017, so I cannot remember the details at this point. Basically, the issue is that the soft-RoCE code only searches for devices in the default network namespace. As a result, devices in a different network namespace (e.g. in a container) cannot be discovered. You have to change the code so that it looks for the device in the right network namespace.

Some of the code related to IPv4/v6 routing has to be changed too. This is also caused by the network namespace issue (each namespace has its own routing table).

Once these two are fixed the code should be good to go.
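
Both points can be observed from user space. The sketch below is only a diagnostic under assumptions: `rxe-ns1` is a hypothetical namespace name, and the script does not touch the driver or fix anything. It compares the interface lookup and the routing table in the default namespace with those inside the target namespace. By default it only prints the commands; set `APPLY=1` and run it as root to execute them.

```shell
#!/bin/sh
# Sketch: compare what the default namespace and a target namespace can see.
NS="${NS:-rxe-ns1}"      # hypothetical namespace name
IFACE="${IFACE:-p3p1}"   # interface that was moved into $NS

# Print each command; execute it only when APPLY=1 (requires root).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

compare_namespaces() {
    # Interface lookup: expected to fail in the default namespace once the
    # interface has been moved, and to succeed inside $NS.
    run ip link show dev "$IFACE"
    run ip netns exec "$NS" ip link show dev "$IFACE"
    # Routing tables: each namespace keeps its own.
    run ip route show
    run ip netns exec "$NS" ip route show
}

compare_namespaces
```

If the first lookup fails and the second succeeds, the interface is visible only inside the namespace, which matches the "interface p3p1 not found" message from a driver that looks only in the default namespace.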

qmaldon commented Jan 17, 2021

> I eventually got it working, but that was back in 2017, so I cannot remember the details at this point. Basically, the issue is that the soft-RoCE code only searches for devices in the default network namespace. As a result, devices in a different network namespace (e.g. in a container) cannot be discovered. You have to change the code so that it looks for the device in the right network namespace.
>
> Some of the code related to IPv4/v6 routing has to be changed too. This is also caused by the network namespace issue (each namespace has its own routing table).
>
> Once these two are fixed the code should be good to go.

I managed to get it partially working, with only 1 Soft-RoCE device shared by multiple virtual network interfaces.
I'm using Docker containers attached to bridge networks, so each container has its own virtual interface.

I can see it working, but all requests and responses go through the same Soft RoCE device.

If I test connectivity from rx0 to rx1 it fails, even from the main host (no container).
