There is an RDMA net device in the container, but "Init RDMA failed!Create rdma server failed!" Why? #2004
Comments
Hi @hsh258, could you please use something like ib_write_bw or libfabric to check whether the rdma dev can work? |
Hi, here are some details:
find / -name 'librdmacm*' 2>/dev/null
find / -name 'libibverbs*' 2>/dev/null
apt-cache search libfabric
dpkg -l | grep libfabric
However, there is no fi_info tool, so I can't run "fi_info -p verbs". As for "whether the rdma dev can work": the rdma dev can work, surely. fi_info |
In addition, the "--rdma_endpoint" parameter must include a port along with the address. Such as: |
I think one reason vineyard was unable to create an RDMA server is the ipv6 address. But I don't know why fi_info can't get device information. If fi_info does not get device information, vineyard theoretically cannot get device information either, even if it is using ipv4. Also, ipv6 support is not in our short-term plans at the moment. You can open a new issue about ipv6 support and we may support ipv6 in the future. Thanks. |
Hi, |
Fabric depends on libibverbs, so libibverbs is necessary. I suggest that you install libfabric and fabtests to use fi_info. Refer to the script below: For fabric dependencies (CentOS):
Install fabric and fabtests
Again, vineyard compiles the fabric in the submodule itself, so in theory you only need the ibverbs library to use vineyard RDMA support (the premise is that fabric can also work on its own). You can install fabtests according to the script above and see whether they work (such as fi_rma_bw / fi_info). |
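The fi_info check suggested above can also be scripted. The following is a minimal sketch, not part of vineyard: the helper name `rdma_provider_visible` is hypothetical, and it assumes that `fi_info -p verbs` exits with status 0 and prints at least one entry when a verbs device is visible.

```python
import shutil
import subprocess


def rdma_provider_visible(fi_info_bin="fi_info", provider="verbs", timeout=10.0):
    """Return True if `fi_info -p <provider>` runs, exits 0 and prints output.

    Hypothetical helper for illustration only. It returns False when the
    fi_info binary is missing, fails to start, times out, or reports nothing.
    """
    path = shutil.which(fi_info_bin)
    if path is None:
        # fi_info is not installed or not on PATH (fabtests/libfabric missing).
        return False
    try:
        result = subprocess.run(
            [path, "-p", provider],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: a zero exit status plus non-empty output means that
    # libfabric found at least one interface for this provider.
    return result.returncode == 0 and bool(result.stdout.strip())


if __name__ == "__main__":
    print("verbs provider visible:", rdma_provider_visible())
```

If this returns False inside the container, it is probably worth fixing the libfabric/libibverbs setup before debugging vineyardd itself.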
Hi ./vineyardd --rdma_endpoint=ipv4_addr:port |
Without mask. It should be the ipv4 address of RDMA device. |
The "ipv4:port" format only affects how the port is parsed. This is also why IPv6 cannot be used: everything after the first ":" is parsed as the port. The RDMA IPv4 address you specify currently has no effect; instead, vineyard automatically looks for the first suitable RDMA device. This WIP PR supports selecting a particular RDMA device by its IPv4 address. However, it cannot be merged into the main branch for now because the CI failed. Refer to: |
Therefore, I suggest that it is a priority to ensure that fi_info can get the RDMA device information. If fi_info can retrieve the RDMA device information, then vineyard should initialize successfully. If fi_info cannot get the RDMA device information, then vineyard will not be able to initialize either. |
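To illustrate the parsing behaviour described in this thread (everything after the first ":" is treated as the port), here is a small sketch. It is not vineyard's actual source code; the function names are hypothetical and only mimic the described rule to show why an IPv6 address breaks it.

```python
def split_endpoint_first_colon(endpoint):
    """Split 'host:port' at the FIRST ':' (the rule described in the thread)."""
    host, sep, port_text = endpoint.partition(":")
    if not sep:
        raise ValueError("endpoint must look like 'ipv4_addr:port'")
    return host, port_text


def parse_ipv4_endpoint(endpoint):
    """Return (host, port) or raise ValueError if the port part is not valid."""
    host, port_text = split_endpoint_first_colon(endpoint)
    if not port_text.isdigit():
        # An IPv6 address ends up here: the remaining colons land in the "port".
        raise ValueError("invalid port %r in endpoint %r" % (port_text, endpoint))
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return host, port


if __name__ == "__main__":
    print(parse_ipv4_endpoint("10.13.228.2:9601"))
    print(split_endpoint_first_colon("fe80::1:9601"))
```

With an IPv4 endpoint the split yields a clean host and numeric port, while an IPv6 address leaves colons in the port part, so it cannot be interpreted as a port number.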
Hi |
No. But you can give a fake ipv4 address, because vineyard will automatically look for the first suitable RDMA device. As I said above, specifying the NIC by address will be supported in the next PR.
If you provide rdma_endpoint when trying to connect to vineyardd, this environment variable will not be read. If you don't give it, it will try to read it. Environment variables are also set by the user.
They won't conflict. |
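The precedence described here (an explicitly passed rdma_endpoint wins, and VINEYARD_RDMA_ENDPOINT is only read when none is given) can be sketched as follows. This is an illustration of the rule only, not the vineyard client implementation; `resolve_rdma_endpoint` is a hypothetical name.

```python
import os


def resolve_rdma_endpoint(explicit=None, environ=None):
    """Pick the RDMA endpoint: explicit argument first, then the environment.

    Hypothetical helper mirroring the rule stated in the thread. `environ`
    defaults to os.environ and can be replaced by a plain dict for testing.
    """
    env = os.environ if environ is None else environ
    if explicit:
        # An explicitly provided endpoint means the variable is never read.
        return explicit
    # Fall back to the user-set environment variable, if any.
    return env.get("VINEYARD_RDMA_ENDPOINT") or None


if __name__ == "__main__":
    fake_env = {"VINEYARD_RDMA_ENDPOINT": "1.2.3.4:1234"}
    print(resolve_rdma_endpoint(None, fake_env))
    print(resolve_rdma_endpoint("1.2.3.4:5678", fake_env))
```

Because only one source is used at a time, the two settings cannot conflict.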
Hi |
The client should use the exact ipv4 address of the vineyard server. Suppose the server's NIC is at the address 1.2.3.4 and the vineyard RDMA server uses port 1234. You should then use 1.2.3.4:1234 as the client's VINEYARD_RDMA_ENDPOINT. It is also currently not possible for the client to specify the NIC used to send the data, so this field means the address of the server.
To summarize, there is currently no way for the server to specify the NIC; the server automatically selects an appropriate NIC to listen for RDMA messages. The rdma endpoint on the client side is the address of the server. It is also currently not possible for the client to specify the NIC used for sending data. The feature to specify the NIC will be supported in the PR mentioned above, which cannot currently be merged into the main branch. |
Hi. However, the details show that the rpc service is TCP, not RDMA. At the same time, |
Do not make RDMA and RPC work on the same port if they use the same NIC. |
Are the client and server in the same container? If not, can the client's container connect to the server? You can start the server and client in the same container to test if the vineyard RDMA works. |
Hi
As server:
E20241103 02:53:37.765411 178 rpc_server.cc:203] Receive vineyard request mem!
E20241103 03:03:53.214725 181 rpc_server.cc:203] Receive vineyard request mem!
With the same client, without changing anything, the put sometimes succeeds and sometimes fails. |
Hi. Could you please show me the complete instructions for starting vineyardd and the client-side code that puts the object? Let me test it locally. |
Hi, instructions are there:
|
And the command used to start vineyardd? |
|
And if registering memory fails, try increasing vineyard's available memory. |
|
Hi @hsh258. 9600 is the default port for RPC, so you should define another, unique port for the RDMA endpoint, such as "10.13.228.2:9601" |
Hi,
export VINEYARD_RDMA_ENDPOINT=10.13.228.2:19601
python3
Python 3.8.10 (default, Sep 11 2024, 16:02:53)
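A quick sanity check for the advice above could look like the sketch below. It is a hypothetical helper, not a vineyard API, and it is deliberately conservative: it rejects equal ports even though the thread only rules them out when RPC and RDMA share the same NIC.

```python
def check_ports_distinct(rpc_port, rdma_endpoint):
    """Return the RDMA port, or raise ValueError if it equals the RPC port.

    Hypothetical helper; `rdma_endpoint` is expected as 'ipv4_addr:port'.
    """
    host, sep, port_text = rdma_endpoint.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError("endpoint must look like 'ipv4_addr:port'")
    rdma_port = int(port_text)
    if rdma_port == rpc_port:
        raise ValueError(
            "RDMA port %d collides with the RPC port; pick another one" % rdma_port
        )
    return rdma_port


if __name__ == "__main__":
    # 9600 is the default RPC port mentioned in the thread.
    print(check_ports_distinct(9600, "10.13.228.2:9601"))
```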
|
The rpc connection must be established first when using rdma. You can try the following code.
|
Hi, following the order above, on the client: |
Could you please add an option (size:1024Mi) to the vineyard yaml as follows and try again?
|
Hi |
Sorry for the misleading indentation. You could try the following command.
|
Hi, above issue, |
Hi, |
Does it work now? I think it was caused by the indentation problem. |
It shouldn't be very slow. What are your k8s environment (ack/aws/...) and machine environment? |
Hi, |
Hi, |
How did you install the vineyard operator? Besides, can you copy the code to the shell and try again? It would be better to show a screenshot of the failure so that we can check what is wrong. |
Hi, |
|
Hi, |
I think it is likely to be caused by your environmental factors. |
Hi, |
Hi, |
Refer to src/common/rdma/rdma_client.cc, src/common/rdma/rdma_server.cc and https://ofiwg.github.io/libfabric/. The vineyard client gets the server's RDMA device info by calling fi_getinfo with the server's ip address as a parameter. |
Hi, |
Since there is no way to replicate your current environment, and no ipv6 support for vineyard, it may be difficult to give appropriate advice.
We haven't tried to change this field, so I suggest you ask in the fabric community. |
Hi, |
Hi, what is the size of the blob you are testing now? At present, the Vineyard RDMA module still needs to adapt to some working conditions, so RDMA has an advantage over TCP only when the blob is larger than 4M. |
Hi, |
rpc_server.cc:112] Init RDMA failed!Create rdma server failed!