This repository contains the source for urdma, a DPDK-based software
verbs driver, along with some demo applications.
Requirements
------------
- Linux kernel >= 3.17.8
- gcc >= 4.9 or a compiler which supports the C11 standard
- rdma-core >= 17.1 (earlier versions, including the split libraries
prior to the initial consolidated rdma-core release, are no longer
supported)
- DPDK >= 16.07 built as shared libraries (CONFIG_RTE_BUILD_SHARED_LIB=y),
and the rte_kni kernel module built for the currently running kernel
- libnl-3 and libnl-route-3
- json-c
- uthash (included in Fedora, Ubuntu, openSUSE, and EPEL for RHEL
repositories)
Setup instructions
------------------
This assumes a default installation of Ubuntu 18.04. Other modern
distributions should also work, but you would need to build and install
DPDK yourself in order to get the required rte_kni kernel module, which
most other distributions do not ship in their repositories.
To build this package, RTE_SDK and RTE_TARGET need to be exported into
the environment. Using the DPDK packages from the Ubuntu repositories,
you can run the following:
$ source /usr/share/dpdk/dpdk-sdk-env.sh
Or for a manual DPDK installation:
$ export RTE_SDK=${prefix}/share/dpdk
$ export RTE_TARGET=x86_64-native-linuxapp-gcc
If you are building from a fresh git clone, first run:
$ autoreconf -i
The rest follows the normal autotools-style build:
$ ./configure --sysconfdir /etc
$ make
$ sudo make install
Note that sysconfdir must match that of your libibverbs installation, in
order for the verbs library to find the urdma driver.
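As a quick sanity check after make install (this assumes urdma
registers itself through the usual libibverbs.d driver-file mechanism
and that you configured with --sysconfdir /etc), you can look for its
driver file next to the other providers:

$ ls /etc/libibverbs.d/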
The configure script will look for your kernel source directory in the
typical location by default (/lib/modules/`uname -r`/source). You can
set the KERNELDIR environment variable to specify a different location;
for example, if you are building against a different kernel version than
the one installed locally.
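For example, to configure against a specific set of installed kernel
headers (the path below is only an example; substitute the kernel tree
you are targeting):

$ KERNELDIR=/usr/src/linux-headers-4.15.0-20-generic ./configure --sysconfdir /etc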
To run an application with this driver, the KNI and urdma modules must be loaded:
$ sudo modprobe rte_kni
$ sudo modprobe urdma
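You can confirm that both modules are loaded with something like:

$ lsmod | grep -e rte_kni -e urdma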
You need to use the dpdk-devbind script to bind the NICs that you wish
to use for RDMA to a DPDK-compatible UIO/VFIO kernel driver instead of
the default kernel netdev driver. Refer to the DPDK documentation for
more information on this.
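For example, to see the current bindings and move one NIC over to the
in-kernel uio_pci_generic driver (the PCI address below is only a
placeholder, and depending on your DPDK installation the script may be
named dpdk-devbind or dpdk-devbind.py):

$ sudo modprobe uio_pci_generic
$ dpdk-devbind --status
$ sudo dpdk-devbind --bind=uio_pci_generic 0000:02:00.0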
You will need to create a file ${sysconfdir}/rdma/urdma.json that looks
like this, with appropriate values substituted in:
{ "ports": [ { "ipv4_address": "10.2.0.100" } ],
"socket": "/run/urdma.sock",
"eal_args": { "log-level": 7 }
}
You can validate your config file against doc/urdma-schema.json using
any JSON schema validator; python-jsonschema, which is packaged for
Ubuntu, is one possibility. Note that the schema file is stricter than
what is actually accepted at runtime: additional properties are
rejected by the schema but simply ignored at runtime. The stricter
schema exists to make typos more obvious.
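For example, using the command-line tool that ships with
python-jsonschema (the config path below assumes --sysconfdir /etc as
used above):

$ jsonschema -i /etc/rdma/urdma.json doc/urdma-schema.json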
Finally, the urdmad service must be running:
$ systemctl --user start urdmad
This will cause devices to appear in your "ip link" output, and cause
uverbs devices to appear in /sys/class/infiniband_verbs.
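For example, you can check both with:

$ ip link show
$ ls /sys/class/infiniband_verbs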
Alternatively, you can run urdmad manually as root; in that case you
will need to run every verbs application as root as well, because the
DPDK runtime configuration that urdmad creates cannot be relocated.
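A minimal manual invocation (this assumes that urdmad is on root's
PATH after make install and that it picks up the urdma.json
configuration described above) might look like:

$ sudo urdmad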
Troubleshooting Notes
---------------------
Some RDMA demos, particularly the tools that come with librdmacm, can
give confusing error messages when there are no RDMA devices available,
along the lines of:
rdma_create_event_channel: No such device
If you see this error message, it indicates that urdma is not set up
correctly. Verify that you followed the setup steps above.
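One quick check is to list the verbs devices with the tools that come
with rdma-core (packaged as libibverbs-utils on most distributions):

$ ibv_devices
$ ibv_devinfo

If no urdma device shows up, revisit the module-loading and urdmad
steps above.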
Known Issues
------------
- Shared receive queues (SRQs) are currently not supported. In order
to run openmpi over urdma, you will need to specify the following
command line options to prevent openmpi from using shared receive
queues, and to disable a warning that it doesn't know about the
device vendor ID:
$ mpirun --mca btl_openib_warn_no_device_params_found 0 \
--mca btl_openib_receive_queues P,65536,256,192,128 \
${mpi_app} ${mpi_app_args}...
- If running DPDK sample applications succeeds, but running urdmad
fails with "Configuration expects N devices but found only 0", the
DPDK PMD drivers are probably not being loaded at runtime by default.
If this is the case then you will probably need to load them by hand,
by adding an argument to urdmad like:
urdmad -d ${RTE_SDK}/${RTE_TARGET}/lib/librte_pmd_i40e.so
and adding the equivalent to urdma.json:
"eal_args": { /* ... */, "d": "..." }
To avoid this, you can set CONFIG_RTE_EAL_PMD_PATH to a directory
like /usr/local/lib/dpdk-pmds when building DPDK, and then place the
PMD .so files into that directory after DPDK is installed. DPDK will
then auto-load all .so files in that directory as PMD libraries.
- DPDK 17.05 introduced the concept of mempool drivers. If you see a
message like this using DPDK >= 17.05:
MBUF: error setting mempool handler
EAL: Error - exiting with code: 1
Cause: Cannot create rx mempool for port 0 with 8064 mbufs: Invalid argument
then no mempool driver is being loaded by default. This occurs
because the linker has been configured with --as-needed by default on
some distributions, and since the mempool and PMD drivers do not
declare any symbols, the linker has no way of knowing that we depend
on the presence of a mempool driver.
Like the PMD issue above, you can pass -d to load the default
librte_mempool_ring driver on the command line, or set
CONFIG_RTE_EAL_PMD_PATH and place mempool driver(s) into that
directory.
- There is a potential race condition with completion channels, where a
completion event can get lost, and thus a thread waiting on
ibv_get_cq_event() will never wake up, leading to a deadlock. A cause has
not been identified, but the issue has not been reproduced with the "extra"
lock added around the rte_ring operations in do_poll_cq() and
finish_post_cqe().
- There is the possibility of a hang in the kernel module if the user
process is killed between the read() and write() calls on event_fd in
poll_conn_state(). This is because rdma_destroy_id() in the kernel
will block until the connection attempt completes, but it also
prevents our event_fd from being closed, which is what would unblock
it.
- The progress thread will use 100% CPU since it must busy-poll on the KNI
interfaces (there is no way to sleep until the process gets an event).
- urdma follows the RFC 5040 ordering rules strictly, meaning that it
can place data segments out of order if it receives them out of
order. This in turn means that if two RDMA WRITE requests are made
on overlapping buffers, urdma may place a data segment from the first
*after* the corresponding data segment from the second, leading to a
torn write from the application's perspective. Applications must
therefore not post multiple transfer requests on overlapping buffers
simultaneously if they depend on the data ordering.