Replies: 7 comments
-
Is there any explicit support that iperf2 needs for 200G and faster networks? For iperf3, we've got some folks getting 148 Gbps with the multi-threaded iperf3 beta on 200GE back-to-back (or at least within the same data center), using fasterdata.es.net tunings. To my knowledge there weren't any settings specific to 200GE; it was just a matter of cranking up the number of parallel streams (and hence threads). Pinning threads to specific cores was helpful. It might benefit from various tweaks, particularly over long paths, although I think the Linux kernel auto-tunes socket buffer sizes, so that might not even be needed.
We do support the use of sendfile, but as far as I know our use cases don't require or make use of it. I'm not familiar with BIG TCP; I need to ask around about that. Thanks for the pointer to the reddit thread, that's interesting!
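For anyone reproducing this, the core-pinning piece is a small affinity call; a minimal sketch, assuming Linux and pthreads (the stream_worker name is just illustrative, not iperf3's actual code):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to one CPU core. */
static int pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Hypothetical per-stream worker: pin first, then run the send loop. */
static void *stream_worker(void *arg)
{
    int core = *(int *)arg;
    if (pin_to_core(core) != 0)
        fprintf(stderr, "warning: could not pin to core %d\n", core);
    /* ... per-stream send/receive loop ... */
    return NULL;
}
```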
-
In my testing, I have not seen an advantage for iperf3 with sendfile on 100G NICs, as iperf3 seems to be CPU bound, not memory-copy bound. But I've noticed that iperf2 seems to use slightly less CPU (based on the output of mpstat), so it might be worth trying with iperf2 for comparison. I'm also curious about BIG_TCP, but have not tested it yet.
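For context, iperf3's -Z/--zerocopy send path is sendfile(2)-based, which is what skips the user-space copy; a minimal sketch of that pattern, assuming Linux (the helper name and error handling are illustrative, not iperf code):

```c
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>

/* Push 'count' bytes from a file descriptor (regular file, memfd, etc.)
 * to a connected TCP socket without copying them through user space. */
static ssize_t send_file_range(int sock_fd, int file_fd, off_t offset, size_t count)
{
    size_t left = count;
    while (left > 0) {
        ssize_t n = sendfile(sock_fd, file_fd, &offset, left);
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;      /* retry (or poll for writability) */
            return -1;
        }
        if (n == 0)
            break;             /* offset reached end of file */
        left -= (size_t)n;
    }
    return (ssize_t)(count - left);
}
```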
-
Actually, I take that back. I did some more testing, and I am now seeing an advantage to the -Z (sendfile) flag when it is paired with pacing; -Z without pacing was leading to more retransmits, so throughput did not increase. For example, using these options I'm getting 70 Gbps between two 100G hosts:
numactl -N 1 /usr/local/bin/iperf3 -c hostname -t 60 -4 -P 4 --fq-rate 20G -Z
Without -Z, I only get 55 Gbps. (This is using the new iperf3 multi-threaded beta release.)
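As I understand it, --fq-rate maps to the SO_MAX_PACING_RATE socket option, with the fq qdisc doing the actual pacing; a rough sketch of the socket-level equivalent (assumes Linux >= 3.13 with fq on the egress interface; the helper name is illustrative):

```c
#include <stdint.h>
#include <sys/socket.h>

#ifndef SO_MAX_PACING_RATE
#define SO_MAX_PACING_RATE 47   /* asm-generic value; Linux >= 3.13 */
#endif

/* Cap this socket's send rate; the fq qdisc enforces the pacing.
 * The option takes bytes per second (64-bit accepted on recent kernels). */
static int set_pacing_rate(int sock_fd, uint64_t bits_per_sec)
{
    uint64_t bytes_per_sec = bits_per_sec / 8;
    return setsockopt(sock_fd, SOL_SOCKET, SO_MAX_PACING_RATE,
                      &bytes_per_sec, sizeof(bytes_per_sec));
}
```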
-
I did some testing with the neper tool from Google, which supports SO_ZEROCOPY/MSG_ZEROCOPY via the -Z flag, and found it made a big difference, as long as the receiver is not CPU limited. See: https://github.com/google/neper
The trick is to use the --skip-rx-copy flag on the server to make sure you are not limited by the receiver.
Here are my test commands and results (I'm using numactl to use the same core every time, as different cores often give different throughput):
server: numactl -C 5 ./tcp_stream --skip-rx-copy
client: numactl -C 5 ./tcp_stream -c -H 10.10.2.62 -Z
result: 37Gbps
client with MSG_ZEROCOPY: numactl -C 5 ./tcp_stream -c -H 10.10.2.62 -Z
result: 47Gbps
Based on this, I think it would be great if both iperf2 and iperf3 supported MSG_ZEROCOPY (and --skip-rx-copy, which sets the MSG_TRUNC flag in recv). Looking at the code for this in neper, it should be a pretty easy enhancement for both iperf2 and iperf3.
PS: I'm also testing BIG TCP now too. I'll let y'all know what I find. But BIG TCP needs to be enabled at the system level, so it's not something you can set in iperf.
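For anyone who wants to poke at this outside of neper, both mechanisms are small at the socket level. A rough sketch, assuming Linux >= 4.14 (the helper names are illustrative and error handling is trimmed); the completion handling that makes zerocopy safe is sketched in a later comment:

```c
#include <sys/socket.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60          /* asm-generic value; Linux >= 4.14 */
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000  /* Linux >= 4.14 */
#endif

/* Sender side: opt in once with SO_ZEROCOPY, then pass MSG_ZEROCOPY on
 * each send.  The pages behind 'buf' stay pinned until the kernel posts
 * a completion on the socket's error queue, so 'buf' must not be reused
 * until that notification arrives. */
static int enable_zerocopy(int fd)
{
    int one = 1;
    return setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));
}

static ssize_t send_zerocopy(int fd, const void *buf, size_t len)
{
    return send(fd, buf, len, MSG_ZEROCOPY);
}

/* Receiver side: the --skip-rx-copy idea.  With MSG_TRUNC on a TCP
 * socket, recv() discards the payload instead of copying it into 'buf',
 * which takes the receiver's memory copy out of the measurement. */
static ssize_t recv_discard(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, MSG_TRUNC);
}
```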
-
Thanks for this, Brian. I added ticket 305 for this. I'm currently trying to get the next release of iperf 2 out, and I plan to add this in the subsequent release unless I hear a major push to delay this release to add this capability. Otherwise, it will be a build from master to use it in 2024 (I've been releasing iperf 2 about once per annum, in March).
https://sourceforge.net/p/iperf2/tickets/305/
Bob
-
I implemented naive UDP SO_ZEROCOPY/MSG_ZEROCOPY support for iperf3. Because this implementation doesn't check that the buffer has been sent and its memory unlocked, I didn't expect much improvement, if any. However, with one stream I get only about 30% of the throughput, so I wonder whether it is worth implementing the full solution. Tests were done on my local laptop, running both the server and client under WSL Ubuntu, so the problem may be related to WSL (and Windows?) or to running both the client and server on the same machine. However, it is not clear to me how that could cause a 70% throughput reduction.
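For what it's worth, the missing piece of the "full solution" is draining the MSG_ZEROCOPY completion notifications from the socket error queue so send buffers can be reused. A rough sketch following the kernel's msg_zerocopy documentation (the helper name is illustrative; the buffer bookkeeping is left to the caller):

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/errqueue.h>   /* struct sock_extended_err, SO_EE_ORIGIN_ZEROCOPY */

/* Drain pending MSG_ZEROCOPY completions from the error queue.  Each
 * notification covers a range [ee_info, ee_data] of zerocopy send()
 * calls whose pages the kernel has released; only then may the caller
 * reuse those buffers.  SO_EE_CODE_ZEROCOPY_COPIED in ee_code means the
 * kernel fell back to an ordinary copy for that range. */
static int drain_zerocopy_completions(int fd)
{
    char control[256];
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);

    if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;

    for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm != NULL;
         cm = CMSG_NXTHDR(&msg, cm)) {
        struct sock_extended_err serr;
        memcpy(&serr, CMSG_DATA(cm), sizeof(serr));
        if (serr.ee_errno == 0 && serr.ee_origin == SO_EE_ORIGIN_ZEROCOPY) {
            /* sends numbered serr.ee_info .. serr.ee_data have completed;
             * mark the matching buffers reusable in the caller's bookkeeping */
        }
    }
    return 0;
}
```

Note also that the kernel documentation warns that page pinning and the notification path have their own cost, so MSG_ZEROCOPY generally only pays off for writes of roughly 10 KB or more; with small UDP datagrams that overhead alone could explain a good part of the regression, independent of WSL.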
-
Submitted PR #1690 with support of
-
Hi,
I'm considering adding 200G+ support to iperf 2 and thought it might be interesting to get iperf 3 engineers' thoughts. One initial question I have is about using sendfile vs the SO_ZEROCOPY/MSG_ZEROCOPY socket/send option. Any thoughts on that?
Also, I'm wondering about BIG TCP and whether there have been any experiments with IPv6 or possibly IPv4.
Bob
PS. A reddit thread on 400G