Pktgen throws Segment Fault in various test-cases #265

Open
Payflow98 opened this issue Jun 4, 2024 · 26 comments

Comments

@Payflow98

Hello there,
I have a bit of trouble getting started with pktgen and hope someone can help me. I want to use pktgen to read and send the traffic of a pcap file. When I try to run pktgen, at first without a pcap, I always get the following error.
I simply start pktgen after freshly compiling it with:
sudo ./path/to/pktgen

====== Pktgen got a Segment Fault

Obtained 7 stack frames.
./Pktgen-DPDK/builddir/app/pktgen(+0x25e83) [0x5857c4580e83]
/lib/x86_64-linux-gnu/libc.so.6(+0x42990) [0x746baa642990]
./Pktgen-DPDK/builddir/app/pktgen(+0x99a7) [0x5857c45649a7]
./Pktgen-DPDK/builddir/app/pktgen(+0xa793) [0x5857c4565793]
/lib/x86_64-linux-gnu/libc.so.6(+0x28150) [0x746baa628150]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x89) [0x746baa628209]
./Pktgen-DPDK/builddir/app/pktgen(+0xb025) [0x5857c4566025]

I get this error on multiple test setups. One test setup uses an Intel i9 processor; the other has an AMD Ryzen 5 5600. I have tried ubuntu-22.04 and ubuntu-23.10, but I always get the same problem. Of course I tried setting multiple options, such as -c, -l, -m, -P etc., but the result is still the same. I originally wanted to set up everything in EVE-NG, but virtualizing everything didn't seem to work. However, using ubuntu-22.04 I was able to compile and start pktgen version 22.04.1 with dpdk 22.11.1. In this case I could start pktgen with the command mentioned above.

Since I want to use pktgen for reading and sending pcaps, I tried running pktgen with:
-P -m "[1].0" -s 0:/pcap/imix.pcap

Unfortunately I get the following error:

EAL: Error - exiting with code: 1
Cause: pktgen_pcap_open: rte_zmalloc_socket() failed for pcap_info_t structure

I really hope someone can help here.
Sincerely

@KeithWiles
Collaborator

KeithWiles commented Jun 6, 2024

Pktgen requires DPDK arguments and Pktgen arguments, which I do not see in the first problem. Please send me the complete command line used.

For the second error, it could be that you are loading a very large pcap file. Pktgen has a limit on the size or number of packets in a pcap file, but it should not crash.
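
To illustrate the first point: DPDK applications conventionally take the EAL (DPDK) arguments first, then a `--` separator, then the application's own arguments, as in the full command lines posted later in this thread. The following is only a minimal sketch of locating that separator; `first_app_arg` is a hypothetical helper written for this example and is not part of Pktgen or DPDK.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a Pktgen or DPDK function): returns the index of
 * the first application argument, i.e. the one following the "--" separator,
 * or -1 if there is no separator or nothing follows it. */
static int first_app_arg(int argc, char *const argv[])
{
    assert(argv != NULL);

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0)
            return (i + 1 < argc) ? i + 1 : -1;
    }
    return -1;
}
```

Under this sketch, a bare `sudo ./path/to/pktgen` has no `--` and no application arguments at all, which is the situation described in the first report.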

@tianywan

My pktgen also throws a segment fault:
image

the full command is:
pktgen -l 2-10 -n 4 -b 01:00.0 -b 01:00.1 -b a4:00.0 --proc-type auto --socket-mem 8192 -- -P -m "[3-6:7-10].[0,1]" -f themes/black-yellow.theme

@KeithWiles
Collaborator

It appears the command line has three NICs defined but is only using two NICs. Is that what you wanted?
-b 01:00.0 -b 01:00.1 -b a4:00.0

@KeithWiles
Collaborator

KeithWiles commented Jul 18, 2024

Also another test would be to split the two NICs to be handled by two lcore groups -m "[3-4:7-8].0" -m "[5-6:9-10].1".
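
As a rough illustration of the mapping strings above, the sketch below parses one `-m` value of the range form `[rx_first-rx_last:tx_first-tx_last].port`. It assumes the cores before the colon handle receive and the cores after it handle transmit for the given port. The struct and function names are hypothetical, and this is not how Pktgen itself parses the option; simpler forms such as `[2].0` are deliberately rejected here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one "[rx:tx].port" mapping. */
struct sketch_mapping {
    int rx_first, rx_last;   /* lcore range assumed to handle receive  */
    int tx_first, tx_last;   /* lcore range assumed to handle transmit */
    int port;                /* port index after the dot               */
};

/* Parses only the full range form, e.g. "[3-4:7-8].0".
 * Returns 0 on success and -1 if the text does not match that form. */
static int sketch_parse_mapping(const char *text, struct sketch_mapping *out)
{
    assert(text != NULL && out != NULL);

    int fields = sscanf(text, "[%d-%d:%d-%d].%d",
                        &out->rx_first, &out->rx_last,
                        &out->tx_first, &out->tx_last, &out->port);
    return fields == 5 ? 0 : -1;
}
```

With this sketch, `"[3-4:7-8].0"` and `"[5-6:9-10].1"` describe two disjoint lcore groups, one per port, which is the split suggested above.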

One more item: I do not see the pcap file option on the command line. Does this happen without a pcap file?

Also remove the --socket-mem 8192 as it is not really needed much anymore.

Here is the command line I used for some testing and is generated by the ./tools/run.py script.
sudo -E ./usr/local/bin/pktgen -l 1,2-4,5-7 -n 4 --proc-type auto --log-level 7 --file-prefix pg -a 03:00.0 -a 03:00.1 -- -v -T -P -m [2:3-4].0 -m [5:6-7].1 -f themes/black-yellow.theme

BTW, did you build Pktgen to include Lua?

@ldnelson16

Hi @KeithWiles, I also have this same problem. My full command line is: sudo -E ./usr/local/bin/pktgen -l 1,2,3 -n 4 -a 01:00.0 -a 02:00.0 --proc-type auto -- -P -m [2].0 -m [3].1 -f themes/black-yellow.theme -s 0:pcap/traffic_sample.pcap.

I have separated the core groups, and specified 1 core for pktgen, and one for each NIC. Lua is not enabled, does it need to be?

When inspecting the call to rte_zmalloc_socket, where this fails: all parameters look reasonable, what could the error be? Is it to do with memory allocation for dpdk?

@KeithWiles
Collaborator

Yes, DPDK may be trying to allocate a given amount of memory and failing.

Where was the rte_zmalloc_socket being called from?

@KeithWiles
Collaborator

Lua does not need to be enabled.

@ldnelson16

> Yes, DPDK may be trying to allocate a given amount of memory and failing.
>
> Where was the rte_zmalloc_socket being called from?

It is called from pktgen_pcap_open in pktgen-pcap.c.

@ldnelson16

ldnelson16 commented Jul 18, 2024

My output for dpdk-hugepages.py -s is:
Node Pages Size Total
0 1024 2Mb 2Gb

Hugepages mounted on /dev/hugepages

@KeithWiles
Collaborator

How many NUMA zones are on this machine?
Can you print out the sid and pid values passed to rte_zmalloc_socket()? I have seen machines report invalid NUMA zone IDs, or a NUMA zone that does not contain any hugepages.

Also please give me the output from numastat -m thanks.

@ldnelson16

ldnelson16 commented Jul 19, 2024

Here are all the values passed to rte_zmalloc_socket:
Name: PCAP-Info-0
sizeof(pcap_info_t): 88
RTE_CACHE_LINE_SIZE: 64
socket: 65535 (I assume this is the socket ID / sid)
pid comes out to 0 (although it is not passed into rte_zmalloc_socket, this seems odd)

numastat -m:
Per-node system memory usage (in MBs):
Token SwapCached not in hash table.
Token FileHugePages not in hash table.
Token FilePmdMapped not in hash table.
Node 0 Total
--------------- ---------------
MemTotal 15810.06 15810.06
MemFree 529.93 529.93
MemUsed 15280.12 15280.12
Active 6798.08 6798.08
Inactive 5404.66 5404.66
Active(anon) 2.77 2.77
Inactive(anon) 2811.15 2811.15
Active(file) 6795.31 6795.31
Inactive(file) 2593.51 2593.51
Unevictable 27.20 27.20
Mlocked 27.11 27.11
Dirty 0.04 0.04
Writeback 0.00 0.00
FilePages 9399.30 9399.30
Mapped 362.32 362.32
AnonPages 2830.57 2830.57
Shmem 2.31 2.31
KernelStack 6.98 6.98
PageTables 16.59 16.59
NFS_Unstable 0.00 0.00
Bounce 0.00 0.00
WritebackTmp 0.00 0.00
Slab 886.16 886.16
SReclaimable 762.04 762.04
SUnreclaim 124.12 124.12
AnonHugePages 0.00 0.00
ShmemHugePages 0.00 0.00
ShmemPmdMapped 0.00 0.00
HugePages_Total 2048.00 2048.00
HugePages_Free 2046.00 2046.00
HugePages_Surp 0.00 0.00
KReclaimable 762.04 762.04

this is numastat -m, what am I supposed to be expecting here?

@KeithWiles
Collaborator

I see the problem: you only have one socket/NUMA zone, and DPDK is returning -1 on the call to sid = rte_eth_dev_socket_id(pid); Normally this value needs to be in the range 0-n. DPDK should have returned zero, but it returns -1.

DPDK returning -1 is going to break any request to DPDK based on NUMA zones ID. For this specific call to rte_zmalloc_socket() we can test for -1 and set sid to 0.

This problem will most likely happen in other calls that use a socket. :-(
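
The `socket: 65535` value reported earlier is what one would expect if a -1 return is stored in a `uint16_t`. The sketch below shows that conversion in isolation. `SKETCH_SOCKET_ID_ANY` is a stand-in defined here so the example compiles without DPDK headers (DPDK defines `SOCKET_ID_ANY` as -1), and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for DPDK's SOCKET_ID_ANY, which is defined as -1. */
#define SKETCH_SOCKET_ID_ANY (-1)

/* Stores a socket ID in an unsigned 16-bit variable.
 * A value of -1 wraps around to 65535. */
static uint16_t store_unsigned(int socket_id)
{
    return (uint16_t)socket_id;
}

/* Stores a socket ID in a signed 16-bit variable, so -1 stays -1.
 * The assert documents the range in which this conversion is lossless. */
static int16_t store_signed(int socket_id)
{
    assert(socket_id >= INT16_MIN && socket_id <= INT16_MAX);
    return (int16_t)socket_id;
}
```

Once the value has wrapped to 65535, a later comparison against -1 no longer matches, so a check for the "any socket" case only works if the variable is signed.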

@ldnelson16

ldnelson16 commented Jul 19, 2024

Is this a problem that is unavoidable due to only having one NUMA node, or is this a fixable error? I'm a little unclear
Is it not a problem that my generated pid is 0?

@KeithWiles
Collaborator

In the pktgen-pcap.c file, around line 130, change uint16_t sid; to int16_t sid;

Then replace the line 135 with the following:

    if (((sid = rte_eth_dev_socket_id(pid)) == SOCKET_ID_ANY) && (rte_errno != 0))
        rte_exit(EXIT_FAILURE, "%s: rte_eth_dev_socket_id(%d) failed errno %d\n",
                 __func__, pid, rte_errno);
    else if (sid == SOCKET_ID_ANY)
        sid = 0;

I was not able to test this code.

@ldnelson16

ldnelson16 commented Jul 19, 2024

Okay, it seems to pass the rte_zmalloc_socket call, but generates invalid memory errors down the line, which are related to other rte_zmalloc_socket calls like you mentioned

@KeithWiles
Collaborator

Any place in the code where socket ID or NUMA ID is used will most likely have this problem. I guess to fix these I would need to replace all of the socket ID based DPDK calls with a routine verifying the socket ID returned is valid.
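
A verifying routine of the kind described above could look roughly like the following. This is a sketch under stated assumptions, not the code in Pktgen: `sketch_valid_socket_id` and `SKETCH_SOCKET_ID_ANY` are names invented for this example, and the socket count is passed in as a plain parameter (in real code it might come from DPDK's `rte_socket_count()`, which is an assumption about how it would be wired up).

```c
#include <assert.h>

/* Stand-in for DPDK's SOCKET_ID_ANY, which is defined as -1. */
#define SKETCH_SOCKET_ID_ANY (-1)

/* Hypothetical helper: turns the socket ID reported for a port into one
 * that is safe to use for per-socket allocation.
 *   - "any socket" (-1) falls back to socket 0
 *   - an in-range ID is returned unchanged
 *   - anything else is reported as an error (-1) instead of being masked */
static int sketch_valid_socket_id(int reported, int socket_count)
{
    assert(socket_count > 0);

    if (reported == SKETCH_SOCKET_ID_ANY)
        return 0;
    if (reported < 0 || reported >= socket_count)
        return -1;
    return reported;
}
```

Returning an error for out-of-range IDs, rather than silently using socket 0, keeps a wrapped value such as 65535 visible to the caller.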

@KeithWiles
Collaborator

Please give the branch fix-socket-crash on the pktgen repo a try and see if it gets you working. I did not try to fix all of the NUMA/Socket related problems and more issues may exist.

@ldnelson16

I will do this ^^. However, where should pg_zmalloc_socket be added? I also notice that in other parts, calling rte_eth_dev_socket_id does not return -1, but instead returns very large numbers. I have given the two I found in binary below:

100110101100100

1011111100110100110101100100

These aren't simple ones like just -1 being wrapped around, so I'm not sure why or how this happens. Based on the fact that I only have 1 NUMA node, it should always return sid 0, no? For this reason, pg_zmalloc_socket does not fix it, so I am manually changing all instances of sid to 0 to see if that will work correctly.
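
For readers who want to inspect the two values, the sketch below converts the binary strings to integers and compares their low 16 bits. The helper names are hypothetical. It only shows that the shorter value (0x4D64) equals the low 16 bits of the longer one (0xBF34D64). That would be consistent with the same underlying value being read at two different widths, but this is a guess and does not identify the cause.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parses a string of '0'/'1' characters as an unsigned 32-bit value. */
static uint32_t parse_binary(const char *bits)
{
    assert(bits != NULL);
    return (uint32_t)strtoul(bits, NULL, 2);
}

/* Keeps only the low 16 bits, as an assignment to uint16_t would. */
static uint16_t low16(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}
```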

@KeithWiles
Collaborator

The call to get the socket_id should be returning SOCKET_ID_ANY, which is -1; pg_zmalloc_socket() should detect this and return the correct memory. The other locations in the code that use the socket ID will still return -1. In the case of a huge number, it is possible the variable used is an unsigned value. If the value is unsigned, it needs to be signed.

@KeithWiles
Collaborator

Not sure why the values are strange, but I would need to take each case into account to find out.

Please post the locations you find other issues.

@KeithWiles
Collaborator

It could be that the calls in DPDK are not accounting for this in any way when there is only one socket or no NUMA per se at all.

@ldnelson16

The locations I noticed are at ... l2p_pktmbuf_create and parse_cores both in l2p.c

@ldnelson16

I notice that after hard coding these to 0, it now does not find available ports, very confusing

@KeithWiles
Collaborator

Please look at the new Pktgen release 24.10.0 and use the latest DPDK version as DPDK changed again, which caused compile problems.

I hope I fixed this issue

@samiachoueiri

Hi,
I am facing a similar problem trying to replay a pcap file using pktgen 24.07.1 with DPDK 24.11.0-rc0 on Ubuntu 20.04.6 LTS. I am using the following command to run pktgen: sudo pktgen -l 0,1 -n 4 -a 07:00.0 -- -P -m "1.0" -s 0:icmp_packet.pcap
I get this error and my terminal is killed:

*** Copyright(c) <2010-2024>, Intel Corporation. All rights reserved.
*** Pktgen created by: Keith Wiles -- >>> Powered by <<<

EAL: Detected CPU lcores: 8
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Error - exiting with code: 1
Cause: pktgen_pcap_open: rte_zmalloc_socket() failed for pcap_info_t structure

@KeithWiles
Collaborator

Please update to the latest Pktgen 24.10.0 and DPDK 24.11.0-rc1. I have updated the dpdk.org repo, but please use the one located here.
