Making Caladan work with disabled Hyper-Threading #22

Open
ntyunyayev opened this issue Jul 1, 2024 · 6 comments

Comments

@ntyunyayev

Hi!

While trying to run the synthetic app, I am running into an issue that appears to be limited to servers where Hyper-Threading is explicitly disabled in the BIOS. I get the following error when running the iokernel and the client:

Setup: Ubuntu 22.04, kernel 5.15, CX5 NIC

The error:

EAL: Probe PCI driver: mlx5_pci (15b3:a2dc) device: 0000:51:00.0 (socket 0)
[ 0.372778] CPU 01| <6> init -> rx
[ 0.449482] CPU 01| <6> init -> tx
[ 0.452856] CPU 01| <6> init -> dp_clients
[ 0.452894] CPU 01| <6> init -> dpdk_late
[ 0.650827] CPU 01| <5> dpdk: driver: mlx5_pci port 0 MAC: 58 a2 e1 85 7a fa
[ 0.700955] CPU 01| <6> init -> directpath
[ 0.700966] CPU 01| <6> init -> hw_timestamp
[ 0.716591] CPU 01| <5> mlx5: device cycles / us: 1000.0000
[ 0.716609] CPU 01| <5> UINTR: disabled
[ 0.716614] CPU 01| <5> main: core 1 running dataplane. [Ctrl+C to quit]
[ 4.785119] CPU 01| <0> FATAL: ./inc/base/list.h:289 ASSERTION 'i != &h->n' FAILED IN 'list_del_from'

Commands used:

sudo ./iokerneld simple noht nicpci 0000:51:00.0

sudo ./apps/synthetic/target/release/synthetic 10.200.0.2:5000 --config client.config --mode runtime-client

Contents of client.config:

host_addr 10.200.0.2 host_netmask 255.255.255.0 host_gateway 10.200.0.2 runtime_kthreads 2 runtime_spinning_kthreads 2 runtime_guaranteed_kthreads 2 runtime_priority lc

The assertion fails in the "sched_enable_kthread" function. On machines where Hyper-Threading is enabled, the same commands work without any issue. Am I missing something in the configuration?

Best regards,

Nikita

@joshuafried
Member

Hey Nikita -

Sorry you're running into this issue. Can you let me know what commit you are running on? Also, does the problem still happen if you use ias instead of simple when starting the iokernel?

@ntyunyayev
Author

Thank you for your quick reply. I am running the latest commit. Unfortunately, using ias causes the same issue.

@joshuafried
Member

joshuafried commented Jul 2, 2024 via email

@ntyunyayev
Author

Is something like this enough?

#0 logk_bug (fatal=true, expr=0x555555b830c2 "i != &h->n", file=0x555555b830b0 "./inc/base/list.h", line=289,
func=0x555555b83298 <func.14> "list_del_from") at base/log.c:63
No locals.
#1 0x00005555556ce44c in list_del_from (h=0x7fffb4000d20, n=0x7fffb4000d98) at ./inc/base/list.h:289
i = 0x7fffb4000d20
func = "list_del_from"
#2 0x00005555556cf21d in sched_enable_kthread (p=0x7fffb4000cd0, th=0x7fffb4000d30, core=3) at iokernel/sched.c:154
No locals.
#3 0x00005555556cf76e in sched_run_on_core (p=0x7fffb4000cd0, core=3) at iokernel/sched.c:260
s = 0x555555d5a5c0 <state+96>
th = 0x7fffb4000d30
func = "sched_run_on_core"
#4 0x00005555556d1f08 in simple_run_kthread_on_core (p=0x7fffb4000cd0, core=3) at iokernel/simple.c:134
sd = 0x5555563f6cf0
ret = 32767
#5 0x00005555556d239b in simple_add_kthread (p=0x7fffb4000cd0) at iokernel/simple.c:218
sd = 0x5555563f6cf0
core = 3
#6 0x00005555556d24af in simple_notify_congested (p=0x7fffb4000cd0, delay=0x7fffffffe140) at iokernel/simple.c:254
sd = 0x5555563f6cf0
ret = 1065353216
congested = true
#7 0x00005555556d09bc in sched_measure_delay (p=0x7fffb4000cd0) at iokernel/sched.c:683
dl = {has_work = true, parked_thread_busy = false, standing_queue = true, max_delay_us = 1314.7986661108796,
min_delay_us = 1314.7986661108796, avg_delay_us = 1314.7986661108796, min_delay_core = 2}
th = 0x7fffb4000e78
rxq_delay = 0
consumed_strides = 0
posted_strides = 93824993799694
next_poll_tsc = 18446744073709551615
i = 2
directpath_armed = true
#8 0x00005555556d0d39 in sched_poll () at iokernel/sched.c:778
last_time = 6076508504604949
idle = {0, 0, 0, 0}
s = 0x7fffffffe1f0
now = 2521905
i = 21845
core = 21845
idle_cnt = 0
p = 0x7fffb4000cd0
p_next = 0x555555d2c0a0 <numa_ops>
func = "sched_poll"
#9 0x00005555556bb85b in dataplane_loop () at iokernel/main.c:147
work_done = false
#10 0x00005555556bc2b0 in main (argc=5, argv=0x7fffffffe4f8) at iokernel/main.c:310
i = 5
ret = 0
utsname = {sysname = "Linux", '\000' <repeats 59 times>, nodename = "atchoum", '\000' <repeats 57 times>,
release = "5.15.0-112-generic", '\000' <repeats 46 times>,
version = "#122-Ubuntu SMP Thu May 23 07:48:21 UTC 2024", '\000' <repeats 20 times>,
machine = "x86_64", '\000' <repeats 58 times>, domainname = "(none)", '\000' <repeats 58 times>}
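For context on what the failing assertion checks: in a CCAN-style intrusive list, a debug build of `list_del_from` walks the list from the head and asserts that it reaches the node being removed before wrapping back to the head sentinel. In frame #1 above, `i` holds the same address as `h` (0x7fffb4000d20); if the sentinel node is the first member of the head struct, that means the walk wrapped around without finding `n`, i.e. the node handed to `list_del_from` by `sched_enable_kthread` was not on that list. The sketch below illustrates the check. It assumes `inc/base/list.h` follows the CCAN list design; the struct layouts and function bodies are simplified, hypothetical stand-ins, not Caladan's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified intrusive circular doubly linked list, modeled on
 * the CCAN-style list that inc/base/list.h is assumed to follow. */
struct list_node {
	struct list_node *next, *prev;
};

struct list_head {
	struct list_node n; /* sentinel: in an empty list it points at itself */
};

static void list_head_init(struct list_head *h)
{
	h->n.next = h->n.prev = &h->n;
}

static void list_add_tail(struct list_head *h, struct list_node *n)
{
	n->next = &h->n;
	n->prev = h->n.prev;
	h->n.prev->next = n;
	h->n.prev = n;
}

/* Walk from the first real node. Reaching the sentinel (&h->n) before
 * meeting n means n is not on this list; this is the situation that the
 * assertion 'i != &h->n' rejects. */
static bool list_contains(const struct list_head *h, const struct list_node *n)
{
	const struct list_node *i;

	for (i = h->n.next; i != n; i = i->next) {
		if (i == &h->n)
			return false;
	}
	return true;
}

/* Debug-checked removal: verify membership, then unlink the node. */
static void list_del_from(struct list_head *h, struct list_node *n)
{
	assert(list_contains(h, n) && "i != &h->n");
	n->next->prev = n->prev;
	n->prev->next = n->next;
	n->next = n->prev = NULL;
}
```

With two nodes added to a list, `list_contains` returns true for both and false for a node that was never added; calling `list_del_from` on such a stray node would trip the assertion, which mirrors the failure in the iokernel.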

@joshuafried
Member

Thanks! Two more requests: (1) can you also try the ias scheduler instead of simple, and (2) can you include the full output of the iokernel?

@ntyunyayev
Author

Here you are:

CPU 09| <6> entering 'iokernel' init phase
CPU 09| <6> init -> base
CPU 09| <5> thread: created thread 0
CPU 09| <5> cpu: detected 16 cores, 1 nodes
CPU 09| <5> time: detected 2399 ticks / us
[ 0.000940] CPU 09| <6> init -> ksched
[ 0.000993] CPU 09| <6> init -> sched
[ 0.001010] CPU 09| <5> sched: CPU configuration...
node 0: [0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]
[ 0.001030] CPU 09| <5> sched: dataplane on 1, control on 0
[ 0.001039] CPU 09| <6> init -> simple
[ 0.001047] CPU 09| <6> init -> numa
[ 0.001052] CPU 09| <6> init -> ias

===== Processor information =====
Linux arch_perfmon flag : yes
Hybrid processor : no
IBRS and IBPB supported : yes
STIBP supported : yes
Spec arch caps supported : yes
Max CPUID level : 27
CPU model number : 106
IBRS enabled in the kernel : no
STIBP enabled in the kernel : no
The processor is not susceptible to Rogue Data Cache Load: yes
The processor supports enhanced IBRS : yes
[New Thread 0x7ffff4952400 (LWP 1943956)]
[New Thread 0x7ffff4151400 (LWP 1943957)]
Socket 0: 4 memory controllers detected with total number of 8 channels. 0 UPI ports detected. 4 M2M (mesh to memory) blocks detected. 0 HBM M2M blocks detected. 0 EDC/HBM channels detected. 0 Home Agents detected. 0 M3UPI blocks detected.
Initializing RMIDs
[New Thread 0x7ffff3950400 (LWP 1943958)]
[New Thread 0x7ffff314f400 (LWP 1943959)]
[New Thread 0x7ffff294e400 (LWP 1943960)]
[New Thread 0x7ffff214d400 (LWP 1943961)]
[New Thread 0x7ffff194c400 (LWP 1943962)]
[New Thread 0x7ffff114b400 (LWP 1943963)]
[New Thread 0x7ffff094a400 (LWP 1943964)]
[New Thread 0x7ffff0149400 (LWP 1943965)]
[New Thread 0x7fffef948400 (LWP 1943966)]
[New Thread 0x7fffef147400 (LWP 1943967)]
[New Thread 0x7fffee946400 (LWP 1943968)]
[New Thread 0x7fffee145400 (LWP 1943969)]
[New Thread 0x7fffed944400 (LWP 1943970)]
[New Thread 0x7fffed143400 (LWP 1943971)]
[New Thread 0x7fffec942400 (LWP 1943972)]
[New Thread 0x7fffec141400 (LWP 1943973)]
[New Thread 0x7fffeb940400 (LWP 1943974)]
[New Thread 0x7fffeb13f400 (LWP 1943975)]
[New Thread 0x7fffea93e400 (LWP 1943976)]
[New Thread 0x7fffea13d400 (LWP 1943977)]
[New Thread 0x7fffe993c400 (LWP 1943978)]
[New Thread 0x7fffe913b400 (LWP 1943979)]
[New Thread 0x7fffe893a400 (LWP 1943980)]
[New Thread 0x7fffe8139400 (LWP 1943981)]
[New Thread 0x7fffe7938400 (LWP 1943982)]
[New Thread 0x7fffe7137400 (LWP 1943983)]
[New Thread 0x7fffe6936400 (LWP 1943984)]
[New Thread 0x7fffe6135400 (LWP 1943985)]
[New Thread 0x7fffe5934400 (LWP 1943986)]
[New Thread 0x7fffe5133400 (LWP 1943987)]
[New Thread 0x7fffe4932400 (LWP 1943988)]
[New Thread 0x7fffe4131400 (LWP 1943989)]
[ 0.069669] CPU 00| <6> init -> proc_timer
[ 0.069701] CPU 00| <6> init -> control
[ 0.145255] CPU 00| <5> control: spawning control thread
[New Thread 0x7fffc37ff400 (LWP 1943990)]
[ 0.145426] CPU 00| <6> init -> dpdk
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
[New Thread 0x7fffc2ffe400 (LWP 1943991)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
[New Thread 0x7fffc27fd400 (LWP 1943992)]
EAL: Selected IOVA mode 'PA'
EAL: 2781 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:a2dc) device: 0000:51:00.0 (socket 0)
[New Thread 0x7fffc1ffc400 (LWP 1943993)]
[New Thread 0x7fffc17fb400 (LWP 1943994)]
[ 0.395712] CPU 01| <6> init -> rx
[ 0.470769] CPU 01| <6> init -> tx
[ 0.474051] CPU 01| <6> init -> dp_clients
[ 0.474085] CPU 01| <6> init -> dpdk_late
[ 0.674338] CPU 01| <5> dpdk: driver: mlx5_pci port 0 MAC: 58 a2 e1 85 7a fa
[ 0.725635] CPU 01| <6> init -> directpath
[ 0.725646] CPU 01| <6> init -> hw_timestamp
[ 0.742864] CPU 01| <5> mlx5: device cycles / us: 1000.0000
[ 0.742883] CPU 01| <5> UINTR: disabled
[ 0.742887] CPU 01| <5> main: core 1 running dataplane. [Ctrl+C to quit]

Thread 1 "iokerneld" hit Breakpoint 1, logk_bug (fatal=true, expr=0x555555b830c2 "i != &h->n", file=0x555555b830b0 "./inc/base/list.h", line=289, func=0x555555b83298 <func.14> "list_del_from") at base/log.c:63
63 logk(LOG_EMERG, "%s: %s:%d ASSERTION '%s' FAILED IN '%s'",
(gdb) backtrace -full
#0 logk_bug (fatal=true, expr=0x555555b830c2 "i != &h->n", file=0x555555b830b0 "./inc/base/list.h",
line=289, func=0x555555b83298 <func.14> "list_del_from") at base/log.c:63
No locals.
#1 0x00005555556ce44c in list_del_from (h=0x7fffb4000d20, n=0x7fffb4000d98) at ./inc/base/list.h:289
i = 0x7fffb4000d20
func = "list_del_from"
#2 0x00005555556cf21d in sched_enable_kthread (p=0x7fffb4000cd0, th=0x7fffb4000d30, core=3)
at iokernel/sched.c:154
No locals.
#3 0x00005555556cf76e in sched_run_on_core (p=0x7fffb4000cd0, core=3) at iokernel/sched.c:260
s = 0x555555d5a5c0 <state+96>
th = 0x7fffb4000d30
func = "sched_run_on_core"
#4 0x00005555556b9b8e in ias_run_kthread_on_core (sd=0x5555564b71c0, core=3) at iokernel/ias.c:199
ret = 0
#5 0x00005555556ba3ca in ias_add_kthread (sd=0x5555564b71c0) at iokernel/ias.c:413
core = 3
#6 0x00005555556ba5f5 in ias_notify_congested (p=0x7fffb4000cd0, delay=0x7fffffffe150)
at iokernel/ias.c:474
sd = 0x5555564b71c0
ret = 1065353216
congested = true
#7 0x00005555556d09bc in sched_measure_delay (p=0x7fffb4000cd0) at iokernel/sched.c:683
dl = {has_work = true, parked_thread_busy = false, standing_queue = true,
max_delay_us = 1383.6506877865777, min_delay_us = 1383.6506877865777,
avg_delay_us = 1383.6506877865777, min_delay_core = 2}
th = 0x7fffb4000e78
rxq_delay = 0
consumed_strides = 0
posted_strides = 93824993799694
next_poll_tsc = 18446744073709551615
i = 2
directpath_armed = true
#8 0x00005555556d0d39 in sched_poll () at iokernel/sched.c:778
last_time = 6177825002624055
idle = {0, 0, 0, 0}
s = 0x7fffffffe200
now = 16402518
i = 21845
core = 21845
idle_cnt = 0
p = 0x7fffb4000cd0
p_next = 0x555555d2c0a0 <numa_ops>
func = "sched_poll"
#9 0x00005555556bb85b in dataplane_loop () at iokernel/main.c:147
work_done = false
#10 0x00005555556bc2b0 in main (argc=5, argv=0x7fffffffe508) at iokernel/main.c:310
i = 5
ret = 0
utsname = {sysname = "Linux", '\000' <repeats 59 times>,
nodename = "atchoum", '\000' <repeats 57 times>,
release = "5.15.0-112-generic", '\000' <repeats 46 times>,
version = "#122-Ubuntu SMP Thu May 23 07:48:21 UTC 2024", '\000' <repeats 20 times>,
machine = "x86_64", '\000' <repeats 58 times>,
domainname = "(none)", '\000' <repeats 58 times>}
