Hi,
after a colleague showed Orbit to me, I wanted to build it for Rocky Linux 9. I got it compiled and running (after a few quirks, but that's a different story), but OrbitService crashed. I did a debug build and looked at the core; it crashes here:
#2 0x00007f46c2a28833 in abort () from /usr/lib64/libc.so.6
#3 0x0000559abbfc2231 in orbit_linux_tracing::TracerImpl::ProcessSampleEventAndReturnTimestamp (this=0x7f46b00896f0, header=..., ring_buffer=0x7f469800b110)
at /home/michael.teske/src/orbit-main/src/LinuxTracing/TracerImpl.cpp:1353
#4 0x0000559abbfbeda2 in orbit_linux_tracing::TracerImpl::ProcessOneRecord (this=0x7f46b00896f0, ring_buffer=0x7f469800b110) at /home/michael.teske/src/orbit-main/src/LinuxTracing/TracerImpl.cpp:967
...
1353 ORBIT_CHECK(header.size == sizeof(RingBufferRawSample));
(gdb) p header.size
$1 = 120
(gdb) p sizeof(RingBufferRawSample)
$2 = 112
So I looked into /sys/kernel/debug/tracing/events/sched/sched_switch/format and compared it with my colleague's version. It looks like a one-byte field was added:
field:unsigned char common_preempt_lazy_count; offset:8;
As a result, the sample for this tracepoint is 8 bytes longer (and probably others are as well), so it no longer matches the structs in KernelTracepoints.h. I'll try to fix that, but I'm wondering if anybody else has had this problem. My kernel reports as 5.14.0-503.15.1.el9_5.x86_64.
I didn't find much information about the change, except https://lore.kernel.org/bpf/[email protected]/t/
Is this a Red Hat thing maybe?
EDIT: Yes, it's Red Hat. My colleague has Ubuntu with kernel 6.8.
mteske changed the title from "OrbitService crashing on newer kernels due to changed layouts" to "OrbitService crashing on redhat kernels due to changed layouts" on Dec 13, 2024
KernelTracepoints-diff.txt
In the meantime I got it to run with these diffs, where I found the correct padding with a bit of trial and error. This will of course break non-Red Hat systems, and I'm not sure how to do a proper patch right now. Once the actual format files are used, this should not be a problem anymore. Until then it might be useful for any RHEL 9 user (RHEL 8 does not have this change).
Hi @mteske, thanks for the report. This is indeed an issue with the current implementation and is already tracked here: #4857. Ideally, we would parse the tracepoint layout at runtime instead of hard-coding it.
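For anyone who wants to experiment with the runtime-parsing idea, here is a rough sketch of reading field offsets and sizes out of the text of a tracepoint `format` file. It is not Orbit code: `TracepointField`, `ReadNumberAfter` and `ParseTracepointFormat` are hypothetical names made up for this illustration, and it assumes the usual `field:<declaration>; offset:<n>; size:<n>; signed:<n>;` line layout seen in files such as /sys/kernel/debug/tracing/events/sched/sched_switch/format.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type (not part of Orbit): one "field:" line of a format file.
struct TracepointField {
  std::string name;
  std::size_t offset = 0;
  std::size_t size = 0;
};

// Returns the unsigned number directly following `key` (e.g. "offset:"),
// searching `line` from position `from`. Returns nullopt if it is missing.
static std::optional<std::size_t> ReadNumberAfter(const std::string& line,
                                                  const std::string& key,
                                                  std::size_t from) {
  const std::size_t pos = line.find(key, from);
  if (pos == std::string::npos) return std::nullopt;
  std::size_t i = pos + key.size();
  if (i >= line.size() || line[i] < '0' || line[i] > '9') return std::nullopt;
  std::size_t value = 0;
  for (; i < line.size() && line[i] >= '0' && line[i] <= '9'; ++i) {
    value = value * 10 + static_cast<std::size_t>(line[i] - '0');
  }
  return value;
}

// Parses the text of a tracepoint format file and returns all fields that
// carry a name, an offset and a size. Lines without "field:" are skipped.
std::vector<TracepointField> ParseTracepointFormat(const std::string& text) {
  std::vector<TracepointField> fields;
  std::istringstream input(text);
  std::string line;
  const std::string kFieldKey = "field:";
  while (std::getline(input, line)) {
    const std::size_t start = line.find(kFieldKey);
    if (start == std::string::npos) continue;
    const std::size_t decl_begin = start + kFieldKey.size();
    const std::size_t end = line.find(';', decl_begin);
    if (end == std::string::npos) continue;

    // Declaration, e.g. "char prev_comm[16]" or "unsigned char common_flags".
    std::string decl = line.substr(decl_begin, end - decl_begin);
    std::size_t last = decl.find_last_not_of(" \t");
    if (last == std::string::npos) continue;
    decl.erase(last + 1);
    // Drop a trailing array suffix such as "[16]", but leave declarations
    // like "__data_loc char[] name" alone, where the name comes last.
    if (decl.back() == ']') {
      const std::size_t bracket = decl.rfind('[');
      if (bracket == std::string::npos) continue;
      decl.erase(bracket);
    }
    // The field name is the last token of the declaration.
    const std::size_t separator = decl.find_last_of(" \t*");
    const std::string name =
        separator == std::string::npos ? decl : decl.substr(separator + 1);

    const std::optional<std::size_t> offset = ReadNumberAfter(line, "offset:", end);
    const std::optional<std::size_t> size = ReadNumberAfter(line, "size:", end);
    if (name.empty() || !offset.has_value() || !size.has_value()) continue;

    TracepointField field;
    field.name = name;
    field.offset = *offset;
    field.size = *size;
    fields.push_back(field);
  }
  return fields;
}
```

With something like this, the tracer could look up a field such as `prev_pid` by name and read it at the reported offset, so an extra common field like `common_preempt_lazy_count` would shift the offsets without breaking a hard-coded struct. How this would be wired into the existing ring buffer code is left open here.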